I imagine most people here are familiar with “Doge”, a long-running Internet joke about cute dogs having silly conversations in broken English. But not just any sort of broken English: Doge has actual trends and rules. You can find much more in-depth coverage by searching the Internet for articles like this one, but a high-level summary is:
The Rules of Doge
– All sentences are only two words long.
– The first word is always “such”, “very”, “much”, “many”, or “so”.
– The second word can be any English noun, verb, adjective or adverb.
For example, near the end of The Call of the Wild, the book’s entire theme is neatly summarized in a single evocative sentence:
Deep in the forest a call was sounding, and as often as he heard this call, mysteriously thrilling and luring, he felt compelled to turn his back upon the fire and the beaten earth around it, and to plunge into the forest, and on and on, he knew not where or why; nor did he wonder where or why, the call sounding imperiously, deep in the forest.
To convey the same idea in Doge we might try:
Such call.
Very mystery.
So compel.
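The first two rules are mechanical enough to check in a few lines of Python. This is a minimal sketch that deliberately ignores rule 3, because checking the part of speech needs a tagger. The function name `follows_doge_rules` is just illustrative.

```python
import re

# The modifier words allowed by rule 2.
DOGE_MODIFIERS = {"such", "very", "much", "many", "so"}


def follows_doge_rules(sentence: str) -> bool:
    """Check rules 1 and 2: exactly two words, the first being a Doge modifier.

    Rule 3 (the second word is a noun, verb, adjective or adverb) would need
    a part-of-speech tagger, so this sketch does not check it.
    """
    # Keep only letters and apostrophes so trailing punctuation is ignored.
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return len(words) == 2 and words[0] in DOGE_MODIFIERS


for line in ["Such call.", "Very mystery.", "So compel.", "The call sounds."]:
    print(f"{line!r}: {follows_doge_rules(line)}")
```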
All things considered, Doge is a very human language game, a demonstration of our ability to fit existing words into strange new patterns with surprisingly few difficulties.
But can we automate the process?
Probably not entirely, or at least not within the single weekend of free time I have available. But it should still be amusing to see how far we can get.
The simplest approach is probably to use spaCy to identify and pull out all the nouns, verbs, adjectives and adverbs from a sentence. Then we can attach a random Doge modifier to each word, throw in a final Doge sentence ender and see what we get:
Original:
The rabbit sped down the river
Processed:
so rabbit
very sped
much river
amaze
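The approach that produced that output can be sketched roughly as follows. This is not the original script. To keep it runnable without downloading a spaCy model, the part-of-speech tags are written out by hand as (text, tag) pairs. The tags use the same Universal POS labels that spaCy reports in `token.pos_`. The function name `naive_doge` and the exact list of sentence enders are assumptions for illustration.

```python
import random

DOGE_MODIFIERS = ["such", "very", "much", "many", "so"]
# Enders seen in the sample output in this post (assumed, not exhaustive).
DOGE_ENDERS = ["wow", "amaze", "excite"]
# Universal POS tags (as in spaCy's token.pos_) that rule 3 allows.
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def naive_doge(tagged_tokens, rng=random):
    """Pair every content word with a random modifier, then add an ender.

    tagged_tokens is a list of (text, pos) pairs. With spaCy they could come
    from [(t.text, t.pos_) for t in nlp(sentence)]; here they are hand-written.
    """
    lines = [
        f"{rng.choice(DOGE_MODIFIERS)} {text.lower()}"
        for text, pos in tagged_tokens
        if pos in CONTENT_POS
    ]
    lines.append(rng.choice(DOGE_ENDERS))
    return lines


# Hand-tagged stand-in for "The rabbit sped down the river"
# (assumption: a real model might tag some words differently).
sentence = [("The", "DET"), ("rabbit", "NOUN"), ("sped", "VERB"),
            ("down", "ADP"), ("the", "DET"), ("river", "NOUN")]

print("\n".join(naive_doge(sentence, random.Random(0))))
```

Passing a seeded `random.Random` keeps the output repeatable, which is handy when comparing approaches.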
Not bad for a first try. There’s an obvious problem with “very sped”, though: Doge tries to avoid conjugation. Fortunately, spaCy lets you extract the lemma (the dictionary form) of a verb:
Original:
The rabbit sped down the river
Processed:
very rabbit
many speed
so river
excite
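Judging by the sample output in this post, only verbs get lemmatized: “dogs” and “appetites” stay plural in the next example. So this sketch swaps in the lemma only when the tag is `VERB`. As before, the analysis is hand-written rather than produced by a real model. `lemma_doge` is a hypothetical name. With spaCy, the triples would be `(t.text, t.pos_, t.lemma_)` for each token.

```python
import random

DOGE_MODIFIERS = ["such", "very", "much", "many", "so"]
DOGE_ENDERS = ["wow", "amaze", "excite"]
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def lemma_doge(tagged_tokens, rng=random):
    """Like the naive version, but verbs are replaced by their lemma.

    tagged_tokens holds (text, pos, lemma) triples. Nouns, adjectives and
    adverbs keep their surface form, matching the sample output above.
    """
    lines = []
    for text, pos, lemma in tagged_tokens:
        if pos not in CONTENT_POS:
            continue
        word = lemma if pos == "VERB" else text  # "sped" -> "speed"
        lines.append(f"{rng.choice(DOGE_MODIFIERS)} {word.lower()}")
    lines.append(rng.choice(DOGE_ENDERS))
    return lines


# Hand-written stand-in for spaCy's analysis of "The rabbit sped down the river".
sentence = [("The", "DET", "the"), ("rabbit", "NOUN", "rabbit"),
            ("sped", "VERB", "speed"), ("down", "ADP", "down"),
            ("the", "DET", "the"), ("river", "NOUN", "river")]

print("\n".join(lemma_doge(sentence, random.Random(1))))
```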
Sadly, this simple approach starts to fall apart as sentences get more complex:
Original:
The Outside dogs, whose digestions had not been trained by chronic famine to make the most of little, had voracious appetites
Processed:
so outside
such dogs
such digestions
very train
many chronic
much famine
so make
many most
very little
so voracious
such appetites
wow
For this sort of multi-part sentence, we would ideally focus on just a few core ideas and boil the entire phrase down to something like “Such dogs. Much appetite. Amaze.”
We can work towards this goal by using spaCy to extract not just a list of words but a dependency parse (essentially a sentence diagram) identifying the main subject, verb and object. We can then doge just the core idea of a sentence:
Original:
The Outside dogs, whose digestions had not been trained by chronic famine to make the most of little, had voracious appetites
Processed:
so dog
very have
much appetite
amaze
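One way to sketch the “main noun, verb and object” idea is to start from the root of the dependency parse. From there, pick out its `nsubj` (subject) and `dobj` (direct object) children, using lemmas throughout. Real spaCy code exposes the same information through `token.dep_`, `token.head` and `token.lemma_`, and a sentence’s root through `sent.root`. Here the parse is a hand-written, abridged stand-in so the example runs on its own, with the relative clause left out. The `Token` tuple and function names are hypothetical.

```python
import random
from typing import NamedTuple, Optional

DOGE_MODIFIERS = ["such", "very", "much", "many", "so"]
DOGE_ENDERS = ["wow", "amaze", "excite"]


class Token(NamedTuple):
    text: str
    lemma: str
    dep: str   # dependency label, like spaCy's token.dep_
    head: int  # index of the head token; the root points at itself


def core_idea(tokens: list[Token]) -> list[str]:
    """Return lemmas of the root's subject, the root itself, and its object."""
    root = next(i for i, t in enumerate(tokens) if t.dep == "ROOT")

    def child(label: str) -> Optional[str]:
        # First direct dependent of the root carrying the given label.
        for i, t in enumerate(tokens):
            if i != root and t.head == root and t.dep == label:
                return t.lemma
        return None

    words = [child("nsubj"), tokens[root].lemma, child("dobj")]
    return [w for w in words if w is not None]


def core_doge(tokens: list[Token], rng=random) -> list[str]:
    lines = [f"{rng.choice(DOGE_MODIFIERS)} {w.lower()}" for w in core_idea(tokens)]
    lines.append(rng.choice(DOGE_ENDERS))
    return lines


# Abridged, hand-written parse of "The Outside dogs ... had voracious appetites".
dogs = [
    Token("The", "the", "det", 2),
    Token("Outside", "outside", "amod", 2),
    Token("dogs", "dog", "nsubj", 3),
    Token("had", "have", "ROOT", 3),
    Token("voracious", "voracious", "amod", 5),
    Token("appetites", "appetite", "dobj", 3),
]

print("\n".join(core_doge(dogs, random.Random(2))))
```

Assume the parser makes “rest” the subject of “comes” in the next example, with no direct object. In that case, the same function would return only “rest” and “come”.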
Not bad. Let’s try it out on another:
Original:
A rest comes very good after one has travelled three thousand miles, and it must be confessed that Buck waxed lazy as his wounds healed, his muscles swelled out, and the flesh came back to cover his bones.
Processed:
so rest
so come
amaze
This time a bit too much got cut out, and the remaining core idea is an English idiom that makes no sense in isolation. This is a trend that repeats itself throughout many of the more interesting sentences in the book. Jack London just loves sentences where the grammatical core is some unimportant observation and the real meat of the idea is separated off by a conjunction or hidden in an aside or prepositional phrase. This is perfectly lovely writing, but it’s more than a bit confusing for simple sentence parsing.
With enough time, I’m sure a spaCy-powered solution to extract these “hidden” sub-sentences and convert them into passable Doge could be found. But alas, time grows short...