One fun, if slightly silly, thing we can do with word frequencies is to “normalize” sentences by replacing low-frequency words with more common synonyms. This strips away the author’s distinctive style and replaces it with the blandest possible amalgam of modern writing.
I can’t imagine a serious reason a person might consider doing this, but it was an amusing way to spend a few hours.
For this exercise I relied on WordNet, a lexical database of English that can be used for, among other things, looking up synonyms. Of course, just because two words are synonyms in general doesn’t mean they are interchangeable in a specific sentence. But by combining my human instinct for sensible synonyms with the computer’s knowledge of word frequencies, I was able to produce the following “translations” from the text of Frankenstein:
Original: “No mortal could support the horror of that countenance”
Processed: “No person could support the fear of that face”
Original: “How can I describe my emotions at this catastrophe, or how delineate the wretch whom with such infinite pains and care I had endeavoured to form?”
Processed: “How can I account of my feelings at this disaster, or how represent the wretch whom with such infinite trial and work I had tried to make?”
Original: “My rage is unspeakable, when I reflect that the murderer, whom I have turned loose upon society, still exists.”
Processed: “My anger is terrible, when I think that the killer, whom I have turned loose upon society, still lives”
Original: “Soon these burning miseries will be extinct. I shall ascend my funeral pile triumphantly, and exult in the agony of the torturing flames”
Processed: “Soon this hurtful misery will be extinct. I will rise onto my funeral column triumphantly, and joy in the pain of the torturing fire”
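The mechanical half of this process, picking the most frequent word from a set of synonyms, can be sketched in a few lines of Python. This is a hypothetical sketch, not the code behind the examples above. The synonym table and frequencies are invented stand-ins. A real version would pull synonyms from WordNet (for example through NLTK’s `wordnet.synsets()` and each synset’s `lemmas()`) and frequencies from a real word list. It also skips the human judgment step entirely and swaps every word blindly.

```python
# Toy word frequencies (occurrences per million words). These numbers are
# invented for illustration; they are NOT real corpus counts.
WORD_FREQ = {
    "countenance": 2.0, "face": 300.0, "visage": 1.0,
    "horror": 15.0, "fear": 60.0, "dread": 5.0,
    "mortal": 8.0, "person": 250.0, "human": 200.0,
}

# Hypothetical synonym table standing in for a WordNet lookup.
SYNONYMS = {
    "countenance": ["face", "visage"],
    "horror": ["fear", "dread"],
    "mortal": ["person", "human"],
}


def most_common_synonym(word: str) -> str:
    """Return the most frequent word among `word` and its known synonyms."""
    candidates = [word] + SYNONYMS.get(word, [])
    # Words missing from the frequency table count as 0, so any known
    # synonym beats them; a word with no synonyms is returned unchanged.
    return max(candidates, key=lambda w: WORD_FREQ.get(w, 0.0))


def normalize(sentence: str) -> str:
    """Replace every word with its most common synonym.

    Naive on purpose: splits on whitespace, ignores punctuation, and
    never looks at the surrounding words.
    """
    return " ".join(most_common_synonym(w) for w in sentence.split())


print(normalize("No mortal could support the horror of that countenance"))
# → No person could support the fear of that face
```

Because it ignores context, this version will happily produce replacements that make no sense in a given sentence, which is exactly where the human instinct came in.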
One observation here is that words are hard to work with in isolation: the only way to decide whether a replacement works is to look at the words around it. I did this by hand, but I suspect the process could be at least partially automated by loading a list of common word pairs and triplets. If the word pair created by a candidate synonym is rare or nonexistent, that synonym probably won’t work in that sentence.
Google has published exactly this kind of data (counts of word pairs and triplets) alongside its single-word frequencies, so perhaps someday I’ll have a go at writing a synonym system that automatically filters out obviously wrong replacements.
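A filter along those lines might look something like the following. This is a minimal sketch under loudly stated assumptions: the bigram counts are invented, `filter_synonyms` and the `min_count` threshold are hypothetical names and values, and a real system would load the counts from an n-gram dataset rather than a hard-coded dictionary. The idea is simply to drop any candidate whose pair with the word before or after it is rare or unseen.

```python
# Invented bigram counts for illustration only; a real system would load
# these from an n-gram dataset instead of hard-coding them.
BIGRAM_COUNTS = {
    ("my", "funeral"): 400,
    ("funeral", "pile"): 120,
    ("funeral", "pyre"): 900,
    ("funeral", "column"): 2,
}


def bigram_count(first: str, second: str) -> int:
    """Look up how often `first second` occurs; unseen pairs count as 0."""
    return BIGRAM_COUNTS.get((first.lower(), second.lower()), 0)


def filter_synonyms(words, index, candidates, min_count=50):
    """Keep candidates for words[index] that form common pairs with their neighbors.

    A candidate is dropped if the pair it makes with the previous word, or
    with the next word, appears fewer than `min_count` times.
    """
    kept = []
    for cand in candidates:
        if index > 0 and bigram_count(words[index - 1], cand) < min_count:
            continue  # rare or unseen pair with the left neighbor
        if index + 1 < len(words) and bigram_count(cand, words[index + 1]) < min_count:
            continue  # rare or unseen pair with the right neighbor
        kept.append(cand)
    return kept


words = ["my", "funeral", "pile"]
print(filter_synonyms(words, 2, ["pile", "pyre", "column", "heap"]))
# → ['pile', 'pyre']
```

A fixed threshold is a blunt instrument: perfectly good but uncommon pairs (a noun followed by an adverb, say) would also be rejected, so comparing each candidate against the counts for the original word might be a safer baseline.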