Recent Posts

The Mayor of Casterbridge: Basic Statistics

The Mayor of Casterbridge, by Thomas Hardy, starts when a hard-working but short-tempered man drinks a bit too much, starts complaining about his marriage, and then wakes up the next morning with a horrible hangover and the shocking…

The Beautiful and Damned: Sentence Normalization

It’s been a while since I’ve done a “sentence modernization” post, so let’s take some excerpts from The Beautiful and Damned and see what happens when we replace every word with more common equivalents. It will be interesting to see…

The Beautiful and Damned: Unique Sentences

The Beautiful and Damned is a snapshot of the Roaring Twenties, and so it seems more than appropriate that the “most unique” sentence in the book is a description of that era’s extravagant nightlife: There were opera cloaks…

The Beautiful and Damned: Word Frequency Analysis

The Beautiful and Damned is only about a hundred years old, which experience has shown is recent enough to feel more or less like modern English but with different slang. This is reflected in the fact that no words in…

The Beautiful and Damned: Basic Statistics

In celebration of The Great Gatsby finally entering the public domain, we’ll be looking at one of F. Scott Fitzgerald’s earlier and lesser-known books: The Beautiful and Damned. It tells the story of a young man and woman who…

Cuchulain, the Hound of Ulster: Missing Words

We’ve gotten pretty good at finding words that an author uses unusually often. But what about the opposite? What about words that the author uses less than we expect, or even not at all? After all, Cuchulain, the Hound of…

Cuchulain, the Hound of Ulster: Unique Sentences

The unique sentence results for Cuchulain wound up being an interesting lesson in the importance of proper pre-processing when doing natural language processing. While the relative frequency results are interesting enough, the global frequency results were basically useless. See for…

Cuchulain, the Hound of Ulster: Word Frequency Analysis

Time to dig deeper into the writing style of Eleanor Henrietta Hull in her book Cuchulain, the Hound of Ulster. The first and most unusual note is that there were zero words that fit the criteria of my “common in…

Cuchulain, the Hound of Ulster: Basic Statistics

Ireland has a rich set of myths and legends that I know almost nothing about, other than the fact that there was a hero named Cú Chulainn who keeps showing up in Japanese RPGs*. Fortunately, back in the early 1900s…

Moby Dick: Algorithmic Abridgement

Moby Dick is a very, very long book, partly because the author enjoys going off on tangents and partly because he insists on providing thorough background details on the biology of whales and the nature of the whaling industry. And…
