Category: Statistical Nonsense

Moby Dick: Sentence Uniqueness

Time to ask the question: What is the most unique sentence in all of Moby Dick? What sort of writing can be found here and nowhere else? As usual, we approach the question from two different angles: 1) If we…

Moby Dick: Word Frequency Analysis

The vocabulary in Moby Dick leans heavily towards whaling terms. There are ships and sails. Oars and harpoons. Tides and waves. Blubber and oil. Most of all there are whales; over a thousand mentions of them. This is not surprising….

Moby Dick: Basic Statistics

Herman Melville’s Moby Dick, originally just named The Whale, is a famously philosophical and famously long swashbuckling adventure story about an eclectic group of whale hunters who sign up for what they think is going to be an ordinary three…

Pride and Prejudice: Sentence Normalization

Pride and Prejudice is both two centuries old and English, making it an excellent subject for an exercise in normalization. Time to see what happens when we algorithmically replace words with their most commonly used synonyms. Original: Mr. Bingley was…

Pride and Prejudice: Sentence Uniqueness

Now that we’ve examined some of the individual words that best define Pride and Prejudice, let’s try to identify the sentences with the most unique word choices. To start, we will look for sentences which are relatively unique. We calculate…

Pride and Prejudice: Word Frequency Analysis

Pride and Prejudice is over two hundred years old, so it shouldn’t be too much of a surprise that it uses quite a few (85 by my criteria) “old-fashioned” words that rarely appear in modern English but were…

Pride and Prejudice: Basic Statistics

Jane Austen’s Pride and Prejudice is pretty much the gold standard for what defines an enjoyable romance. An intelligent, beautiful and well-off woman meets an intelligent, handsome and ridiculously rich man. They get off on the wrong foot, slowly come…

The Hound of the Baskervilles: Doge Translation

such mystery much clue very elementary Watson

The Hound of the Baskervilles may not have a canine main character in the same way Call of the Wild did, but it does have a definite dog theme to it and that…

The Hound of the Baskervilles: Sentence Uniqueness

Identifying individual words that uniquely define a book’s vocabulary is fairly easy. But what about doing sentence-level analysis? What if we wanted to find “the most unique sentence” in the entire book? The primary challenge here is deciding just…

Hound of the Baskervilles: Sentence Normalization

Having now talked a little about what makes The Hound of the Baskervilles unique, let us once again see what we can do to remove that uniqueness by automatically replacing words with their most dull and common modern equivalents. Original:…
