Mary Shelley’s gothic horror story Frankenstein is a work of literature so classic that even people who have never read it are still familiar with the archetype of a mad scientist constructing an artificial man from bits of corpses and chunks of meat. The creature that results from this experiment is so horrific that even its own creator abandons it. Forced to fend for itself, the creature experiences tragedy after tragedy until it is driven mad by the cruelty of the world and becomes consumed with revenge, leading to a spiral of doom and destruction for Frankenstein, his creation and everyone around them.
Needless to say, it’s a pretty good book if you’re in the mood for a melancholy tragedy where emotions run hot and characters wax poetic about the doom and gloom that haunts their lives after all their best intentions go horribly wrong.
It also brings up some interesting questions about nature versus nurture and scientific ethics. But I’m no philosopher, so I’ll be skipping those weighty head-scratchers in favor of some straightforward mathematics!
Let’s start our statistical dissection of Frankenstein with some simple metrics:
Word Count: 72,326
Average Word Size: 4.4 letters
Median Word Size: 4 letters
Longest Word: incomprehensible (16 letters)
Sentence Count: 3,212
Average Sentence Length: 22.5 words
Median Sentence Length: 21 words
Longest Sentence: 91 words
I paused, examining and analysing all the minutiæ of causation, as exemplified in the change from life to death, and death to life, until from the midst of this darkness a sudden light broke in upon me a light so brilliant and wondrous, yet so simple, that while I became dizzy with the immensity of the prospect which it illustrated, I was surprised that among so many men of genius, who had directed their inquiries towards the same science, that I alone should be reserved to discover so astonishing a secret.
These numbers were generated by downloading a public domain copy of the original 1818¹ version of Frankenstein from Project Gutenberg and then using the spaCy Python library to split the book apart, first into sentences and then into individual words. I then did my best to filter out punctuation so that “!” wouldn’t be counted as a word. I did NOT, however, exclude things like chapter headers.
So while the end result may not perfectly match the numbers a human would come up with during a manual review of Frankenstein, it should be close enough for all practical purposes.
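For anyone who wants to try something similar, here is a rough sketch of that process. It is not the exact script behind the numbers above. The function names, the blank English pipeline and the rule-based sentencizer are all my assumptions for illustration, and a different spaCy model or punctuation filter will give slightly different counts.

```python
from statistics import mean, median


def split_with_spacy(text):
    """Split text into sentences, each a list of word strings."""
    import spacy  # imported here so summarize() below works without spaCy

    # Assumption: a blank English pipeline plus the rule-based sentencizer.
    # The post does not say which spaCy model or settings were used.
    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")
    # Make sure the whole book fits in a single pass.
    nlp.max_length = max(nlp.max_length, len(text) + 1)

    sentences = []
    for sent in nlp(text).sents:
        # Drop punctuation and whitespace tokens so "!" is not counted as a word.
        words = [tok.text for tok in sent if not tok.is_punct and not tok.is_space]
        if words:
            sentences.append(words)
    return sentences


def summarize(sentences):
    """Compute the simple metrics from a list of tokenized sentences."""
    words = [w for sent in sentences for w in sent]
    # len() counts every character in a token, including any hyphens or apostrophes.
    word_lengths = [len(w) for w in words]
    sentence_lengths = [len(sent) for sent in sentences]
    return {
        "word_count": len(words),
        "avg_word_size": mean(word_lengths),
        "median_word_size": median(word_lengths),
        "longest_word": max(words, key=len),
        "sentence_count": len(sentences),
        "avg_sentence_length": mean(sentence_lengths),
        "median_sentence_length": median(sentence_lengths),
        "longest_sentence": max(sentence_lengths),
    }
```

With the book saved locally (the filename here is hypothetical), usage would look like `summarize(split_with_spacy(open("frankenstein.txt", encoding="utf-8").read()))`. Note that nothing in this sketch strips the Project Gutenberg license text or chapter headers, so they count toward the totals.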
I will admit this post didn’t uncover anything particularly groundbreaking, but it helps set the stage for the rest of our analysis.
¹ Mary Shelley republished the book in 1831 with a moderate number of revisions meant to make it more appealing and less shocking to the general public. There’s quite a bit of debate on the merits of each version, but I have no real opinion and chose the first publication more or less on a whim.