The unique sentence results for Cuchulain wound up being an interesting lesson in the importance of proper pre-processing when doing natural language processing. While the relative frequency results are interesting enough, the global frequency results were basically useless. See for yourself:
Global unique sentence
THE RED ROUT 270 NOTES ON THE SOURCES 275 Illustrations PAGE THE RAVEN OF ILL-OMEN _Frontispiece_ QUEEN MEAVE AND THE DRUID 18 CUCHULAIN SETS OUT FOR EMAIN MACHA 28 CUCHULAIN DESIRES ARMS OF THE KING 42 MACHA CURSES THE MEN OF ULSTER 80 FERDIA FALLS BY THE HAND OF CUCHULAIN 140 “THE MOMENT OF GOOD-LUCK IS COME” 160 CUCHULAIN COMES AT LAST TO HIS DEATH 268 Introduction The events that circle round King Conor mac Nessa and Cuchulain as their principal figures are supposed to have occurred, as we gather from the legends themselves, about the first century of our era.
Global mean unique sentence
With Eight Illustrations by Stephen Reid “Bec a brig liomsa sin,” ar Cuchulaind, “gen go rabar acht aonla no aonoidchi ar bith acht go mairit m’airdsgeula dom és.” _
Two obvious problems: those aren’t really sentences, and even if they were properly separated into sentences they aren’t really part of the book; they are front matter such as image captions. Automatically detecting this sort of issue is tricky, but if we were doing a serious job of analyzing just this book, removing them manually before running our code would have been simple enough.
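For a single book, that manual clean-up can be as small as cutting everything before the first real chapter heading. Here is a minimal sketch, assuming the body starts right after a line that reads exactly "Introduction"; the function names and the marker are my own illustration, not the code behind the results above, and the sentence splitter is deliberately naive.

```python
import re


def strip_front_matter(text, start_marker="Introduction"):
    """Drop everything up to and including the first line equal to start_marker.

    Assumption: the marker appears on a line of its own where the body begins.
    If it is never found, the text is returned unchanged.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == start_marker:
            return "\n".join(lines[i + 1:]).strip()
    return text


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


sample = (
    "THE RED ROUT 270\n"
    "Illustrations\n"
    "QUEEN MEAVE AND THE DRUID 18\n"
    "\n"
    "Introduction\n"
    "\n"
    "The events occurred long ago.\n"
    "Cuchulain was a hero."
)
body = strip_front_matter(sample)
sentences = split_sentences(body)
```

A marker line is about the crudest possible heuristic, and it will miss captions that sit inside the body of the book, but it keeps the contents page and list of illustrations out of the sentence pool.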
All that said, their extremely high uniqueness scores have an obvious cause: they’re both either in a foreign language or full of foreign words. Of course those sentences seem unique compared to modern English, or any sort of English.
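To see why foreign words swamp the score, here is a rough sketch of one way a frequency-based uniqueness score could work. It is an assumption for illustration, not necessarily the scoring used for these results: each word is scored by the negative log of its smoothed relative frequency in a reference corpus, and a sentence gets both a total and a mean. The names `word_rarity` and `sentence_scores` and the tiny reference corpus are hypothetical.

```python
import math
import re
from collections import Counter


def word_rarity(word, ref_counts, ref_total):
    """Negative log of the add-one smoothed relative frequency of word.

    Words never seen in the reference corpus get the largest possible score.
    """
    vocab_size = len(ref_counts)
    prob = (ref_counts.get(word, 0) + 1) / (ref_total + vocab_size + 1)
    return -math.log(prob)


def sentence_scores(sentence, ref_counts):
    """Return (total rarity, mean rarity) over the words of a sentence."""
    words = re.findall(r"[^\W\d_]+", sentence.lower())  # runs of letters only
    if not words:
        return 0.0, 0.0
    ref_total = sum(ref_counts.values())
    rarities = [word_rarity(w, ref_counts, ref_total) for w in words]
    total = sum(rarities)
    return total, total / len(rarities)


# Toy reference corpus standing in for "modern English" (hypothetical data).
reference = Counter("the cat sat on the mat and the dog sat on the rug".split())

english_total, english_mean = sentence_scores("The cat sat on the mat", reference)
foreign_total, foreign_mean = sentence_scores("Bec a brig liomsa sin", reference)
```

Every word in the second sentence is missing from the reference corpus, so each one gets the maximum rarity and the mean lands well above the English sentence. Note also that a total grows with sentence length while a mean does not, so the two scores reward different kinds of sentences.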
Let’s take a look at the relative word frequency results for some better data:
Max unique sentence
That night there was no cheerfulness nor gaiety nor quiet pleasure in the tent of Ferdia, as there was wont to be on other nights; for he had made known what Meave had said to him and the command laid upon him to go on the morrow to combat with Cuchulain; and though Ferdia was merry and triumphant on his return, because of the gifts of the queen and the affection of Finnabar, and all the flattery that had been skilfully put upon him, it was not so with the men that were of his own household, for they understood that wherever those two champions of battle, those two slayers of a hundred should meet together, one of the two must fall, or both must fall: and well they knew that if one only should fall there, it would not be Cuchulain who would give way, for it was not easy to combat with Cuchulain on the Raid of the Kine of Cooley.
Mean unique sentence
Moreover, thou hast brought with thee no strong comrade and warrior to protect thee from my blows.
It’s as you would expect. “Thee”s and “Thou”s and other such antiquated language help this book stand out and give it its legendary feel.