The Shakespeare Conference: SHK 23.047 Friday, 3 February 2012
Date: February 3, 2012 8:48:05 AM EST
Subject: Re: SHAKSPER: O Rare
Marie Merkel asks
> Does anyone know of an online resource for discovering
> all the "rare" words . . . within a given play?
About 10 years ago SHAKSPERian Steve Roth did some refinements to a project I started called SHAXICAN. (The name was a gibe at Donald Foster’s supposed SHAXICON database, which was the subject of several articles but never appeared.) The idea was to count rare words in Shakespeare by play and by actor’s part, looking for correlations. Specifically, we wanted to test the hypothesis that the rare words in a particular part acted by Shakespeare himself would appear disproportionately often in the next play he wrote, since those rare words he’d recently spoken on stage would be at the forefront of his mind. That was Foster’s claim but SHAXICAN was unable to verify it.
The files from SHAXICAN are still available at
and the one you want is “correlations.txt” in the “Roth’s refinements” section.
Save it to your own computer, then open it in a spreadsheet program such as Microsoft Excel. (Excel will take you through a ‘Text import wizard’ for handling ‘Delimited data’ files: just accept all the defaults.)
Sort the whole table on the second column, which contains the play names. That’ll give you a table with plays listed alphabetically from 1H4 (=1 Henry 4) to WT (=The Winter’s Tale) in the second column and the rare words in the first column. The third column identifies an actor’s part in another play, which part also contains this row’s rare word. The fourth column gives the number of times this row’s rare word appears in the play identified in the second column and the fifth column gives the number of times this rare word appears in the part identified in the third column. (I’m making it sound more complicated than it is: Roth explains the table with an extract on the website.)
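For anyone who would rather script the import-and-sort step than click through Excel, here is a minimal sketch in Python. It assumes the file is tab-delimited with exactly the five columns described above and no header row; the column names, the `load_rows` and `sort_by_play` helpers, and the sample rows are all invented for illustration and are not taken from the real file.

```python
import csv
import io

# Assumed column order, taken from the description above; these names are
# hypothetical labels, not headers found in the real file.
COLUMNS = ("word", "play", "part", "count_in_play", "count_in_part")


def load_rows(handle, delimiter="\t"):
    """Read five-field rows from an open text file, skipping other lines."""
    rows = []
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # blank line or row that does not fit the assumed layout
        rows.append(dict(zip(COLUMNS, (f.strip() for f in fields))))
    return rows


def sort_by_play(rows):
    """Order rows by play code, then by word within each play."""
    return sorted(rows, key=lambda r: (r["play"], r["word"]))


# Invented rows for illustration only; words and part labels are placeholders.
SAMPLE = (
    "word_a\tWT\tPART_X\t1\t1\n"
    "word_b\t1H4\tPART_Y\t1\t2\n"
    "word_a\t1H4\tPART_Z\t1\t1\n"
)

rows = sort_by_play(load_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row["play"], row["word"], row["part"])
```

To run it on the saved file, replace `io.StringIO(SAMPLE)` with something like `open("correlations.txt", newline="", encoding="utf-8")`; the encoding is a guess and may need changing. Plain string sorting puts codes beginning with a digit (1H4) ahead of those beginning with a letter (WT), which matches the ordering described above.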
Here, a word is rare if it occurs 1-12 times in the Shakespeare canon. Sorry if that’s too broad a filter for your purposes. For each rare word you can see what part in another play it also occurs in, so the word ‘abundance’, which appears once in 1H4, is listed 9 times at that point in the table, once for each of its appearances elsewhere: 2H4 (twice), AWW (once), COR (twice), JN (once), MV (once), PER (once), and TMP (once). Of course ‘abundance’ appears later in the table too, for each of its occurrences in those other plays.
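The ‘abundance’ example can be mocked up in a few lines of Python to show what "listed 9 times at that point in the table" means. This is only a sketch: the rows are built by hand from the play counts given above, the "PLAY:part-N" labels are an invented stand-in for however the real file names an actor's part, and the count-in-part values are placeholders.

```python
from collections import Counter

# Plays in which 'abundance' also occurs, repeated as in the example above.
OTHER_PLAYS = ["2H4", "2H4", "AWW", "COR", "COR", "JN", "MV", "PER", "TMP"]

# Each row: (word, play, part in another play, count in play, count in part).
# Part labels use a made-up "PLAY:part-N" form; the last count is a placeholder.
ROWS = [
    ("abundance", "1H4", f"{play}:part-{i}", 1, 1)
    for i, play in enumerate(OTHER_PLAYS, start=1)
]
ROWS.append(("word_b", "WT", "1H4:part-0", 1, 1))  # unrelated filler row


def rows_for(rows, word, play):
    """Return every row for the given word within the given play."""
    return [r for r in rows if r[0] == word and r[1] == play]


matches = rows_for(ROWS, "abundance", "1H4")

# Tally the matching rows by the play named in the (invented) part label.
by_other_play = Counter(r[2].split(":")[0] for r in matches)

print(len(matches))
for play, n in sorted(by_other_play.items()):
    print(play, n)
```

With the real file, the split on ":" would need adapting to whatever form the third column actually takes.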
If you want to find words that appear fewer than 12 times in the canon, look for words that appear fewer times overall in the table. You can do this by eye (as you would a printed concordance) or better still someone good at Excel might write you a formula that finds words appearing only once (or any arbitrary number of times) in the table. If there’s a SHAKSPERian who can do that, I’d be interested to share the formula. I teach an undergraduate course on this sort of thing (“The Art of Distant Reading”*) and am somewhat hampered by the fact that good students are better than me at Excel but entirely unfamiliar with real programming. (There is a vigorous debate in the UK about whether the teaching of computers in schools fails to encourage real programming and instead promotes clever uses of Microsoft Office; in my experience it does.)
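Outside Excel, the "words appearing only once (or any arbitrary number of times) in the table" filter is a short counting exercise. The sketch below is in Python and assumes each row begins with the rare word, as described above; the function names and the sample rows are invented for illustration, and the row count is only a proxy for frequency in the canon, not the canon count itself.

```python
from collections import Counter


def table_frequency(rows):
    """Count how many rows of the table each word appears in."""
    return Counter(row[0] for row in rows)


def words_listed_n_times(rows, n):
    """Words that appear in exactly n rows of the table."""
    counts = table_frequency(rows)
    return sorted(word for word, c in counts.items() if c == n)


def words_listed_at_most(rows, n):
    """Words that appear in n rows or fewer."""
    counts = table_frequency(rows)
    return sorted(word for word, c in counts.items() if c <= n)


# Invented rows for illustration only: (word, play, part, count, count).
ROWS = [
    ("word_a", "1H4", "P1", 1, 1),
    ("word_a", "1H4", "P2", 1, 1),
    ("word_a", "WT", "P3", 1, 1),
    ("word_b", "WT", "P4", 1, 1),
    ("word_c", "MV", "P5", 1, 2),
    ("word_c", "TMP", "P6", 1, 1),
]

print(words_listed_n_times(ROWS, 1))
print(words_listed_at_most(ROWS, 2))
```

In Excel itself, a helper column holding something like =COUNTIF(A:A, A2), filled down and then filtered, should give the same per-word row count, assuming the words sit in column A.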
* My proposed title for the course was “The Art of Not Reading” but this was rejected by my university as likely to bring a department of literature into disrepute. The course titles are indebted to Franco Moretti and Martin Mueller, respectively.