The Shakespeare Conference: SHK 28.014 Thursday, 12 January 2017
Date: January 11, 2017 at 5:03:02 PM EST
Subj: Re: SHAKSPER: Co-Author
Date: January 11, 2017 at 7:50:50 PM EST
Subj: Re: Co-Author
Date: January 11, 2017 at 5:03:02 PM EST
Subject: Re: SHAKSPER: Co-Author
Larry Weiss writes
> The debate, if there is one, between Gabriel Egan
> ... and Sir Brian Vickers ... over whether lexical or
> function words are of greater significance in making
> or rejecting attributions seems not to recognize that
> they are both important.
I don’t know where Larry gets this impression from. In my SHAKSPER posting I merely defended the study of function words against unfounded claims that they tell us nothing about authorship. In our Shakespeare Quarterly article, we write that “Ideally, for each text we would count the proximity of every word to every other word, to capture the phenomenon of word-clustering at all levels—among rare words and frequent ones—wherever it occurs”. That’s exactly the point Larry claims that I don’t recognize.
Larry says that Hugh Craig’s work on rare-word and common-word frequencies (which is based on the foundational Delta and Zeta tests invented by John Burrows) is important for corroboration of hypotheses by independent means. I agree. The forthcoming ‘New Oxford Shakespeare: Authorship Companion’ (edited by Gary Taylor and me) will contain one essay by John Burrows and Hugh Craig and another by Hugh Craig alone that help explain the authorship claims made by the edition.
Larry rightly insists that all sorts of aspects of authorial style ought to be investigated, so I hope he’ll be pleased to hear that the General Editors of the New Oxford Shakespeare agree and as a consequence the Companion will also have essays on the evidence from metrical habits (by Marina Tarlinskaja amongst others) and on the latest techniques including the measurement of Shannon Entropy and data analysis by the use of Random Forests and Nearest Shrunken Centroid.
Pervez Rizvi was kind enough to say that our Shakespeare Quarterly article “does a superb job of explaining the mathematics in a way that non-mathematicians could understand” and since as co-editor of the Companion I took the same care to ensure that it too has this desideratum, I hope he finds that it does.
Pervez raises a most pertinent point about our decision to count only adjacencies within a speech and to ignore those that span a change of speaker. That’s what we do in the Shakespeare Quarterly paper. In the more technical paper, we’re concerned with non-Shakespearian prose writings and the segmentation is by sentences, not speeches. I’m sorry that Pervez got the impression that sentence boundaries are also respected in the Shakespeare Quarterly paper: they are not.
But should we respect speech boundaries when looking for function-word adjacencies? On reflection, probably not. The thinking behind the original decision is that at some boundaries we have to assume that Shakespeare’s train of thought was interrupted by a natural break in the writing. Because a scene break is, by definition, the occasion for a change of people on stage and/or of the location in which the action takes place, it seems unlikely that Shakespeare would still be dwelling (consciously or unconsciously) on the words used at the end of the last speech of the old scene when composing the first words of the first speech of the new scene.
Between speeches a similar interruption can be caused by intervening stage directions, of course. So there won’t always be a ‘flow’ between speeches. But then again, most speeches do not have stage directions between them and stage directions can occur within speeches as well as between them.
Because our moving ‘window’ of consideration is only five words wide, and we give logarithmically diminishing weighting to the words near the far end of that window, the decision to segment at speech boundaries does not in the end make much difference to the results. Most speeches are significantly longer than 5 words.
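For readers who have not seen the paper, here is a minimal sketch of the kind of windowed counting described above. It is not our team’s code. The ten-word function-word list is hypothetical, the window is taken to be five words wide counting the current word, and the geometric decay factor (0.75, chosen only for illustration) stands in for the “logarithmically diminishing weighting”, whose exact formula is not given here. Each segment is a list of tokens (one per speech in this example), and no window crosses a segment boundary, which is the choice Pervez questions.

```python
from collections import defaultdict

# Hypothetical function-word list, for illustration only.
FUNCTION_WORDS = {"a", "and", "but", "for", "in", "of", "or", "the", "to", "with"}


def adjacency_counts(segments, window=5, decay=0.75):
    """Sum weighted adjacencies between function words.

    segments: list of token lists (e.g. one per speech); a window never
        crosses from one segment into the next.
    window: width of the moving window, counting the current word, so up
        to window - 1 following words are examined.
    decay: assumed geometric weight; a word d places ahead contributes
        decay ** (d - 1), so nearer words count more.
    """
    counts = defaultdict(float)
    for tokens in segments:
        words = [t.lower() for t in tokens]
        for i, first in enumerate(words):
            if first not in FUNCTION_WORDS:
                continue
            # Look ahead at most window - 1 words, stopping at the segment end.
            for d in range(1, min(window, len(words) - i)):
                second = words[i + d]
                if second in FUNCTION_WORDS:
                    counts[(first, second)] += decay ** (d - 1)
    return dict(counts)


speeches = [["For", "the", "king"], ["And", "of", "the", "queen"]]
by_speech = adjacency_counts(speeches)
# The same words with the speech boundary ignored.
merged = adjacency_counts([[w for s in speeches for w in s]])
print(sorted(set(merged) - set(by_speech)))
# [('for', 'and'), ('for', 'of'), ('the', 'and'), ('the', 'of'), ('the', 'the')]
```

The only pairs lost by segmenting are those that straddle a boundary, and with a five-word window these are few relative to the pairs inside a speech much longer than five words.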
But Pervez also raises the pertinent point of changes in Shakespeare’s writing across his career: if early Shakespeare is considerably unlike late Shakespeare on the feature we’re measuring, then one should not derive a single Shakespearian profile that lumps them together. There is more work to be done here, especially (now that I think on it) because, as Helmut Ilsemann has shown, the average length of speeches in Shakespeare dropped sharply around 1599 from about 10 words to about five. That does indeed make the segmentation question that Pervez raises particularly important. I’m grateful to him for raising it and we’ll be debating this in our team.
Finally, Pervez is exactly right to acknowledge that the press reporting of scholarly work is never as nuanced as the scholars would like. The caveats he is “sure Gabriel will concede” are indeed ones I concede and they are not present in the press release.
Date: January 11, 2017 at 7:50:50 PM EST
Subject: Re: Co-Author
They need to collaborate with their colleagues in the mathematics and computing departments, to invent and test methods like the one in the SQ article. That is the task for this generation.
Computers aren’t new. Computer-aided studies of function words have been around for ages, and the arXiv paper by Segarra, Eisen and Ribeiro shows essentially the same thing: function words might be useful for attribution if you have a large body of text (100,000 words) by a known author to serve as a basis, and if the texts you are testing aren’t too short (less than 10,000 words). Thus the statement in the online article (“Shakespeare and his co-authors, as told by Penn engineers”) that “Analysis of Shakespeare’s author profile suggests that he was not the only author of three ‘Henry VI’ plays, which were most likely a collaboration between Shakespeare and Marlowe or Peele” is not believable: there are only 4 known plays by Peele, as the authors themselves point out, and the seven Marlowe plays are in the same or a similar genre of history, and, again as the authors themselves point out, works of similar genre have more similarities. It would also be helpful if one of the people involved would stop being coy and say exactly which sections of the H6 plays are not by Shakespeare. I would be happy to explain exactly why they are all by Shakespeare! It’s always vastly amusing to me that claims of alternate authorship made by these means always neglect to cite actual passages, relying instead on abstract word counts. They also never explain how or why Marlowe would have been involved in these collaborations, or why no one ever wants to attribute part of a play by, say, Greene to Marlowe; it’s always Shakespeare. For some reason the greatest writer in the English language is always the one who needs a helping hand. You also have to start asking awkward questions like “Did Marlowe only write seven plays because he was so busy helping Shakespeare? Didn’t he want a career of his own?”.
Here are some facts concerning the use of the word “for” in some of Shakespeare’s plays, including 2H6. I compare the scenes of act 4 of 2H6 (scene 2, with “Let’s kill all the lawyers”, is the one claimed to be by Marlowe) with similar-sized scenes in act 4 of other plays, early, middle and late.
2H6, act 4:
scene 1: 147 lines, 10 times, 0.068 per line
scene 2: 190 lines, 22 times, 0.116 per line
scene 7: 136 lines, 14 times, 0.103 per line
Comparing only the longer scenes with scene 2, scenes 1 and 7 differ from scene 2 by 41% and 11% respectively.
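The rates and percentages in these tables follow a simple pattern, inferred from the figures themselves rather than stated anywhere: the rate is occurrences of “for” (n) divided by lines (L), and the difference is the gap between two rates as a share of the higher one. Worked from the raw counts for act 4 of 2H6:

```latex
r = \frac{n}{L}, \qquad
\Delta = \frac{r_{\mathrm{high}} - r_{\mathrm{low}}}{r_{\mathrm{high}}} \times 100\%

r_{4.1} = \frac{10}{147} \approx 0.068, \quad
r_{4.2} = \frac{22}{190} \approx 0.116, \quad
r_{4.7} = \frac{14}{136} \approx 0.103

\Delta(4.1, 4.2) = \frac{0.116 - 0.068}{0.116} \approx 41\%, \qquad
\Delta(4.7, 4.2) = \frac{0.116 - 0.103}{0.116} \approx 11\%
```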
Other scenes in 2H6 that are about 190 lines long (scene, lines, occurrences of “for”, rate per line):
1.3 220 24 0.109
2.1 201 11 0.055
5.1 216 16 0.074
An almost 50% difference between the highest and lowest values.
Taming of the Shrew, act 4:
scene 2: 121 lines, 15 times, 0.124 per line
scene 3: 196 lines, 13 times, 0.066 per line
These two differ from each other by 46%, they fall both above and below the rate of similar-sized scenes in 2H6, and their average, 0.095, is close to the rate in scene 7 of 2H6. If someone wanted to use this data, they could say “Scene 2 of 2H6 isn’t like scene 3 of Taming of the Shrew, so it must not be by the same author!”, while another could argue “Scene 2 of 2H6 is like scene 2 of Taming of the Shrew, so it must be by the same author!”.
Here are all the scenes in Taming of the Shrew that are about 190 lines long, like scene 4.2 of 2H6 (scene, lines, occurrences of “for”, rate per line):
4.1 211 lines 12 0.057
4.3 196 lines 13 0.066
5.2 189 lines 21 0.111
A roughly 50% difference between the highest and lowest rates there, and they look pretty similar to the rates in the scenes from 2H6, which is not too surprising since both are early plays.
What about Hamlet? Ah, well, Hamlet might have been written by [Oxford, Derby, Marlowe, Bacon, Nostradamus, Paracelsus, King Arthur, Beyonce...take your pick] so maybe my comparison isn’t valid, but let’s see anyway:
scenes 3 & 4 combined: 134 lines, 10 times, 0.075 per line
scene 7: 194 lines, 8 times, 0.041 per line
A difference of 45%.
In the brief scenes 1 and 2 of act 4 of Hamlet, 45 and 31 lines respectively, “for” occurs only once each, while in another brief scene, scene 6 (33 lines), it occurs 8 times: a huge variability (0.022, 0.032 and 0.24 per line respectively, so the highest rate is nearly eleven times the lowest, a roughly 1000% difference).
Other scenes in Hamlet that are about 190 lines long (scene, lines, occurrences of “for”, rate per line):
1.1 175 7 0.040
1.5 190 6 0.032
3.1 188 14 0.074
3.4 217 14 0.065
4.5 211 13 0.062 (Varying by 57%.)
Cymbeline, act 4:
scenes 1, 3 & 4 combined: 125 lines, 13 times, 0.104 per line
scene 2: 403 lines, 24 times, 0.060 per line
I found other scenes in Cymbeline that were roughly 190 lines long, like scene 4.2 of 2H6 (scene, lines, occurrences of “for”, rate per line):
1.1 179 lines 10 0.056
1.4 172 lines 10 0.058
1.6 200 lines 17 0.085
3.4 193 lines 9 0.047
5.4 206 lines 16 0.078
A difference of 45% between the highest and lowest rates there.
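The spread figures for the roughly 190-line scenes can be recomputed in a few lines. This sketch uses only the counts tabulated above and assumes the measure is (highest rate minus lowest rate) divided by the highest rate, which is what the quoted percentages imply. Because it works from the raw counts rather than the rounded rates, a result can differ by a point from the figure in the text (Hamlet comes out at 58%).

```python
# Counts of "for" in scenes of roughly 190 lines, as tabulated above:
# play -> list of (scene, lines, occurrences of "for").
# For 2H6 only the "other" scenes are listed, as in the table (4.2 excluded).
SCENES = {
    "2 Henry VI": [("1.3", 220, 24), ("2.1", 201, 11), ("5.1", 216, 16)],
    "Taming of the Shrew": [("4.1", 211, 12), ("4.3", 196, 13), ("5.2", 189, 21)],
    "Hamlet": [("1.1", 175, 7), ("1.5", 190, 6), ("3.1", 188, 14),
               ("3.4", 217, 14), ("4.5", 211, 13)],
    "Cymbeline": [("1.1", 179, 10), ("1.4", 172, 10), ("1.6", 200, 17),
                  ("3.4", 193, 9), ("5.4", 206, 16)],
}


def spread(scenes):
    """Gap between the highest and lowest per-line rates, as a percentage
    of the highest rate."""
    rates = [count / lines for _, lines, count in scenes]
    high, low = max(rates), min(rates)
    return 100 * (high - low) / high


for play, scenes in SCENES.items():
    print(f"{play}: {spread(scenes):.0f}%")
# 2 Henry VI: 50%
# Taming of the Shrew: 49%
# Hamlet: 58%
# Cymbeline: 45%
```

Every play lands in the same 45 to 58% band, which is the scene-to-scene variability the argument turns on.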
So the rate of commonplace (“function”) word use varies from scene to scene by a considerable margin, making “function” words useless in attributing relatively short stretches of text like a scene to anyone. But, of course, this scene-to-scene variability in function-word frequency makes them an ideal playground for alternative authorship cranks, who can pick and choose what words and what acts they want to assign to their favorite hobby horse, whether it’s Oxford or Marlowe or whoever. If you want, do it the Vickers way: just make up a bunch of tests, combining, say, “for” with the word “night”, and “or” with “black”, whatever you want; make up twenty or so. Then count them in Shakespeare and in your favorite hobby horse, and guess what? 3 or 4 of them will be close in frequency to your hobby horse’s frequency. Don’t mention the 16 or 17 tests that didn’t match. This will be especially effective if you throw in some sophisticated-looking statistical tests with lotsa jargon ‘n stuff. Then you can proudly proclaim “It was actually the Earl of East Armpit who wrote Shakespeare!” and the newspapers will come running. Gare-on-teed!