The Shakespeare Conference: SHK 29.0423 Wednesday, 5 December 2018
Date: December 5, 2018 at 3:39:19 AM EST
Subject: Re: The Shakespeare Canon and the NOS
I am truly sorry to inflict such a long post on people, but I feel it’s important to respond to several points in Gabriel Egan’s cynical post.
First, if it’s the case that the small sample of text that the ‘micro-attribution’ method works on (such as the 63 words from Macbeth 4.1.143-150) is just too small to test...then it becomes impossible to show that Taylor et al. are wrong in their attributions. Vickers’s argument leads to an admission of defeat (nobody can tell who wrote such a small passage), not to the routing of attributions Vickers doesn’t like.
The point Gabriel misses is that the Authorship Companion he co-edited contains about half a dozen articles that use microattribution. If the technique is unsound, as I have tried to show in my new article, then that removes a substantial chunk of that book.
Pervez Rizvi has followed through this assumption about the extended Kyd canon in a set of essays on his website, using his new dataset and his method of interrogating it. The result is that in fact ‘1 Henry VI’ and ‘King Leir’ most closely match the style of Christopher Marlowe, not Kyd, and ‘Edward III’ matches Marlowe’s style if Rizvi uses his 3-gram method and Kyd’s if he uses his 4-gram method.
This is only telling part of the story. I “followed through this assumption” to the extent that I tested it. As Gabriel well knows, I also tested the NOS attribution of Arden of Faversham to Shakespeare. Moreover, I have now provided him and everyone else with the means to do their own choice of tests, as he also knows, so no one is bound by my “assumption”.
One has to really cherry-pick Rizvi’s work in order to find in it evidence for the extended Kyd canon. And what is worse, one has to swallow some pretty unpalatable new evidence. Using his 4-grams method, Rizvi finds that: ‘A Midsummer Night’s Dream’ is closest in style to George Chapman’s work; by 3-grams ‘Richard III’ is closest to Kyd’s; by 3-grams ‘The Taming of the Shrew’ is closest to Marlowe’s but by 4-grams that switches to Kyd’s; by 3-grams ‘The Two Gentlemen of Verona’ is closest to Kyd’s; and by 4-grams ‘Henry V’ is closest to Kyd’s.
Again, this is Gabriel cynically telling only some of the truth to readers of this forum. I have done several tests, which I have put online and also shared privately with him and others. After accusing Brian Vickers of cherry-picking, Gabriel does exactly that by selectively quoting some test results that he presumably thinks will discredit what I have been doing. As he well knows, the latest test I did, published last Sunday, assigned all the plays correctly and even assigned scenes 4-8 of Arden to Shakespeare. He also knows that thousands of different tests are possible, as they are with every other attribution method, and an earlier test had given Arden all to Kyd. That is why I have made the method publicly available, so that people can experiment with it independently. He is entitled to reject what I have done, but he should not misrepresent it.
Rizvi has not, and does not claim to have, “published” a database of plays. His website gives the reader a ZIP file containing 510 plays, from which set she has to manually delete 22 plays because they are later than the period we are interested in, leaving 488 plays. Then she has to download 38 Shakespeare plays from the Folger website, bringing the total to 526. Rizvi’s work is based on a set of 527 plays and the 527th, the Additions to The Spanish Tragedy (“counted as a separate little play”), is not provided by him.
I provided everything I had from the now lost SHC site, whether or not I made use of it. It’s almost comical for Gabriel to complain about having to spend the few seconds it takes to delete the 22 files I did not use. As to The Spanish Tragedy Additions, my text of it is the only thing I used that was not freely downloadable by anyone. If Gabriel thinks that’s unacceptable, then he can of course just ignore the results for the Additions.
I usually respect the privacy of private correspondence and never allude to it unless the other person does that first. However, as Gabriel has now taken the gloves off, I am going to make an exception. Readers should know that I have had an intermittent email correspondence with him on this topic since early this year. He has professed a desire to replicate my work. I have answered his questions, and provided the information he has asked for, as best I can. I have always suspected that he was not acting in good faith and was just fishing for material with which to discredit my work, and he has now confirmed it by this post. A few months ago, he asked me for my database. I said I’d provide it and gave him a choice of two formats. He then said that he didn’t want it as he already had all my data. I picked the format I thought would be most useful to people and placed the database file online anyway. Then Gabriel got in touch again a couple of weeks ago, asking for the raw data files I had used. I provided them online for him and everyone to download. There then followed a vexatious correspondence in which he kept asking for my files, even though they were already online. I finally terminated the correspondence last week when we were just going round in circles. Since then I have expected an attack by him on my work and here it is.
I am sorry to have written all that, as it’s never seemly to do this in public, but I think it’s important that when people read Gabriel’s cynical posts, they should understand just what kind of person they are dealing with. It is in that context that readers should understand what Gabriel has written below:
Rizvi’s play scripts are in XML format that records not only the original spelling of each word, but also the lemma to which it belongs. To get at these lemmas, Rizvi must have used some software method that makes sense of the structure of XML files. This method and software he does not provide on his website and just how they work is a non-trivial part of the puzzle. This aspect of Rizvi’s investigation may be perfectly satisfactory, but we cannot know because he hasn’t disclosed it.
This is a deeply cynical attempt to sow doubts in people’s minds. I have provided online all that anyone needs to replicate my work. Of course, it’s “non-trivial”, as Gabriel says. It took me months of patient editing to turn those XML files into a database fit for N-gram searching. I did it because I have the technical know-how, not because I have some secret software that I haven’t “disclosed” to Gabriel. It may well be that when Gabriel looks at those files, he doesn’t know what to do. There’s no reason why he should, since he is an English scholar, not a programmer like me. But that doesn’t mean that a competent programmer wouldn’t know what to do. The tendentious phrase “he hasn’t disclosed it” is of course intended to plant the idea in people’s minds that my work just can’t be replicated because I am concealing information. That is utterly false. Any person with the necessary technical skill could replicate what I did, and thousands would no doubt do it better or faster.
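For readers unfamiliar with what "getting at the lemmas" involves, the following is a minimal sketch of the general technique, not the code used to build the database discussed here. It assumes, purely for illustration, that each word is encoded as a `<w>` element carrying a `lemma` attribute; the sample markup, the function name `extract_lemmas` and the fallback to the spelled form are all hypothetical, and real files may be structured differently.

```python
import xml.etree.ElementTree as ET


def extract_lemmas(xml_text):
    """Return the lemmas of all <w> elements in document order.

    Assumption: words are marked up as <w lemma="...">spelling</w>.
    If a <w> element has no lemma attribute, its lower-cased text is used.
    """
    root = ET.fromstring(xml_text)
    lemmas = []
    for element in root.iter():
        # Drop any XML namespace prefix, e.g. "{http://...}w" -> "w".
        tag = element.tag.rsplit("}", 1)[-1]
        if tag != "w":
            continue
        lemma = element.get("lemma")
        if lemma is None:
            lemma = (element.text or "").strip()
        if lemma:
            lemmas.append(lemma.lower())
    return lemmas


# Invented sample in the assumed format (original spelling "againe",
# lemma "again"); it is not copied from any of the files discussed.
SAMPLE = (
    '<text xmlns="http://www.tei-c.org/ns/1.0"><l>'
    '<w lemma="when">When</w> <w lemma="shall">shall</w> '
    '<w lemma="we">we</w> <w lemma="three">three</w> '
    '<w lemma="meet">meet</w> <w lemma="again">againe</w>'
    '</l></text>'
)

if __name__ == "__main__":
    print(extract_lemmas(SAMPLE))
```

A list of lemmas per play, produced along these lines, is the kind of input an N-gram search needs; the choices a real pipeline makes (handling of stage directions, speech prefixes, missing lemmas) would have to be documented separately.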
A great deal rests on the detail of the means by which one extracts the verbal matches that are the evidential base for this kind of investigation...[long snip]...Do any of these choices account for his much greater counts for n-gram matches than other investigators have found? We don’t know, but I do not assume that these details are trivial.
I have no objection to this kind of critical comment. As I wrote here before, we need to do the experiments to answer questions like these. I did something about that need, in my spare time and without any funding, and I gave it all away gladly, so people could see what’s possible. I have said more than once in the past year that I hope some university department will do the work better, having my work before them to provide ideas for improvement, and correcting whatever errors I made.
Taylor and the others cited by Vickers use the Literature Online (LION) database as their source texts and as their searching software. This has the signal merit that almost everyone in academia has access to it and can reproduce the results that the investigators claim.
This is Gabriel’s second attempt recently to plant the idea in people’s minds that they should stick to LION and ignore my work. As he well knows, but does not say, LION is not a substitute for my published data, and vice versa. LION does not let you specify a play and give you all its N-gram matches with every other play. You need different tools for different kinds of research. LION does not make possible the vast amount of research, as yet undone, that my data makes possible.
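To make concrete what "specify a play and get all its N-gram matches with every other play" means, here is a small sketch under stated assumptions. It is not the method or the data described in this post: the three token lists are invented toy examples, the names `ngrams` and `matches_for_play` are hypothetical, and the count is simply the number of distinct N-grams shared, with no weighting or filtering of any kind.

```python
def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def matches_for_play(target, corpus, n):
    """Map every other play's title to the n-grams it shares with `target`."""
    target_grams = ngrams(corpus[target], n)
    return {
        title: target_grams & ngrams(tokens, n)
        for title, tokens in corpus.items()
        if title != target
    }


# Invented toy corpus; these are not real play texts.
CORPUS = {
    "play_a": "the king is dead long live the king".split(),
    "play_b": "they say the king is dead and gone".split(),
    "play_c": "long live the queen".split(),
}

if __name__ == "__main__":
    for n in (3, 4):
        shared = matches_for_play("play_a", CORPUS, n)
        counts = {title: len(grams) for title, grams in shared.items()}
        print(n, counts)
```

Even on this toy input the counts change with the choice of N, which is one reason results from a 3-gram test and a 4-gram test need not agree.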
We are far from having the right tools to begin asking the right questions about the authorship questions we’d like to answer...and when the details of the datasets, methods, and tools are not available for all to see, it is unsound to conclude that one approach has trumped the other.
This is a standard Egan tactic when he wants to divert attention, to present himself as a wise scholar just asking for adherence to good practice and awaiting better information. (Incidentally, as will become clear soon in a published work, there is at least one researcher who asked for some information necessary to replicate some of the tests in the Authorship Companion and was told by the author of the chapter that it’s unavailable.)
I suspect few people are interested in these details, any more than they were interested in Gabriel’s diversion about URLs a few weeks ago. Why does he do this? Because in the course of this year, papers published and about to be published, by me and others, have demonstrated just how bad a book the Authorship Companion is. When those papers become more widely known - which is Brian Vickers’ aim here - it will be clear to all just how incompetent the research was on which the NOS edition is based. As one of the co-editors of the book, Gabriel has to find ways to shoot the messengers. As Darren Freebury-Jones observed here, Gabriel’s reviews of other people’s work are ruthlessly partisan: he praises his colleagues’ work and rubbishes the work of people who disagree with him. Readers should bear that in mind as they read his future attacks and diversions.