The Shakespeare Conference: SHK 24.0338 Monday, 15 July 2013
Date: July 13, 2013 9:11:35 PM EDT
Subject: RE: SHAKSPER: Stylometrics?
Here’s a response to Jed Serrano’s request for a user-friendly description of stylometrics. Some people suppose that there are as many definitions of stylometry as there are stylometrists. I see stylometrics as a way to find measurable, consistent, distinctive features of a given author’s work which permit you to distinguish it from the work of other authors. That has been for many years the North Star of Robert Valenza’s and my new-optics Claremont Shakespeare Clinic. Our specialty was sophisticated measures of stylistic discrepancy. “Measurable” usually means “quantitative.” “Quantitative” nowadays usually involves computers. Many old-school Shakespeare pros and buffs still find computers a turn-off, but the well of new Shakespeare documents ran dry a century ago, and, as with so many other things, computers have become a game-changer. They have made stylometric comparison and verification a thousand times faster, easier, and more productive than before. If you want to know more about what Shakespeare wrote and didn’t write, computers are where the action is today. I have the notion that many younger scholars, who have grown up taking for granted range-extenders like the Internet, Facebook, iTunes, Wikipedia, and SHAKSPER, are more open than their elders, if not to practicing stylometrics themselves, at least to finding out what stylometrics can tell us about who wrote what, and when. I hope this notion is so.
I’ve written a short, user-friendly introduction to some major recent stylometric developments and tools for the forthcoming Cambridge World Shakespeare Encyclopedia, “Language: Key to Authorship.” I think it is due to come out very soon, but I don’t have an arrival date. The article is too long and packed to summarize in detail here, but it discusses various stylometric techniques worth mentioning: equivalent word choice (while or whilst); intensifiers (most, very); aversions (hark!); prefixes and suffixes; more-preferred and less-preferred words (“badges and flukes”); new words and rare words; combinations and collocations (I do beseech); LION links; incongruous who’s; redundant comparatives and superlatives (most unkindest); hendiadys; verse tests; modal analysis; and even “enhanced intuition”: aggregated, intuitive identification by a screened and validated panel of “Golden Ears.” It’s far from exhaustive, just five or ten printed pages, I would guess, but enough to get you started.
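For readers who like to see such a test in the concrete, here is a minimal sketch in Python of the simplest kind listed above, an equivalent-word-choice count (while versus whilst), plus a per-thousand-words rate of the sort one might use for intensifiers. It is an illustration only, not the Clinic's actual test: the function names, the crude tokenizer, and the sample sentence are all assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: a crude regex tokenizer is good enough for illustration;
    real studies need careful handling of old spelling and punctuation.
    """
    return re.findall(r"[a-z']+", text.lower())


def preference_ratio(text, word_a="while", word_b="whilst"):
    """Share of word_a among all uses of word_a or word_b.

    Returns None when neither word occurs, since no preference can be
    measured from zero instances.
    """
    counts = Counter(tokenize(text))
    a, b = counts[word_a], counts[word_b]
    total = a + b
    return a / total if total else None


def rate_per_thousand(text, words):
    """Occurrences of each listed word per 1,000 tokens of text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n = len(tokens)
    return {w: 1000 * counts[w] / n for w in words} if n else {}


if __name__ == "__main__":
    # Hypothetical sample sentence, invented for this sketch.
    sample = "Whilst I sleep, while you wake, while we wait."
    print(preference_ratio(sample))
    print(rate_per_thousand(sample, ["most", "very"]))
```

A single ratio like this settles nothing by itself. The point of the techniques listed above is to combine many such measures and to check each one against texts of known authorship before trusting it.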
If you don’t want to wait for the CWSE, or would like to go beyond it, here are a few favorite stylometrics references:
Anything by MacDonald Jackson, dean of the world’s early-modern stylometrists. For recent examples, try his “Early Modern Authorship: Canons and Chronologies,” in G. Taylor and J. Lavagnino, eds., Thomas Middleton and Early Modern Textual Culture, Oxford: Oxford University Press, 2007, 80-97, or his short, very accessible “Authorship and the Evidence of Stylometrics” in Edmondson and Wells, eds., Shakespeare Beyond Doubt, CUP, 2013.
Two monumental books by Sir Brian Vickers, using both traditional and stylometric methods to very good effect for Shakespeare’s poems and plays: ‘Counterfeiting’ Shakespeare, CUP 2002, and Shakespeare, Co-Author, OUP, 2002.
Hugh Craig and Arthur Kinney, eds., Shakespeare, Computers, and the Mystery of Authorship, CUP, 2009. Deploys new, cutting-edge computer techniques with up to 98% claimed accuracy in distinguishing works of known authors from each other.
Two studies by Gary Taylor: “The Canon and Chronology of Shakespeare’s Plays,” in S. W. Wells, G. Taylor, et al., William Shakespeare: A Textual Companion, OUP, 1987, 69-144, and “Shakespeare and Others: The Authorship of Henry the Sixth, Part One,” Medieval and Renaissance Drama in England 7 (1995): 145-205.
The fullest, most accessible account of our own methods, aimed at a lay audience and published in a law review, not a technical journal, is Elliott and Valenza, “Oxford by the Numbers: What are the Odds that the Earl of Oxford Could Have Written Shakespeare’s Poems and Plays?” Tennessee Law Review 72(1): 323-453, 2004. http://www.cmc.edu/pages/faculty/welliott/UTConference/Oxford_by_Numbers.pdf. Answer: lower than the odds of getting hit by lightning. We, too, claim accuracy rates of 95% or better, in samples over 1,500 words, 100% for whole plays.
I know that there are many numbers-shy people who care what Shakespeare wrote, but don’t want to struggle with statistics. They would much rather have a simple, concrete, proxy indicator of relative community confidence in one method or another. In our case, there is one. It is the underwhelming response to our £1,000 bet, offered on SHAKSPER nine years ago and still available, that our methods would correctly classify any not-yet-tested play of the taker’s choice as Shakespeare or non-Shakespeare. Not everyone approved of our offer, but no one has taken us up on it. One person did offer a counterbet, which he lost. http://shaksper.net/archive/2011/303-september/28127-thomas-woodstock-
Most of our tests are highly replicable, and there are hundreds of early-modern plays out there that we haven’t tested. In principle, it would be easy to prove us wrong simply by pre-testing these until you found just one that gets misclassified, then accepting the bet and collecting the thousand pounds. No one has made that effort. That silence certainly doesn’t show that our methods actually would be 100% Shakespeare-accurate on plays we haven’t tested, only that nobody doubts it strongly enough to try to refute it, not even for an “easy,” risk-free thousand pounds. It’s a proxy, crowd-sourced indicator, not a real proof, and crowds aren’t always right. But such proofs-by-proxy are surprisingly common in authorship and other debates because they are much easier for most people to understand than a real, technical proof. See Kahneman, Thinking, Fast and Slow, 2011, Ch. 7-9. You don’t have to master stylometrics to notice when nobody has enough confidence in the negative case to pursue it, even for a thousand pounds, and to infer, if you wish, that the negative case was not strong enough to persuade even its most vocal exponents. How could it persuade anyone else?
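To put a rough number on “easy,” here is a back-of-the-envelope sketch in Python. It rests on assumptions that are mine for illustration and not results from our tests: that a skeptic believes our whole-play tests have some fixed error rate, and that each untested play is an independent trial. The 5% error rate and the count of 100 plays are hypothetical inputs, and the function name is made up for this example.

```python
def chance_of_finding_a_miss(error_rate, plays_tested):
    """Probability that at least one of `plays_tested` plays is misclassified.

    Assumes each play is an independent trial with the same per-play
    error rate, which is a simplification made for illustration.
    """
    return 1 - (1 - error_rate) ** plays_tested


if __name__ == "__main__":
    # Hypothetical skeptic's inputs: a 5% per-play error rate, 100 plays pre-tested.
    print(round(chance_of_finding_a_miss(0.05, 100), 3))  # prints 0.994
```

On those assumptions, a skeptic who pre-tested a hundred plays would be very likely to turn up at least one misclassification before risking a penny, which is why the lack of takers is at least suggestive.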
It is true that stylometrics haven’t yet answered all the most interesting questions. Not all stylometrists agree with one another on all points. Many authorship mysteries remain, and many points of legitimate difference are still unresolved. But stylometrics have come a long way since 1966, when Samuel Schoenbaum described them as “thousands of pages of rubbish.” They complement, rather than replace, traditional methods; they have already greatly refined our understanding of who wrote what since Schoenbaum’s time (remember the battles over the Funeral Elegy?), and they seem to me to offer the most promise of future progress. If I wanted to solve some of the remaining mysteries and settle some of the remaining differences, stylometrics would be my first tool of choice.