Shakespeare by the Numbers
The Shakespeare Conference: SHK 16.1251  Thursday, 28 July 2005

[1]     From:   Jack Lynch
        Date:   Monday, 25 Jul 2005 11:06:41 -0400 (EDT)
        Subj:   Re: SHK 16.1241 Two Questions

[2]     From:   Ward Elliott
        Date:   Monday, 25 Jul 2005 13:26:32 -0700
        Subj:   RE: SHK 16.1241 Shakespeare by the Numbers

[3]     From:   Eric M. Johnson
        Date:   Tuesday, 26 Jul 2005 11:25:27 -0400
        Subj:   Re: SHK 16.1241 Two Questions


[1]-----------------------------------------------------------------
From:           Jack Lynch
Date:           Monday, 25 Jul 2005 11:06:41 -0400 (EDT)
Subject: 16.1241 Two Questions
Comment:        Re: SHK 16.1241 Two Questions

Ross Clement writes:

 >I'd like to ... find out more about the arguments against
 >"reducing Shakespeare to numbers", and understand more about
 >arguments against. In particular, if there are books, journal
 >articles, and/or conference articles arguing the case against
 >"reductionism", I'd very much like to read them so that I can
 >understand more of the issues involved. Any recommendations?

It doesn't address Shakespeare at very great length, but Harold Love's
_Attributing Authorship: An Introduction_ (Cambridge: Cambridge Univ.
Press, 2002) offers a sympathetic overview of many approaches to
attribution, including computerized approaches.  It does a good job of
summarizing the strengths and weaknesses of various techniques.  It's
also very readable.

[2]-------------------------------------------------------------
From:           Ward Elliott
Date:           Monday, 25 Jul 2005 13:26:32 -0700
Subject: 16.1241 Shakespeare by the Numbers
Comment:        RE: SHK 16.1241 Shakespeare by the Numbers

Ross Clement asks for recommendations for reading on whether "reducing
Shakespeare to numbers" can help determine authorship.  Rob Valenza and
I think it does and are working on a book entitled Shakespeare by the
Numbers. There was a big dispute on this very topic on SHAKSPER two
years ago between us and a SHAKSPER correspondent who strongly felt
that, though numbers may tell you something about the known, like the
stock market yesterday, they can tell you nothing about the unknown,
like the stock market tomorrow.  This discussion ended abruptly when we
offered him a $1,000 even-odds wager that he could not find an untested
Shakespeare-era play not by Shakespeare which would pass our composite
Shakespeare tests as a Shakespeare could-be.  See the King John, Peele,
Titus thread, especially SHK 14.1105, 6 June 2003 and SHK 14.1244, 23
June 2003. The correspondent wisely declined.

If Mr. Clement or other SHAKSPER correspondents would like more detail,
I would refer him to our 130-page article in the Tennessee Law Review,
"Oxford by the Numbers," due from the publisher in a few days. One hundred pages
of it describe our by-the-numbers methodology; the remaining 30 pages
apply it to the poems of the Earl of Oxford and conclude that "The odds
that either [Oxford or Shakespeare] could have written each other's work
are much lower than the odds of getting hit by lightning."  Among other
things, the article makes the same $1,000 wager offer to the world at
large, not excluding SHAKSPER correspondents. It also discusses the
potential risks and rewards from our viewpoint and makes some
suggestions as to how any taker, with whatever help we could provide,
could best screen the scores of plays we haven't tested for a likely
winner.  And we added this paragraph:

"Just as important as our willingness to bet on the predictive powers of
our findings is the fact that our rules are so tight, quantified, and,
hence, replicable, that our prediction would be eminently testable and
falsifiable.  If anyone takes us up on our bet with or without
screening, it will not be difficult to tell who won or lost.  Can this
be said of any other composite authorship-identification system now on
the market?  We would not bet on it."
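The wager rests on the claim that a composite of quantified tests yields a yes/no verdict anyone can check. A minimal sketch of that kind of rule, assuming each test is simply one feature score with an accepted range drawn from baseline Shakespeare texts, and that a text counts as a "could-be" when it draws no more rejections than some threshold. The test names, ranges, scores, and threshold below are invented for illustration; they are not Elliott and Valenza's actual tests or cutoffs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeTest:
    """A hypothetical stylometric test: one feature plus the range accepted as 'Shakespearean'."""
    name: str
    low: float
    high: float

    def rejects(self, score: float) -> bool:
        # The test rejects a text whose score falls outside the accepted range.
        return not (self.low <= score <= self.high)


def count_rejections(tests, scores):
    """Count how many tests reject a text; scores maps test name -> the text's score."""
    return sum(1 for test in tests if test.rejects(scores[test.name]))


def is_could_be(tests, scores, max_rejections=0):
    """A text is a 'could-be' if it draws no more than max_rejections rejections."""
    return count_rejections(tests, scores) <= max_rejections


# Invented tests and scores, for illustration only.
TESTS = [
    RangeTest("test_a", 10.0, 20.0),
    RangeTest("test_b", 0.5, 1.5),
    RangeTest("test_c", 3.0, 7.0),
]
CANDIDATE = {"test_a": 12.0, "test_b": 2.0, "test_c": 5.0}
```

Here only `test_b` rejects the candidate, so it fails with `max_rejections=0` but passes with `max_rejections=1`. Because every rule is a fixed number, two people applying the same tests to the same text must reach the same verdict, which is what makes a bet on the outcome settleable.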

I have earlier counseled SHAKSPER correspondents interested in such
matters to pre-order a copy of the Tennessee Law Review, v. 72, Fall
2004, with a symposium on Shakespeare authorship.  It's now too late to
pre-order, but they may have some copies left. If so, they are still a
bargain at $12.  I shall try to put our article on the web if I can get
the Acrobat file from the publisher; in the meantime, my PowerPoint
slides from the conference can be seen at
http://govt.claremontmckenna.edu/welliott/UTconference/

We have since raised our wager from $1,000 to a thousand pounds, in an
article on the "Shakespeare" scenes of Edward III and Sir Thomas More
just submitted to the Shakespeare Yearbook. The new offer takes account
of the decline of the dollar, gives our wager a more Shakespearean ring,
and, we hope, increases the incentive to take us up. This offer, too, is
open to SHAKSPER members, and it's open now. You don't have to wait for
the SYB to appear to take it on.

I should not leave the impression that we think that authorship by the
numbers supersedes authorship by the documents (it doesn't), nor that
numbers cannot be misinterpreted (they can, and so can documents), nor
that getting the numbers down can solve every problem so convincingly
that you can bet the mortgage on it.  It doesn't.  The short
"Shakespeare scenes" in our SYB article are still tough nuts to crack,
in a way that whole plays are not. But numbers can cast new light on
many authorship corners that would otherwise be dark.  If authorship
matters (and we think it does), stylometric methods like ours for
testing it should not be ruled out.

Yours,
Ward Elliott
Burnet C. Wohlford Professor of American Political Institutions
Claremont McKenna College

[3]-------------------------------------------------------------
From:           Eric M. Johnson
Date:           Tuesday, 26 Jul 2005 11:25:27 -0400
Subject: 16.1241 Two Questions
Comment:        Re: SHK 16.1241 Two Questions

Regarding the first question about computerized textual analysis to
determine authorship, if you really wanted to test the software, you
would have to set up a controlled experiment where you knew the
authorship ahead of time, and thus you could evaluate the program's
output properly. Such an experiment could work like this:

1. Gather new documents written by a group of individual authors, as
well as collaborative documents produced by two or more of those
individuals. The latter documents should be tagged to indicate who wrote
what, but those tags should not be readable by the software. (These tags
could be generated by converting the "track changes" data in a Microsoft
Word file, for example.)

2. Perform the initial training of the software by identifying the
documents produced by the individual authors, so it could learn the
verbal "fingerprint" of each author.

3. Run the collection of documents through the software mechanism so it
can flag attributions.

4. Compare the software's authorship attribution to the original, tagged
documents.

5. Tweak the software's settings when it misidentifies the authors.

6. Go back to step #3 and repeat the cycle until you are satisfied that
the software is identifying the authors to a reasonable degree of accuracy.
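Steps 2 through 6 can be sketched in code. This is a minimal illustration under stated assumptions, not the method of any particular package: each author's "fingerprint" is assumed to be the average relative frequency of a short list of marker words, a passage is attributed to the author with the nearest fingerprint, and the "setting" being tweaked is simply which marker-word list to use. The function names, the toy texts, and the 0.9 accuracy target are all hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Crude tokenizer: split on whitespace, strip punctuation, lowercase."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def profile(text, vocab):
    """Relative frequency of each marker word in the text."""
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in vocab]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(solo_docs, vocab):
    """Step 2: average each author's solo documents into a fingerprint (centroid)."""
    centroids = {}
    for author, docs in solo_docs.items():
        profiles = [profile(d, vocab) for d in docs]
        centroids[author] = [sum(col) / len(profiles) for col in zip(*profiles)]
    return centroids


def attribute(text, centroids, vocab):
    """Step 3: assign a passage to the author with the nearest fingerprint."""
    p = profile(text, vocab)
    return min(centroids, key=lambda a: distance(p, centroids[a]))


def accuracy(tagged_segments, centroids, vocab):
    """Step 4: compare attributions with the hidden tags."""
    hits = sum(attribute(t, centroids, vocab) == tag for t, tag in tagged_segments)
    return hits / len(tagged_segments)


def tune(solo_docs, tagged_segments, candidate_vocabs, target=0.9):
    """Steps 5-6: try settings (marker-word lists) until accuracy meets the target."""
    best_vocab, best_acc = None, -1.0
    for vocab in candidate_vocabs:
        acc = accuracy(tagged_segments, train(solo_docs, vocab), vocab)
        if acc > best_acc:
            best_vocab, best_acc = vocab, acc
        if acc >= target:
            break
    return best_vocab, best_acc


# Toy data standing in for step 1 (all invented).
SOLO = {
    "A": ["the cat and the dog and the bird", "the sun and the moon"],
    "B": ["of course it is of note", "it is of interest to me"],
}
TAGGED = [("the fox and the hen", "A"), ("it is of no matter", "B")]
CANDIDATES = [["zebra"], ["the", "and", "of", "it"]]
```

With these toy texts, the useless marker list `["zebra"]` scores 0.5 and the second list scores 1.0, so `tune` stops there. One caution about step 6: if the settings are tuned and scored on the same tagged documents, the measured accuracy tends to be optimistic; holding back some tagged documents that the tuning loop never sees gives a fairer check before the software is turned loose on disputed texts.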

This is how you use concept-extraction software, and authorship software
should work in a similar way. My own experience, in evaluating
concept-extraction software for the private and public sectors, is that
you can get it to do the "gross" analysis -- that is, it can see that a
document with eight references to "Tony Blair" and three to "Parliament"
should be associated with the concept of "U.K. Politics." You can even
train the software to use context as a clue, so (in theory) if you
trained the software properly, it could distinguish between "Clinton,"
the British general, "Clinton," the American president, and "Clinton,"
the leader of the band Parliament Funkadelic. The final list of
concepts, though, has to be vetted by an editor, who will add or delete
as he sees fit. I imagine that authorship software would have to work
the same way.
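The "gross" analysis described above amounts to weighted term counting. A minimal sketch, assuming a hand-built dictionary that maps each concept to indicative terms and weights; the concept names, weights, and threshold are hypothetical and not taken from any particular concept-extraction product.

```python
# Hypothetical concept dictionary: concept -> {indicative term: weight}.
CONCEPTS = {
    "U.K. Politics": {"tony blair": 1.0, "parliament": 0.5},
    "Funk Music": {"funkadelic": 1.0, "parliament": 0.5},
}


def score_concepts(text, concepts=CONCEPTS, threshold=3.0):
    """Return (concept, score) pairs that reach the threshold, highest score first."""
    lowered = text.lower()
    hits = []
    for concept, weights in concepts.items():
        # Crude substring counting: "parliament" would also match "parliamentary".
        score = sum(lowered.count(term) * weight for term, weight in weights.items())
        if score >= threshold:
            hits.append((concept, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A document with eight mentions of "Tony Blair" and three of "Parliament" scores 9.5 for "U.K. Politics" and only 1.5 for "Funk Music". Note that "Parliament" counts toward both concepts, which is exactly the kind of ambiguity that context rules or an editor's vetting has to resolve.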

Once you were confident that the authorship software worked in your
controlled experiment, you could then unleash it on texts where you
could not absolutely confirm the authorship. You would train it on
texts you could attribute to Shakespeare with reasonable certainty, as
well as on texts by the other authors you wanted to identify.

However, here's a question worth asking: could such software mark
something as "unidentified"? In other words, if you had three candidate
authors, could it mark a passage as "none of the
above"? That sounds tricky, but maybe someone has figured out how to do
it. Color me skeptical.
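One simple way to let a classifier answer "none of the above" is a rejection threshold: find the nearest candidate author, but refuse to name anyone if even that nearest author is farther away than some cutoff. A minimal sketch, assuming each author is represented by a feature vector (a centroid) and using Euclidean distance; the author labels, vectors, and cutoff are invented for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def attribute_or_reject(features, centroids, max_distance):
    """Return the nearest candidate, or 'unidentified' if even the nearest is too far."""
    best = min(centroids, key=lambda author: distance(features, centroids[author]))
    if distance(features, centroids[best]) > max_distance:
        return "unidentified"
    return best


# Invented fingerprints for three hypothetical candidate authors.
CENTROIDS = {
    "Author A": [0.40, 0.20],
    "Author B": [0.05, 0.30],
    "Author C": [0.20, 0.05],
}
```

The hard part is choosing `max_distance`: set it too tight and genuine passages by a candidate get rejected, too loose and an outsider gets pinned on the nearest candidate. That trade-off is where the skepticism above is well placed.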

Regards,
Eric Johnson
http://www.opensourceshakespeare.org

_______________________________________________________________
S H A K S P E R: The Global Shakespeare Discussion List
Hardy M. Cook
The S H A K S P E R Web Site <http://www.shaksper.net>

DISCLAIMER: Although SHAKSPER is a moderated discussion list, the
opinions expressed on it are the sole property of the poster, and the
editor assumes no responsibility for them.
 

©2011 Hardy Cook. All rights reserved.