Stylometrics
The Shakespeare Conference: SHK 15.0767  Friday, 26 March 2004

[1]     From:   William Davis
        Date:   Thursday, 25 Mar 2004 11:27:20 -0500
        Subj:   Re: SHK 15.0750 Stylometrics

[2]     From:   W.L. Godshalk
        Date:   Thursday, 25 Mar 2004 15:38:15 -0500
        Subj:   Re: SHK 15.0750 Stylometrics

[3]     From:   W.L. Godshalk
        Date:   Thursday, 25 Mar 2004 17:03:24 -0500
        Subj:   Re: SHK 15.0750 Stylometrics

[4]     From:   Ward Elliott
        Date:   Thursday, 25 Mar 2004 21:44:47 -0800
        Subj:   RE: SHK 15.0750 Stylometrics


[1]-----------------------------------------------------------------
From:           William Davis
Date:           Thursday, 25 Mar 2004 11:27:20 -0500
Subject: 15.0750 Stylometrics
Comment:        Re: SHK 15.0750 Stylometrics

As an interested observer on the question of stylometrics (though I do
not perform stylometric analysis in my own meanderings through
Shakespeare's texts), I am hoping that someone could enlighten me on the
general state and acceptance or non-acceptance of stylometry.  As I
watch the debates here, as well as many other posts over the months, it
seems that many of the disputes concern "fine-tuning" questions; in
other words, the way one person counts words, structures, or specific
characteristics in the text, etc., might differ from the way another
person counts the same events, and the battles that rage deal with these
differences in methodologies.  I follow that, and it's interesting to
see the different views.  What I am wondering, however, is how
stylometry is received by the full body of scholarship.  Even though
various researchers in stylometry have different views, would they all
agree that stylometry is a legitimate way to assist in determining
attribution?  Are there scholars who feel that stylometry has no place
whatsoever?

I'm asking, because even though I do not perform stylometry myself, I do
happen to believe that it is a legitimate approach to the text, and I'm
curious to know how it is perceived among Shakespeare scholars.  My
opinion is based on a time when I was trained as a biblical translator
by a man who was, for lack of a better word, brilliant.  He did not work
with Shakespeare, but he certainly worked with biblical passages, both
in the original languages as well as studying the various translations
that resulted from the original texts (he would, for example, count how
many times the word "the" would occur, as opposed to "a" or "an" in the
entire Bible of a specific version).  At one point, he showed me how he
applied his work to a modern writer, and illustrated how the writer's
style developed from an early to a middle and then late period in his
works (the writer - forgive me, but it has been a number of years and I
have forgotten his name - evolved in his use of the words "wherefore"
and "therefore" in the text.  In the early years, the writer used a
ratio of approximately 80/20 in favor of "therefore" - i.e., roughly 80%
of the time, the writer preferred the word "therefore," if given the
option.  As he progressed to his middle period of writing, his use of
"therefore" and "wherefore" became 50/50, and by the late period the
balance had shifted to where "wherefore" was used roughly 80 percent of
the time, compared to "therefore").
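
The kind of ratio described above comes from counting the two competing words in a text and expressing one as a share of the pair. A minimal sketch, assuming plain-text input and simple lowercase word tokenization with no spelling normalization; the function name `therefore_share` is hypothetical, and the two sample strings are invented stand-ins, not the writer's actual texts.

```python
import re
from collections import Counter


def therefore_share(text: str) -> float:
    """Fraction of 'therefore'/'wherefore' occurrences that are 'therefore'.

    Tokenization is a simple lowercase regex (an illustrative assumption).
    """
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    there, where = words["therefore"], words["wherefore"]
    total = there + where
    if total == 0:
        raise ValueError("text contains neither word")
    return there / total


# Invented samples standing in for an early and a late work.
early = "therefore " * 8 + "wherefore " * 2
late = "therefore " * 2 + "wherefore " * 8
print(f"early: {therefore_share(early):.0%}, late: {therefore_share(late):.0%}")
```

Tracking this share across works in date order is one way to show a shift like the 80/20 to 20/80 movement recalled above.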

So, this leads to a couple of questions:  for those who do not believe
stylometry has a place in research, what is their reasoning?  (I would
think most everyone would recognize that the style from one writer to
another differs on some level - even if the fine tuning aspects are
subjects of controversy).  And finally, another question for the
stylometrists (is that a proper title?): is there any concession given
in the methodologies of Shakespeare Studies to recognize the possibility
that Shakespeare's style evolved?  How are those variables included and
measured?

Anxious to learn more,
William Davis

[2]-------------------------------------------------------------
From:           W.L. Godshalk
Date:           Thursday, 25 Mar 2004 15:38:15 -0500
Subject: 15.0750 Stylometrics
Comment:        Re: SHK 15.0750 Stylometrics

Sean Lawrence writes to Ward Elliott:

 >We already know that non-Shakespeare plays are outside the corpus of
 >surviving works which your system tests for inclusion in.

Some years ago in an SAA seminar, Mac Jackson suggested that Arden of
Feversham is a play by Shakespeare, and he gave a piece of evidence --
that I do not remember.  Has Ward tested Arden to see if it could be by
Shakespeare?

Bill Godshalk

[3]-------------------------------------------------------------
From:           W.L. Godshalk
Date:           Thursday, 25 Mar 2004 17:03:24 -0500
Subject: 15.0750 Stylometrics
Comment:        Re: SHK 15.0750 Stylometrics

Edward III is another play that has been attributed to Shakespeare -- in
part, if not always completely.  It would be nice to have it tested, if
it has not been already.  If it has, please excuse my ignorance, and
supply me with a reference.

But, of course, Sean is correct in pointing out that we would need a new
play by Shakespeare -- one that we knew had been written by Shakespeare
-- and one that passed Ward's test -- to legitimate that test.

Bill Godshalk

[4]-------------------------------------------------------------
From:           Ward Elliott
Date:           Thursday, 25 Mar 2004 21:44:47 -0800
Subject: 15.0750 Stylometrics
Comment:        RE: SHK 15.0750 Stylometrics

This seems to me a commendably civilized trio of responses [by Michael
Egan, Bill Godshalk and Sean Lawrence].  I'm not looking for extra
business, and I won't even know whether I'm holding aces till I see what
untested play Bill's Syndicate comes up with, and how it tests out.  All
I know is that we've tested 29 "Core Shakespeare" plays with no known
hint of co-authorship, and none of them has more than 2 rejections in
our 48 tests.  We've also tested 51 other-authored plays and 28
anonymous plays which we (now) and most others consider other-authored.
None has fewer than 7 rejections.  All but two of the 79
other-authored and anonymous plays have ten or more rejections, and the
odds that any of them could have come by chance from Shakespeare's pen
seem to me astronomically smaller than those for the most distant
outlier of our Shakespeare core.  None of our individual tests are like
DNA or fingerprints, with zero false positives or negatives in millions
of tests.  When we aggregate all 48 tests on 108 full-length plays, we
have found zero false aggregate positives or negatives and zero overlap
between Shakespeare and non-Shakespeare.  The resulting odds that
Shakespeare could have written, say, A Yorkshire Tragedy, with 14
rejections, are vanishingly low, far too low for us to hesitate long in
choosing between Duncan-Jones' Shakespeare ascription and Jackson's and
Lake's non-Shakespeare ascription.
These odds are subject to a long list of qualifications, too long to
repeat here, but the major ones are that they are validated only for the
108 plays we tested; that we claim them only for single-authored plays;
and that we have grown increasingly stingy about giving Shakespeare
plays the benefit of the presumption of single-authorship.  This is not
enough to claim the same levels of certainty that we expect from
"fingerprints," "thumbprints," DNA, and so on, and we have never made
such claims, nor used such terms in describing our own work.  But it's
more than enough to make us think we've improved on conventional
ascription methods and more than enough, we think, to bet $1,000 with
quite a bit of confidence that we would win.
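
The aggregation described above can be sketched as follows: each test has a Shakespeare acceptance range, a play "rejects" on every test where its score falls outside that range, and a play is judged "could be" only if its total rejections are no worse than the most distant core outlier. This is a minimal illustration under stated assumptions: three hypothetical tests with invented ranges and scores stand in for the 48 real tests, and the function names are hypothetical, not the Claremont software.

```python
# Hypothetical acceptance ranges (low, high) for three tests; the numbers
# are invented for illustration and are not Claremont data.
PROFILES: dict[str, tuple[float, float]] = {
    "test_1": (10.0, 20.0),
    "test_2": (0.5, 0.9),
    "test_3": (100.0, 200.0),
}


def count_rejections(scores: dict[str, float],
                     profiles: dict[str, tuple[float, float]] = PROFILES) -> int:
    """Number of tests on which a play's score falls outside the range."""
    return sum(1 for test, (low, high) in profiles.items()
               if not low <= scores[test] <= high)


def verdict(scores: dict[str, float], core_max: int) -> str:
    """'could be' if no more rejections than the worst core play."""
    return "could be" if count_rejections(scores) <= core_max else "couldn't be"


# Invented scores for two core plays and one candidate play.
core = {
    "core_A": {"test_1": 12.0, "test_2": 0.7, "test_3": 150.0},
    "core_B": {"test_1": 25.0, "test_2": 0.6, "test_3": 180.0},
}
core_max = max(count_rejections(s) for s in core.values())
candidate = {"test_1": 30.0, "test_2": 0.2, "test_3": 90.0}
print(count_rejections(candidate), verdict(candidate, core_max))
```

With the figures reported above, the core threshold would be 2 rejections out of 48, while every tested non-Shakespeare play had at least 7.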

I would be surprised (but overjoyed) if someone found a hitherto
unknown, untested, single-authored Shakespeare play to test our ability
to say "could be" to known Shakespeare.  For now, all we can do for that
is to experiment with segregating the Shakespeare outliers retroactively
and seeing how much it changes the profile.  We've done it in several
different ways, and the answer is "not much."  Others, with a little
guidance from us, could do the same kind of exercise with our
spreadsheets; maybe they would arrive at a different conclusion, but I
doubt it.
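
One way to approximate the retroactive check described above is leave-one-out validation, a swapped-in technique rather than the procedure the Claremont team reports: set each core play aside, rebuild the test ranges from the remaining core plays, and count how many tests the held-out play then fails. A hedged sketch, assuming each range is simply the min..max of the remaining plays' scores; the function names and scores are hypothetical.

```python
Scores = dict[str, float]


def ranges_from(plays: dict[str, Scores]) -> dict[str, tuple[float, float]]:
    """Min..max range per test over the given plays (an assumed range rule)."""
    tests = next(iter(plays.values())).keys()
    return {
        t: (min(p[t] for p in plays.values()), max(p[t] for p in plays.values()))
        for t in tests
    }


def rejections(scores: Scores, ranges: dict[str, tuple[float, float]]) -> int:
    """Count tests on which the scores fall outside the ranges."""
    return sum(1 for t, (low, high) in ranges.items()
               if not low <= scores[t] <= high)


def leave_one_out(core: dict[str, Scores]) -> dict[str, int]:
    """Rejections for each core play, judged against ranges built without it."""
    result = {}
    for name, scores in core.items():
        rest = {k: v for k, v in core.items() if k != name}
        result[name] = rejections(scores, ranges_from(rest))
    return result


# Invented scores for three core plays on two hypothetical tests.
core = {
    "core_A": {"test_1": 10.0, "test_2": 1.0},
    "core_B": {"test_1": 12.0, "test_2": 2.0},
    "core_C": {"test_1": 20.0, "test_2": 1.5},
}
print(leave_one_out(core))
```

If held-out core plays rarely exceed the usual rejection threshold, the profile is not being propped up by the plays it was built from.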

It's useful to recall that we are negative-evidence people who think our
greatest contribution is not so much identifying Shakespeare as more
clearly identifying non-Shakespeare.  That's what we suspect we're good
at, and it's challengeable and testable by anyone.  It shouldn't be at
all hard to find dozens of known plays that we haven't tested, and even,
maybe, to screen them with our or others' software to find the one that
would win the bet - provided that the Syndicate takes the same care that
we and our students have to commonize the new samples' spelling to the
Riverside Shakespeare standard and, as far as possible, to have the same
person apply every test in the same way at the same time to the
Riverside core baseline to validate the test.  It's the "provided" part,
much more than raising and hazarding $1,000, that seems to me the
biggest hurdle to a well-founded challenge to, or audit of, our work.
Ditto for challenging Alfred Hart.  I wish it were easier, but maybe it
wouldn't be as much fun if it were.  For a proper challenge you have to
walk the walk, not just talk the talk, and it takes more to do that than
just think up a different way of counting which yields different
results.  We're ready to help with this if anyone would like to
experiment with our software before the hardware that hosts it melts away.

Our vocabulary-richness tests for Shakespeare used our own counting
conventions, and our counts would probably differ from others', but it
makes little difference because we used the same conventions
consistently.  Our three tests had to do with type/token ratios,
controlled for block size; new-to-the-group words; and total imputed
vocabulary estimates, the last two, as far as I know, doable only with
Valenza's Intellex program.  All put Milton at the top, Fletcher at the
bottom, and Shakespeare and everyone else in between.
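
Of the three tests, only the first is simple enough to sketch here. A raw type/token ratio falls as a text gets longer, so one common way to control for length is to average the ratio over consecutive blocks of equal size. A minimal sketch, assuming lowercase regex tokenization without Riverside spelling normalization; the function name and the default block size are hypothetical, not the Claremont settings.

```python
import re


def block_ttr(text: str, block_size: int = 1000) -> float:
    """Mean type/token ratio over consecutive fixed-size word blocks.

    Equal-sized blocks reduce the length bias of a whole-text ratio.
    Trailing words that do not fill a complete block are ignored.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    n_blocks = len(tokens) // block_size
    if n_blocks == 0:
        raise ValueError("text is shorter than one block")
    ratios = [
        len(set(tokens[i * block_size:(i + 1) * block_size])) / block_size
        for i in range(n_blocks)
    ]
    return sum(ratios) / n_blocks


# A deliberately repetitive sample: 10 words per repetition, 8 distinct.
sample = "to be or not to be that is the question " * 50
print(round(block_ttr(sample, block_size=100), 3))
```

Because every play is measured in blocks of the same size, a long play and a short one can be compared without the longer one looking artificially poorer in vocabulary.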

Yours,
Ward Elliott

_______________________________________________________________
S H A K S P E R: The Global Shakespeare Discussion List
Hardy M. Cook
The S H A K S P E R Web Site <http://www.shaksper.net>

DISCLAIMER: Although SHAKSPER is a moderated discussion list, the
opinions expressed on it are the sole property of the poster, and the
editor assumes no responsibility for them.
 

©2011 Hardy Cook. All rights reserved.