Stylometrics
The Shakespeare Conference: SHK 15.0677  Monday, 15 March 2004

[1]     From:   Thomas Larque
        Date:   Friday, 12 Mar 2004 13:05:07 -0000
        Subj:   Re: SHK 15.0663 Stylometrics

[2]     From:   Ward Elliott
        Date:   Friday, 12 Mar 2004 13:31:52 -0800
        Subj:   RE: SHK 15.0663 Stylometrics


[1]-----------------------------------------------------------------
From:           Thomas Larque
Date:           Friday, 12 Mar 2004 13:05:07 -0000
Subject: 15.0663 Stylometrics
Comment:        Re: SHK 15.0663 Stylometrics

Michael Egan's warning seems well given.  I wonder if the texts that
were studied may also have influenced the counts?  In the case of plays
that have two or more authoritative Renaissance texts (usually Folio and
one or more Quartos) the numbers will presumably differ depending on
edition and sometimes even copy chosen, especially if one or both of the
counts are carried out on modernised and conflated editions (which could
be very different in their content from the Renaissance printed texts).
I would be interested to know which editions Michael Egan was using,
and whether we know that these were the same ones used by Hart.

Thomas Larque.
"Shakespeare and His Critics"       "British Shakespeare Association"
http://shakespearean.org.uk           http://britishshakespeare.ws

[2]-------------------------------------------------------------
From:           Ward Elliott
Date:           Friday, 12 Mar 2004 13:31:52 -0800
Subject: 15.0663 Stylometrics
Comment:        RE: SHK 15.0663 Stylometrics

Michael Egan's criticism of Alfred Hart, and of Brian Vickers for citing
him, seems to me an unfortunate example of the "my way" fallacy.  You
need more than discrepancies between one person's counts and another's
to dismiss the first person's results as "not always reliable" and "not
good enough for a methodology claiming the authority and precision of a
science."   Everyday experience tells us that even simple, manual
word counts by the same person don't come out the same every time when
the numbers get high.  Most people most of the time can count to 50
without losing track, but tedium and distractions do accumulate,
attentions wander, and few people, having struggled through a manual
count in the hundreds, are eager to go back and do it again three times
to make absolutely sure they didn't overcount or undercount.   My hat is
off to people like Alfred Hart and the other Iron Men of pre-computer
stylometry for grinding through these long counts again and again and
again, with more than enough accuracy to make their points.

When computers came, stylometrists and others were overjoyed to have
access to simpler, faster, more replicable word counts than ever before,
and only modestly dismayed to find that, though any one word-counting
program got the same total every time, two programs would
generally use different counting conventions and get different,
occasionally strikingly different, totals.   The problems posed by
inter-program nonreplicability were easily solved by using the same
counter for every test.
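
As a minimal illustration of how counting conventions diverge (a
sketch only, not the logic of any program named here; both function
names and the sample line are invented for the example), consider two
equally defensible definitions of a "word" applied to the same text:

```python
import re

def count_whitespace_words(text):
    """Convention A: each whitespace-delimited chunk counts as one word."""
    return len(text.split())

def count_letter_runs(text):
    """Convention B: each unbroken run of letters counts as one word,
    so hyphens and apostrophes split a chunk into several words."""
    return len(re.findall(r"[A-Za-z]+", text))

# Invented sample line, chosen only to expose the difference.
sample = "well-nigh o'erthrown -- and yet, my lord"

print(count_whitespace_words(sample))  # 7
print(count_letter_runs(sample))       # 8
```

Each function is fully replicable on its own; the two simply disagree
with each other, which is why one counter should be used throughout a
test.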

What if the test involves something more subtle than defining a word,
such as distinguishing between a real un- word like "unruly" and a false
one like "uncle?" One would expect more difficulty with replication from
one analyst to another, exactly as Mr. Egan has shown.  But it hardly
follows that the first analyst, Hart, was wrong because the second one
used different, but equally plausible, conventions and got different
counts.  I don't have Hart's book in front of me, nor Mr. Egan's
counting rulebook, if there is one, but I would suppose that both of
them tried to minimize counting wobble the same way everyone else does
or should do, by trying to have all the counting done by the same person
at the same time in the same way.  That way, whatever quirks are applied
to one text are applied uniformly to others, and counting wobble, which
is seldom reducible to zero, can be reduced to the point where the
findings are still useful.

Nate McMurtray, a Claremont McKenna College student and captain of The
Claremont Shakespeare Clinic in 1994, produced a program, Textcruncher,
for counting real "un-" words by finding and counting every "un-" word
and subtracting it if it matched a list of false "un-" words like
"uncle" stored in the computer.   Like the different wordcounts you get
from Word's and from Word Perfect's counters, the results from his
program are thoroughly replicable from one test to another, but only
roughly match those of Hart and Egan.   Examples:

        Play    Hart    Egan    Textcruncher
        R2      52      61      61
        Tmp     20      20      26
        Ham     71      80      97

In general, Egan's counts are higher than Hart's, ours higher than
Egan's, but these differences don't mean our counts are right and the
other two wrong, only that we used different counting conventions, and
perhaps that the list of false "un-" words I gave Nate should have been
longer.
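
The find-and-subtract approach described for Textcruncher can be
sketched roughly as follows.  This is not Textcruncher's code: the
function name, the tokenizing rule, and the list of false "un-" words
are assumptions made for illustration, and only "uncle" on that list
comes from the discussion above.

```python
import re

# Illustrative list of false "un-" words. Only "uncle" comes from the
# discussion; the other entries are assumptions added for the example.
FALSE_UN_WORDS = {"uncle", "under", "until", "unto", "union", "unity"}

def count_true_un_words(text, false_un_words=FALSE_UN_WORDS):
    """Count every word starting with "un", then subtract each one
    that exactly matches an entry on the false list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    un_words = [t for t in tokens if t.startswith("un")]
    false_hits = [w for w in un_words if w in false_un_words]
    return len(un_words) - len(false_hits)

# Invented sample sentence.
sample = "My uncle is unruly and unkind, but under no oath until dawn."
print(count_true_un_words(sample))  # 2
```

Because the match is exact, a form such as "uncles" would be counted
as a true "un-" word unless the list were lengthened, which is one way
a short list pushes counts upward.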

Such discrepancies do mean that you should be careful about mixing
counts by different people, but not that one person's counts are bad
because another's are different.   We used our Textcruncher counts to
establish a Shakespeare profile, ranging from 28 to 65 per 20,000 words,
that being about the average length of a Shakespeare play.  For us, the
outermost counts were the most important -- 65 per 20,000 words for
Hamlet, 28 each for Much Ado About Nothing and Henry IV, Part II -- and
being outermost was what was important about them, for they defined our
profile.   We thought it was good enough evidence to show that every
Shakespeare play we tested was a Shakespeare "could be" by this test,
but that quite a few plays by others, such as Middleton's The Phoenix
and Marlowe's Dr. Faustus, with 4 and 14 true "un-" words per 20,000 by
our count, had far too few to be Shakespeare "could-be's."  My guess is
that, if Mr. Egan tested all four plays -- Ado, 2H4, Phoe, and DF -- with
his counting conventions, he would get roughly the same relative
outcomes but with different absolute numbers.  If he wants to redo Hart,
since Hart is not around to do it himself, he is welcome to a copy of
Textcruncher (whose list of false "un-" words is also in principle
redoable more to Mr. Egan's taste).  It's fast and highly consistent.
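
The profile test above amounts to scaling a raw count to a rate per
20,000 words and checking whether it falls inside the 28-to-65 range.
A minimal sketch, assuming the range is inclusive at both ends (the
function names are invented, and the raw count and text length in the
last line are hypothetical, not measured):

```python
BLOCK_SIZE = 20_000                 # words, as in the post
PROFILE_LOW, PROFILE_HIGH = 28, 65  # Shakespeare range given in the post

def rate_per_block(raw_count, total_words, block_size=BLOCK_SIZE):
    """Scale a raw count to a rate per block_size words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return raw_count * block_size / total_words

def is_could_be(rate, low=PROFILE_LOW, high=PROFILE_HIGH):
    """True if the rate lies within the profile, endpoints included."""
    return low <= rate <= high

# Rates per 20,000 words as reported in the post.
reported = {"Ham": 65, "Ado": 28, "2H4": 28, "Phoe": 4, "DF": 14}
for play, rate in reported.items():
    print(play, is_could_be(rate))

# Hypothetical raw count and text length, for illustration only.
print(rate_per_block(30, 10_000))  # 60.0
```

Under a different counting convention the endpoints would move, but
the same check applies so long as every text is counted the same way.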

Valenza and I are now working on several vocabulary richness tests which
say that Hart's most famous conclusion, that Shakespeare's vocabulary
was larger than anyone else's then or since, was dead wrong.  If these
hold up, we may one day be counted as Hart's chief critics.  But it
won't be because he counted Shakespeare wrong; it will be because he
didn't have as good ways to count everybody else as we do now, thanks to
computers and e-texts.  And it certainly won't be just because we get
different results with different counting conventions.  In no way will
it show that Hart's pioneering methodology was "not good enough for a
methodology claiming the authority and precision of a science."   That
would be a little like saying Copernicus had no claim to be called a
scientist because Kepler gave us a better account of the planets'
orbits, and a little like supposing that his pathbreaking discoveries
become worthless the minute someone comes up with something better, or
even with something different but not demonstrably better.  Hart and the
old Iron-man stylometricians were limited by the oxcart-level technology
of their time, but, retracing their steps with computers today, I am far
more impressed by their persistence, resourcefulness, and care in what
they did see than by the limits imposed on them by what they couldn't
see.  In my book, Hart was a great man and a brilliant, pioneering
counter, who did the best he could with what he had, whose findings are
still well worth attention today, and who hardly deserves Mr. Egan's
put-down.

Ward Elliott

_______________________________________________________________
S H A K S P E R: The Global Shakespeare Discussion List
Hardy M. Cook
The S H A K S P E R Web Site <http://www.shaksper.net>

DISCLAIMER: Although SHAKSPER is a moderated discussion list, the
opinions expressed on it are the sole property of the poster, and the
editor assumes no responsibility for them.
 
