The Shakespeare Conference: SHK 12.0699 Saturday, 24 March 2001
[1] From: Gabriel Egan
Date: Wed, 21 Mar 2001 14:00:32 -0000
Subj: Re: SHK 12.0663 Shakespearean Authorship Research
[2] From: Paul Maddox
Date: Wednesday, 21 Mar 2001 15:34:18 +0000 (GMT)
Subj: Re: SHK 12.0682 Re: Shakespearean Authorship Research
[3] From: Takashi Kozuka
Date: Wednesday, 21 Mar 2001 21:14:53
Subj: Re: Shakespearean Authorship Research
[4] From: Marcus Dahl
Date: Thursday, 22 Mar 2001 11:27:33 EST
Subj: Re: SHK 12.0682 Re: Shakespearean Authorship Research
[1]-----------------------------------------------------------------
From: Gabriel Egan
Date: Wed, 21 Mar 2001 14:00:32 -0000
Subject: 12.0663 Shakespearean Authorship Research
Comment: Re: SHK 12.0663 Shakespearean Authorship Research
Paul Maddox wrote
>My program has three stages:
>1) Tag document(s) with their syntactic tags.
>There are about 40 tags for different syntactic categories.
>Eg. "dog" is tagged as "NN", 'noun, singular, mass'.
"Noun" and "singular" I understand, but what do "NN" and "mass" denote
regarding the word "dog"? (Sorry if this is standard stylometric
terminology--hopefully my ignorance is shared by others on the list so
an answer will be worth distributing.)
>2) Comparing sets of tagged documents.
>There are two columns allowing many-to-many comparisons.
>Eg. [Every 'Shakespeare' document] compared to [Every Marlowe
>Document]
>
>3) Rank the comparison data in order of similarity.
>The comparison engine collapses data down into a single scalar value.
>Eg. "macbeth.txt" compared to "EdwardII.txt" = 0.0345
>
>And that's about it. I hope I've not lost anyone. The basic outcome is
>that my program can rank document pairs on relative similarity.
Lost me, I'm afraid. What aspects of the two documents are being
compared? Assuming both are in ASCII coding, they are similar in not
containing Persian letter forms, but that doesn't tell us much. Could
you state the nature of the comparisons you are making?
Gabriel Egan
[2]-------------------------------------------------------------
From: Paul Maddox
Date: Wednesday, 21 Mar 2001 15:34:18 +0000 (GMT)
Subject: 12.0682 Re: Shakespearean Authorship Research
Comment: Re: SHK 12.0682 Re: Shakespearean Authorship Research
Dear All,
I'm not exactly sure how it is normal to reply to messages. As it's nice
and easy, I'll reply to them in a single email.
-------------------------------------------------------------
> From: Hardy M. Cook
>
> Perhaps, an explanation is called for. I got back late Monday from an
> out-of-state funeral and have had a head cold for several days. Indeed,
> discussions of authorship are not permitted on SHAKSPER. However, at
> the time, Paul's posting appeared to describe a methodology rather than
> to argue for a candidate. For that reason, I let the posting go.
My apologies for posting about authorship; I wasn't aware it was against
the rules. If it is OK, I would still like to discuss the method that I
use.
-------------------------------------------------------------
> From: Mike Jensen
>
> Perhaps you did lose me, or perhaps something was missing in your
> explanation. If I understand you correctly, your program ranks
> documents by different authors according to which are most to least
> alike, is that correct? If not, please correct me.
My program ranks on similarity. The more similar, the lower the value.
> Please also explain to me the value of doing this. To use an example
As a computer scientist I believe in concrete figures and methods that
can be proven. I elected to do this project as a numerical method of
language comparison.
I cannot prove that the method is perfect. However, I have had success
with modern English examples, so it does work to a certain degree.
If you were to compare every Shakespearean sonnet to every other (11781
comparisons) by hand, I suspect it would take you a great deal of time.
Although far less accurate, my program can make those comparisons
(however naively) in minutes.
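As a quick check on the figure of 11781: assuming the usual count of 154 sonnets (the count is not stated in the message), comparing each sonnet once with every other gives 154 x 153 / 2 unordered pairs. A minimal sketch in Python:

```python
from itertools import combinations
from math import comb

# Assumption: the standard count of Shakespeare's sonnets; not given in the post.
SONNET_COUNT = 154

# Every unordered pair of distinct sonnets, each pair compared once.
pairs = list(combinations(range(1, SONNET_COUNT + 1), 2))
pair_count = len(pairs)

# The closed form n * (n - 1) / 2 gives the same number.
closed_form = SONNET_COUNT * (SONNET_COUNT - 1) // 2

print(pair_count, closed_form, comb(SONNET_COUNT, 2))
```

All three values agree with the 11781 comparisons quoted above.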
> from Dave Kathman's superb Shakespeare Authorship web site, two poems
> today that begin
>
> Roses are red
> Violets are blue
>
> do not indicate single authorship. Just as this is a poetic convention,
Very true, and if a poem had only those two lines they would (wrongly)
be considered similar. However, if we assume a poem is 14 lines, two
poems are much less likely to be similar by chance.
> Shakespeare's plays, one of the least likely of those tested, so I don't
> grasp what you are trying to accomplish.
I'm trying to see if my program is able to prove anything about
Shakespearean authorship. From a personal perspective I'm not concerned
by who wrote Shakespeare or if it is even a legitimate question to ask.
I'm investigating Shakespeare because it seemed like a reasonable test
for my program.
> I note that Oxfordians can't be comforted by your numbers, or they
> also have to accept all those rare word, spelling and other tests
> mentioned above.
However, my program does not compare words; it compares word categories.
Hence the effects of such things as spelling errors and rare words are
minimised. Further to this, my program was purposefully written to accept
a certain amount of noise without this being highly detrimental to the
conclusions made.
> Also, 0.436 -> 2.021 and 11.62 and 219.2 didn't communicate much to me.
> Yes, these are the statistics of similarity, but what are the statistics
> of dissimilarity? Are these markers VERY similar, or only more similar
> than the other poems compared, but not really all that similar when you
> get down to it?
I agree the numbers are a little bit ambiguous. This is mainly due to
the method that I employ to turn a vector into a scalar. The overall
scope of the first pair of numbers is between 0.0 (identical) and 5.0.
The problem is that the values are not on a set range, which is why they
are only really comparable to other sets of results. I admit this is
somewhat problematic.
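The message does not say which reduction turns the vector into a scalar. Purely as an illustration, and not the program's actual method, one common choice is the Euclidean distance between two documents' relative tag frequencies. The function names and the second tag sequence below are hypothetical.

```python
import math
from collections import Counter


def relative_frequencies(tags):
    """Map each tag to its share of all tags in the document."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def euclidean_distance(freq_a, freq_b):
    """Collapse two frequency vectors into one scalar (0.0 means identical)."""
    all_tags = set(freq_a) | set(freq_b)
    return math.sqrt(
        sum((freq_a.get(t, 0.0) - freq_b.get(t, 0.0)) ** 2 for t in all_tags)
    )


# Hypothetical tag sequences, for illustration only.
doc_a = ["IN", "JJS", "NNS", "PRP", "VBP", "NN"]
doc_b = ["DT", "NN", "VBZ", "JJ", "NN", "IN"]

same = euclidean_distance(relative_frequencies(doc_a), relative_frequencies(doc_a))
different = euclidean_distance(relative_frequencies(doc_a), relative_frequencies(doc_b))
print(same, different)
```

Under this assumed measure a document compared with itself scores 0.0, matching the "0.0 (identical)" convention above. Because relative frequencies sum to 1, the score can never exceed the square root of 2, so this particular choice would have a set range.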
> Please don't feel attacked. I probably have missed something crucial in
> your explanation. Will you please take a moment to help me see how I am
> missing the point?
I don't feel attacked; if anything, queries such as this help me. I'm
sure I'm not completely right myself, so I hope we have met at some
middle ground?
-------------------------------------------------------------
> From: Sean Lawrence
>
> I'm surprised that this has been posted, since usually we don't discuss
I think I'm counting my lucky stars on that one. :-)
> the so-called 'authorship question', but I'm wondering whether, as a
> control, you're comparing Shakespeare's sonnets to one another. This
> would provide a benchmark to show what work by the same author ought to
> look like. You could then tell whether Edward de Vere's sonnets look as
> much like Shakespeare's, as Shakespeare's look like Shakespeare's.
Since my original message I have indeed tried this. I have also taken
the data and plotted a normal distribution of the results; this is more
helpful when comparing multiple 'runs' of the program.
Please see below for my results.
-------------------------------------------------------------
> From: John E. Perry
>
> I didn't see any attempt to compare known Shakespeare works with other
> known Shakespeare works, known De Vere works with other De Vere works,
Ahh, the fundamental flaw in my original experiment!
My results are as follows:
                              Mean    Std Dev
Bacon vs Bacon                0.922   0.149
de Vere vs de Vere            0.997   0.190
Shakespeare vs Shakespeare    0.963   0.199
Shakespeare vs Bacon          1.035   0.208
Shakespeare vs de Vere        1.050   0.198
Note. Results are ranked in Mean order.
The results seem exceptionally inconclusive. Shakespeare vs Bacon and
Shakespeare vs de Vere both score as more different than Shakespeare vs
Shakespeare; however, all of the means are close to one another.
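One rough way to see how inconclusive these figures are is to express each cross-author gap in units of the Shakespeare vs Shakespeare standard deviation. This is only a back-of-envelope gauge using the means and standard deviations reported above, not a significance test, and the variable names are illustrative.

```python
# Means and standard deviations exactly as reported in the results above.
results = {
    "Shakespeare vs Shakespeare": (0.963, 0.199),
    "Shakespeare vs Bacon": (1.035, 0.208),
    "Shakespeare vs de Vere": (1.050, 0.198),
}

baseline_label = "Shakespeare vs Shakespeare"
baseline_mean, baseline_sd = results[baseline_label]

# Gap between each cross-author mean and the same-author mean,
# measured in same-author standard deviations.
gaps = {
    label: (mean - baseline_mean) / baseline_sd
    for label, (mean, _sd) in results.items()
    if label != baseline_label
}

for label, gap in gaps.items():
    print(f"{label}: {gap:.2f} standard deviations above the baseline")
```

Both gaps come out at less than half a standard deviation, so the cross-author scores sit well inside the spread seen when Shakespeare is compared with Shakespeare.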
-------------------------------------------------------------
> From: Peter Groves
>
> Similarity of what? It's not similarity of syntactic patterns, since
> there seems to be no attempt (as you've explained it) to parse the
> strings of labels into structures.
Firstly, my program tags every word in a document with its syntactic
category. For instance:
FROM/IN fairest/JJS creatures/NNS we/PRP desire/VBP increase/NN ,/,
My program can then compare documents in two different ways:
1) First-order statistics (simply counts the number of each tag in the
doc)
2) Bi-gram statistics (works out the probability of jumping from one tag
to an adjacent tag on its right)
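The two kinds of statistic can be sketched in a few lines of Python, taking the tagged sample line above as input. This is an illustrative reconstruction under stated assumptions, not the program itself: the function names are hypothetical, and the bigram probability is assumed to be the count of a tag pair divided by the count of the left-hand tag.

```python
from collections import Counter


def parse_tagged(line):
    """Return the tag sequence from whitespace-separated 'word/TAG' tokens."""
    tags = []
    for token in line.split():
        # rpartition splits on the last '/', so ',/,' yields the tag ','.
        _word, _sep, tag = token.rpartition("/")
        tags.append(tag)
    return tags


def tag_counts(tags):
    """First-order statistics: how often each tag occurs."""
    return Counter(tags)


def bigram_probabilities(tags):
    """Estimate P(right tag | left tag) for each adjacent pair of tags."""
    pair_counts = Counter(zip(tags, tags[1:]))
    left_counts = Counter(tags[:-1])  # the last tag has no right neighbour
    return {(a, b): n / left_counts[a] for (a, b), n in pair_counts.items()}


sample = "FROM/IN fairest/JJS creatures/NNS we/PRP desire/VBP increase/NN ,/,"
tags = parse_tagged(sample)
counts = tag_counts(tags)
probs = bigram_probabilities(tags)
print(tags)
print(counts)
print(probs)
```

On a single line every tag occurs once, so each observed transition has probability 1.0. The statistics only become informative over a whole document.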
I take it by 'structures' you're referring to parse trees? It is an
interesting idea to compare parse trees, but it is somewhat outside the
scope of my dissertation.
> But more to the point, why bother?
I've chosen to use Shakespeare to test my program because the documents
are easily available. I wasn't aware it was such a serious issue.
> It's like researching the possibility that the earth really IS flat
> after all. Only cranks and conspiracy theorists entertain the notion
> that Shakespeare's plays were written by someone else (whether or not of
> the same name). The whole topic is rightly banished from a serious list
> such as this.
I suspect a long time ago it was only the 'cranks' and 'conspiracy
theorists' that believed the earth was round. You can't always
believe what you're told.
-------------------------------------------------------------
Thanks for everyone's time.
Paul
[3]-------------------------------------------------------------
From: Takashi Kozuka
Date: Wednesday, 21 Mar 2001 21:14:53
Subject: Re: Shakespearean Authorship Research
I must admit -- I feel like Alice...
"Somehow it seems to fill my head with ideas -- only I don't exactly
know what they are!" -- Alice, Alice in Wonderland
Takashi Kozuka
(Another PhD student -- well, I thought I was...)
[4]-------------------------------------------------------------
From: Marcus Dahl
Date: Thursday, 22 Mar 2001 11:27:33 EST
Subject: 12.0682 Re: Shakespearean Authorship Research
Comment: Re: SHK 12.0682 Re: Shakespearean Authorship Research
RE: Authorship etc
I'm not sure if this will be posted (bans etc) but I do have a general
response to Paul's question: are questions of authorship within the
canon and comparing the canon with other canons acceptable to this list?
Given my own interest in attribution studies and literary notions of
authorial fingerprints and methodology (e.g. comparing H. C. Hart's
editions of the tetralogy to modern Cambridge, Oxford or Arden
editions), I wonder whether Paul's question/research is legitimately banned as
referring to the Authorship question?
Would no-one on this list be interested to know which bits (if any) of
Cardenio, Contention, Timon, Titus, Sir Thomas More, Two Noble Kinsmen
etc. were written by Shakespeare? And
wouldn't you like to know how objective the methodology for these
attributions could be? I thought the comments directed to Paul advising
him to compare the S sonnets with the S sonnets before appraising how
similar they were to those of De Vere were good, but it strikes me that
this kind of advice could be ably given to most literary thinkers - i.e.
know your field - its limits and methodologies: e.g. when we find a
correspondence between the style of, say, the Jack Cade scenes in 2HVI and
the crowd scenes in Sir Thomas More, how much does this tell us about
the style of Shakespeare? What if the Jack Cade scenes were written by
Lodge? Does this mean Lodge also wrote Hand D in Sir Thomas More? Or
what about Greene's Groatsworth of Witte so often quoted in regard to
3HVI but rarely examined to see if it was actually Greene's work. As
several commentators have observed, the style and purpose had more in
common with Nashe and with the purse of Chettle than Greene...but
because the emphasis is always on Shakespeare's assumed authorship of
3HVI etc the subtleties of the issue are overlooked. Yet the authorship
of both texts is not uninteresting or unrelated. Arguments concerning
S's interpretation of history, social politics or his possible revision
of his own plays, playwriting ethics/practice/chronology are all
affected by the 'authorship question' regarding who wrote 3HVI or
Groatsworth, when and with what purpose. Moreover these are questions
that ought to have an answer which is material and objective. (Not to
mention the problems presented to ideas of precise and clear-cut
Shakespearean authorship by all those C16th collaborative plays...)
Perhaps Paul's sort of enquiries and research could be directed into
more fruitful areas (e.g. is grammar universal between authors, do rare
word tests tell us more than common word tests, can we really use
electronic texts for research concerning metre, punctuation etc etc) but
we should not overlook a few of the smaller planks in the eyes of the
wider Shakespeare fraternity.
Cheers,
Marcus.
S H A K S P E R: The Global Shakespeare Discussion List
Hardy M. Cook
The S H A K S P E R Webpage <http://ws.bowiestate.edu>