The Shakespeare Conference: SHK 12.0728 Friday, 30 March 2001
From: Mike Jensen
Date: Monday 26 Mar 2001 10:24:27 -0800
Subject: Re: Shakespearean Authorship Research
Comment: SHK 12.0699 Re: Shakespearean Authorship Research
Hi, Paul. Thanks for all the answers. One of them began,
> However my program does not compare words, it compares word categories.
> Hence such things as spelling errors and rare words are minimised.
Why is this desirable? If Shakespeare, to cite a common example, tends
to spell the word *Oh* with the first letter only (O), your program
tends not to take this fact into account. Is that correct? If so, what
is the advantage of this when making your authorship comparisons?
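To make the question concrete, here is a minimal sketch of what comparing categories rather than words might discard. The category table and the `to_categories` helper are hypothetical, invented only for illustration; the thread does not say which categories Paul's program actually uses.

```python
# Hypothetical category table. The real program's categories are not
# described in this thread, so these labels are assumptions.
CATEGORY = {
    "o": "INTERJECTION",
    "oh": "INTERJECTION",
    "the": "ARTICLE",
    "rose": "NOUN",
    "is": "VERB",
    "red": "ADJECTIVE",
}


def to_categories(line):
    """Map each word of a line to a category, ignoring case and punctuation."""
    words = [w.strip(".,;:!?").lower() for w in line.split()]
    return [CATEGORY.get(w, "OTHER") for w in words if w]


line_a = "O, the rose is red"   # spelling habit: "O"
line_b = "Oh, the rose is red"  # spelling habit: "Oh"

# At the category level the two lines are indistinguishable.
print(to_categories(line_a) == to_categories(line_b))  # True

# At the word level the first tokens differ, so the habit is still visible.
print(line_a.split()[0] == line_b.split()[0])  # False
```

If the program works anything like this, a spelling preference such as "O" for "Oh" would disappear before any comparison is made, which is the point of the question above.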
Also, about Dave Kathman's "Roses are red" example, cited by me, you wrote,
> if a poem had only those two lines they would (wrongly)
> be considered similar. However, if we assume a poem is 14 lines, there
> is much less likelihood of being similar by chance.
Statistically true, but it does not seem that simple to me. I gave just
one example, but many others are possible. Rather than getting bogged
down in the specific, let's have a peek at the general. If in those 14
lines you find several other poetic conventions common at the time, you
will still get a low statistical number (meaning a high degree of
linguistic commonality). True? If so, your program may still suggest
common authorship, but might it not instead suggest two authors who used
the same conventions?
About your comparison of Shakespeare with Shakespeare, Bacon with Bacon,
etc., did you compare every Shakespearean sonnet with every other?
Every de Vere? Every Bacon? Then every poem by one author, with every
poem by the others, or did you just sample?
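The difference between the two approaches can be sketched roughly as follows. This assumes only that poems are compared in unordered pairs; the function names and the sample size are hypothetical, and nothing here reflects how Paul's program is actually organised. The figure of 154 is the usual count of Shakespeare's sonnets.

```python
import itertools
import random


def all_pairs(poems):
    """Every unordered pair of distinct poems (exhaustive comparison)."""
    return list(itertools.combinations(poems, 2))


def sampled_pairs(poems, k, seed=0):
    """A random sample of k pairs, drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return rng.sample(all_pairs(poems), k)


# Placeholder titles standing in for the 154 sonnets.
sonnets = [f"Sonnet {n}" for n in range(1, 155)]

print(len(all_pairs(sonnets)))           # 154 * 153 / 2 = 11781 pairs
print(len(sampled_pairs(sonnets, 100)))  # 100 pairs
```

Comparing every sonnet with every other means 11,781 comparisons within Shakespeare alone, before any cross-author pairs are added, so whether the results rest on all pairs or on a sample matters for how they should be read.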
All the best,
S H A K S P E R: The Global Shakespeare Discussion List
Hardy M. Cook,
The S H A K S P E R Webpage <http://ws.bowiestate.edu>