The Shakespeare Conference: SHK 9.0126 Wednesday, 11 February 1998.
From: Don Foster <
Date: Monday, 09 Feb 1998 09:09:32 -0500
Subject: Foster on Shaxicon (Part Four of Six)
SHAXICON, PART 4, applications.
Note: this message is intended for SHAKSPEReans, and for the SHAKSPER
List archive; not for unauthorized circulation. Don Foster
Shaxicon provides information on the coinage of new words, the
importation of foreign words, and the dissemination or repetition of
words from one text to the next, or from one authorial canon to another.
It can be used to assist in the investigation of almost any problem
involving chronology, authorship, or intertextual borrowing, not just
matters of theatrical rehearsal.
Take, for example, Kent Hieatt's recent essay on dating Shakespeare's
Sonnets: A. Kent Hieatt, Charles W. Hieatt, and Anne Lake Prescott,
"When Did Shakespeare Write Sonnets 1609?" Studies in Philology 88.1,
69-109. (The article appears to be chiefly a Hieatt affair; I
have not discussed the study with Anne Prescott, but I cannot see that
she is responsible for much of the writing in it.)
In 1990 Kent Hieatt wrote me with news that he and his brother Charles
had hit upon a novel method for dating Shakespeare's Sonnets: in a
painstaking procedure, they (mostly Kent, I believe) isolated all words
that appear in Son. and in at least three canonical texts later than
1600 but not before. The Hieatts designated these words "Late Rare
Words" or "LRWs." They further isolated all words that appear in Son.
and in at least three canonical texts earlier than 1600 but not after.
These they called "Early Rare Words" or "ERWs." They then performed the
same procedure for a few extracts from The Rape of Lucrece, Richard II,
and 2 Henry IV.
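The classification step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Hieatts' actual procedure: the word lists, text labels, and dates are invented; each text is given a single composition year (the simplification objected to below); and a text dated 1600 is assumed to count as "late." The function name `classify_rare_words` and its `min_texts` parameter are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: texts dated 1600 or later count as "late".
DIVIDING_YEAR = 1600


def classify_rare_words(
    target: str,
    occurrences: Dict[str, Set[str]],
    dates: Dict[str, int],
    min_texts: int = 3,
) -> Tuple[Set[str], Set[str]]:
    """Return (erws, lrws) for the target text.

    occurrences: lemma -> set of texts in which it appears (hypothetical input).
    dates: text -> one assumed composition year (each text treated as unrevised).
    """
    erws: Set[str] = set()
    lrws: Set[str] = set()
    for word, texts in occurrences.items():
        if target not in texts:
            continue  # only words that occur in the target text qualify
        others = texts - {target}
        early = {t for t in others if dates[t] < DIVIDING_YEAR}
        late = others - early
        if len(late) >= min_texts and not early:
            lrws.add(word)  # "Late Rare Word": late texts only
        elif len(early) >= min_texts and not late:
            erws.add(word)  # "Early Rare Word": early texts only
    return erws, lrws


# Usage with invented texts and dates (not real Shakespeare data):
dates = {"SON": 1609, "A": 1592, "B": 1594, "C": 1596,
         "D": 1603, "E": 1606, "F": 1610}
occurrences = {
    "alpha": {"SON", "A", "B", "C"},       # early texts only -> ERW
    "beta": {"SON", "D", "E", "F"},        # late texts only -> LRW
    "gamma": {"SON", "A", "D", "E", "F"},  # straddles 1600 -> neither
    "delta": {"SON", "D", "E"},            # too few texts -> neither
}
erws, lrws = classify_rare_words("SON", occurrences, dates)
print(sorted(erws), sorted(lrws))  # ['alpha'] ['beta']
```

The sketch counts only how many texts a word appears in, not how large each text's vocabulary is, so a text with a rich vocabulary contributes more candidate words than a sparse one.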
The Hieatts discovered that by this procedure the Sonnets looked
distinctly later than LUC, R2, or 2H4: Son. has more LRWs than these
other texts. The Hieatts also discovered (1) that there are many more
ERWs than LRWs in Son.; and (2) that there are many long intervals in
Son. containing no LRWs at all. The Hieatts concluded on this evidence
that the Sonnets were drafted in the early 1590s (hence the
preponderance of ERWs) but revised shortly before publication (hence the
evident clustering of LRWs). This sounds like an eminently plausible
hypothesis.
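The "long intervals" argument depends on how gaps between LRWs are counted. As a minimal sketch, assuming a text is represented as a list of lemmatized tokens and a gap is measured in tokens (the unit the Hieatts used is not stated here), gaps can be counted in one consistent way as follows. The function names and the `threshold` for a "long" gap are hypothetical.

```python
from typing import List, Set


def lrw_gaps(tokens: List[str], lrws: Set[str]) -> List[int]:
    """Lengths, in tokens, of the LRW-free stretches in a text.

    Includes the stretch before the first LRW and after the last one,
    so every non-LRW token falls into exactly one gap.
    """
    gaps: List[int] = []
    run = 0
    for tok in tokens:
        if tok in lrws:
            gaps.append(run)  # close the current LRW-free stretch
            run = 0
        else:
            run += 1
    gaps.append(run)  # trailing stretch after the last LRW
    return gaps


def long_gap_share(tokens: List[str], lrws: Set[str], threshold: int) -> float:
    """Fraction of gaps at least `threshold` tokens long (threshold is hypothetical)."""
    gaps = lrw_gaps(tokens, lrws)
    return sum(1 for g in gaps if g >= threshold) / len(gaps)


# Toy usage: "x" is the only LRW in this invented token stream.
tokens = ["a", "x", "b", "c", "d", "x", "e"]
print(lrw_gaps(tokens, {"x"}))           # [1, 3, 1]
print(long_gap_share(tokens, {"x"}, 3))  # 0.3333333333333333
```

A single proportion like this means little for one text in isolation; it becomes evidence only when compared across texts, which is where the WT comparison discussed later comes in.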
ERWs and LRWs represent terminology foreign to Shaxicon, but any
SHAKSPER member can use Shaxicon to examine the Hieatts' assumptions.
As it happens, there were a number of problems with the procedure.
First, the Hieatts tested only extracts from The Rape of Lucrece,
Richard II, and 2 Henry IV, without bothering to construct a full
cross-sample for any one of these texts, or a partial cross-sample for
the rest of the canon. Second, all plays and poems except Son. were
treated as "pure" (unrevised) texts, written at a single moment in time.
It was merely posited that the Sonnets were written and revised over a
period of years but the other Shakespearean texts were not. For
example, the Hieatts treat F1 Wives as an "early" (pre-1600) play,
though most scholars agree that the F1 version contains material later
than HAM. (There are many such problems with the Hieatts' adopted
chronology.) The Hieatts further assumed that all plays and poems
contribute about equally to the canonical lexicon, which is certainly
not true (e.g., VEN and LUC have a much richer vocabulary than TMP,
while LR has a much richer vocabulary than ERR). And the Hieatts'
principles of lemmatization were highly inconsistent (e.g., "widowed" in
"widowed wombs" (SON 97.8) was counted as a verb, while "waned" in "waned
lips" (ANT 2.1.21) was not). Hair-splitting distinctions were made
between words, and intervals between LRWs were not consistently
counted. As I observed in a series of letters to Kent in 1990-91, these
inconsistencies tended all in one direction, which made the study
vulnerable to an accusation of unconscious bias even though Kent clearly
wished for his work to be perfectly objective.
The advantage of a comprehensive and stable lexical database such as
Shaxicon is that it resists hair-splitting and convenient distinctions,
both between words and in measuring distributions. As already noted, some
texts or portions thereof may be imperfectly dated (even in Shaxicon
1996). But insofar as a word has been poorly lemmatized or a text
inaccurately dated, that misplaced thread remains in place in the
simulacrum for all scholars considering all textual problems, until such
time as the error is corrected in a subsequent edition. There is no
possibility for the individual scholar to jerk around the data to fit a
particular hypothesis, or to introduce unconscious bias.
The Hieatts did not consciously gerrymander their data. Nevertheless,
their study was built on two plausible but hugely mistaken assumptions
that make the preceding objections look trivial. The first was that
ERWs and LRWs are about evenly distributed in the Shakespeare canon,
with a comparable number before and after their 1600 dividing line.
Shaxicon, even in its earliest stages,
revealed this assumption to be absolutely untrue: ERWs greatly
outnumber LRWs in the Shakespeare canon. This is counter-intuitive, of
course, but almost all Shakespeare texts include more ERWs than LRWs,
the earliest texts by a huge margin. The number of ERWs and LRWs
finally balances out in the last few years of Shakespeare's career.
In 1990-92, Kent Hieatt and I exchanged a series of letters about this
inconvenient problem. Kent at first vigorously and repeatedly denied
that such a lopsided distribution of ERWs and LRWs was even possible.
He proceeded with publication. During this exchange I remained on Kent's
side with respect to his dating for the Sonnets (and indeed, I still
stand in agreement with his major conclusion), but I felt that he should
account for the problem of lopsided ERW and LRW distribution. Toward
that end, I gave Kent some of the early Shaxicon diskettes (unfinished)
and let him investigate the problem for himself. In letters dated 15
April and 28 May 1992, Kent finally conceded the inevitable: he had
wrongly assumed that ERWs and LRWs were evenly distributed in the canon.
Since 1992, Kent has defended his method by insisting that it is the
distribution of ERWs and LRWs within SON that supports a hypothesis of
early composition and late revision, for the Hieatts were correct in
finding long intervals in Son. that contain no LRWs at all.
Users of Shaxicon may check this secondary hypothesis as well. In fact,
I did so, and gave Kent the results years ago. Unfortunately, Kent's
minor premise proved to be mistaken as well: by Kent's measure, SON
must be considered later than WT, for WT contains an even higher
proportion of long gaps without LRWs than does SON. Kent Hieatt's
response to this second problem (in a letter of 2 May 1994) was to
insist that the long gaps between late rare words in WT must be
differently interpreted from the long gaps in SON: the lateness of WT
provides Shakespeare with fewer opportunities to turn first-time words
into LRWs by their appearance again in at least two other late plays.
(No matter that WT has more opportunities to form LRWs with already-used
words in other post-1600 plays.) At this point, Kent's tone became so
testy, and the discussion so unproductive, that I thought it best to
withdraw from the correspondence.
Despite these problems, Shaxicon provides considerable support for the
Hieatts' view that the Sonnets as published in 1609 are later than LUC,
R2, and 2H4.
But what of my own work? Can Shaxicon be used to expose Foster's own
mistakes? Yes, certainly. Tomorrow morning, I will continue this long
discourse with a few examples of my own mistakes; and I shall respond to
Kent Hieatt's confusion with respect to the various kinds of
distributions charted in Shaxicon's Microsoft Excel spreadsheets. I
will close with a brief report on the progress being made towards
Shaxicon's completion and distribution. But at the moment, I've gotta
get ready for classes.