Shakespeare Electronic Conference, Vol. 6, No. 0533. Thursday, 6 July 1995.
Date: Tuesday, 04 Jul 1995 11:40:16 +0100
Subject: Re: SHAXICON
To all of you good folks out there who have inquired lately about SHAXICON and
its availability, and about the roles that Shakespeare may have played:
Caveat: This will be a long post, but I'm trying to answer loads of queries in
one swoop (or, as Pogo used to say, "in one fell soup"):
First, what is it? SHAXICON is a lexical database that indexes all of the
words that appear in the canonical plays 12 times or fewer, including a
line-citation and speaking character for each occurrence of each word. (These
are called "rare words," though they are not rare in any absolute
sense--"family [n.]" and "real [ad.]" are rare words in Shakespeare.) All
rare-word variants are indexed as well, including the entire "bad" quartos of
H5, 2H6, 3H6, Ham, Shr, and Wiv; also the nondramatic works, canonical and
otherwise (Ven, Luc, PP, PhT, Son, LC, FE, the Will, "Shall I die," et al.);
the additions to Mucedorus and The Spanish Tragedy, the Prologue to Merry Devil
of Edmonton, all of Edward III and Sir Thomas More (hands S and D); Ben
Jonson's Every Man in His Humour (both Q1 and F1) and Sejanus (F1); and more;
but these other texts have no effect on the 12-occurrence cutoff that sets the
parameters for SHAXICON's lexical universe.
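
To make the indexing rule above concrete, here is a minimal sketch in Python. It is not SHAXICON's actual implementation (which runs on WordCruncher); the function name, the tuple layout, and the sample data are all hypothetical. It assumes the input holds occurrences from the canonical plays only, since the other indexed texts have no effect on the cutoff, and it keys on lowercase strings, whereas the real database also distinguishes parts of speech (e.g., "family [n.]").

```python
from collections import defaultdict

CUTOFF = 12  # threshold stated in the post: 12 occurrences or fewer


def build_rare_word_index(occurrences, cutoff=CUTOFF):
    """Return {word: [(text, line_ref, speaker), ...]} for rare words only.

    `occurrences` is an iterable of (word, text, line_ref, speaker) tuples.
    This layout is an assumption made for illustration.
    """
    by_word = defaultdict(list)
    for word, text, line_ref, speaker in occurrences:
        by_word[word.lower()].append((text, line_ref, speaker))
    # Keep only words whose total count is at or under the cutoff.
    return {w: cites for w, cites in by_word.items() if len(cites) <= cutoff}


# Hypothetical sample: one word seen twice, one seen 13 times.
sample = [("family", "PLAY_A", "1.1.1", "Speaker1"),
          ("Family", "PLAY_B", "2.3.4", "Speaker2")]
sample += [("the", "PLAY_A", f"1.1.{n}", "Speaker1") for n in range(13)]

index = build_rare_word_index(sample)
print(sorted(index))  # words that survive the cutoff
```

Each surviving entry carries its line citation and speaking character, which is what later allows recall to be measured role by role.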
What SHAXICON demonstrates is that the rare words in Shakespearean texts are
not randomly distributed either diachronically or synchronically, but are
"mnemonically structured." Shakespeare's active lexicon as a writer was
systematically influenced by his reading, and by his apparent activities as a
stage-player. When writing, Shakespeare was measurably influenced by plays
then in production, and by particular stage-roles most of all. Most
significant is that, while writing, he disproportionately "remembers" the
rare-word lexicon of plays concurrently "in repertory"; and from these plays he
always registers disproportionate lexical recall (as a writer) of just one role
(or two or three smaller roles); and these remembered roles, it can now be
shown, are most probably those that Shakespeare himself drilled in stage
SHAXICON electronically maps Shakespeare's language so that we can now usually
tell which texts influence which other texts, and when. Moreover, when collated
with the OED or with early modern texts in a normalized machine-readable
format, SHAXICON provides an incomplete record of Shakespeare's apparent
reading. The main value of this resource has less to do with biographical
novelties, however, than with problems of textual transmission, dating,
probable authorship of revisions, early stage history, and the like. And
because SHAXICON is a closed system, human bias in measuring lexical influence
of this sort is effectively eliminated. The evidentiary value of supposed
"verbal parallels" is no longer a matter of private intuition or subjective
judgment, but quantifiable, using a stable lexical index (and measurable
against a virtually limitless cross-sample of machine-readable texts).
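
One simple way to picture "disproportionate lexical recall" is sketched below in Python. This post does not spell out SHAXICON's actual statistics, so the measure here is an assumption chosen only for illustration: each role's observed share of the rare words that a later text reuses is divided by the share expected from the size of that role's rare-word lexicon. The function name and the sample roles are hypothetical.

```python
def role_recall_ratios(role_lexicons, later_text_words):
    """Return {role: observed_share / expected_share} for one earlier play.

    role_lexicons: {role: set of rare words spoken by that role}
    later_text_words: set of rare words found in a later text
    A ratio above 1.0 means the role is recalled more often than the
    size of its lexicon alone would predict (an illustrative measure only).
    """
    total_lexicon = sum(len(words) for words in role_lexicons.values())
    hits = {role: len(words & later_text_words)
            for role, words in role_lexicons.items()}
    total_hits = sum(hits.values())
    if total_lexicon == 0 or total_hits == 0:
        return {role: 0.0 for role in role_lexicons}
    ratios = {}
    for role, words in role_lexicons.items():
        expected = len(words) / total_lexicon   # share by lexicon size
        observed = hits[role] / total_hits      # share of recalled words
        ratios[role] = observed / expected if expected else 0.0
    return ratios


# Hypothetical data: two roles with rare-word lexicons of equal size.
roles = {"RoleA": {"a", "b", "c", "d"}, "RoleB": {"e", "f", "g", "h"}}
later = {"a", "b", "c", "e"}
print(role_recall_ratios(roles, later))
```

Because the ratio corrects for lexicon size, a large role does not look "remembered" merely because it has more rare words to draw on; the variable "richness" of character-specific lexicons would still need separate treatment.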
In 1991, I published a 3-part report in SNL about SHAXICON (the database was
not then completed, and not yet dubbed), in which I made (in a few cases,
mistaken) projections concerning Shakespeare's apparent stage roles (based on
entries for about a third of the final lexical sample). The few botched
projections derived in part from key-punching errors--e.g., "Pand" (Pandarus of
TRO) was often being entered for "CPan" (Pandulph of JN), and "QnElz" (R3) for
"QnEliz" (3H6); and in part from unavoidable limitations, explained in the SNL
series, concerning the variable "richness" of character-specific lexicons,
which could not be measured until the whole canon was indexed. These problems
have been eliminated.
The following list represents a corrected catalogue of those roles that
Shakespeare is most likely to have acted. These assignments vary somewhat in
statistical significance, depending on sample size, etc. A fuller report (with
instructions on how to run cross-checks and fully automated statistical
analysis) will appear in my "SHAXICON Notebook" (a written commentary that has
yet to be completed). In the meantime, here follows a list of Shakespeare's
most likely stage-roles, as statistically derived. Keep in mind that this
catalogue cannot be proven to represent historical actuality. SHAXICON handily
selects Adam of AYL and the Ghost of Ham as probable Shakespeare roles, both of
which are supported by hearsay evidence from the 17th century; the remaining
roles find no external historical confirmation (although Davies mentions that
Shakespeare played some kings, and SHAXICON indicates that Shakespeare played
king-roles in AWW, 1H4, 2H4, HAM, LLL, PER, and probably MAC). Having studied
the evidence from every conceivable angle, I'd say that the assignments below
are good bets, even despite the lack of archival evidence to back them up, for
the disproportion in Shakespeare's persistent recall of these roles is quite
striking relative to other roles in the corresponding texts. There are a few
texts (principally ADO, MV, and Jonson's EMI) in which Shakespeare may have
played two different roles in two successive seasons of the same theatrical
"run." But the statistical weight of Shakespeare's selective recall of
particular roles is in most instances pretty clear; in fact, when multiple
roles are identified by SHAXICON as probably Shakespearean, they are in most
instances roles that are easily doubled (exceptions and problems are noted
below).
MOST PROBABLE SHAKESPEARE ROLES, BASED ON THE POET'S PERSISTENT AND MEASURABLE
RECALL OF PARTICULAR CHARACTER-SPECIFIC LEXICONS:
ADO: Leonato; later switching to Friar (Q version registers higher lexical
recall for Leonato, F1 version higher for Friar). Could be viewed as a problem,
since the same actor cannot have played both roles simultaneously, yet
Shakespeare clearly "remembers" both roles (unlike all other principal parts in
ADO, which he "forgets").
ANT: Agrippa, Philo, Proculeius, Thidias, and Ventidius, probably
simultaneously [!] (thus requiring some accommodation at 3.2.1 for Vntd/Agri),
and probably with Proculeius taking Agrippa's lines in 5.1 (hence the textual
crux recently discussed on SHAKSPER).
AWW: King of France
AYL: Adam; adding old Corin the Shepherd in two revivals of AYL.
COR: Shakespeare role uncertain. Highest relative post-COR lexical "influence"
comes from Sicinius, but Sicinius-"influence" is tepid relative to the
whopping excess in lexical recall that obtains for the designated Shakespeare
roles in most other plays.
CYM: 1.Gent (I.i), Philario (I.iv, II.iv), and Jupiter (V.iv)
EMI-F (Jonson): Very complicated. Looks as if F1 may represent a major
Elizabethan revision of Q1, followed by a minor Jacobean revision (as per
established textual scholarship on EMI). SHAXICON confirms that Shakespeare
probably knew the play in performance: in 1598, and again in 1604, words from
EMI come pouring into Shakespeare's writing, forming very distinct peaks of
lexical influence just when we know that EMI was, indeed, acted by the King's
Men (and again in 1612-13). But lexical influence by character (entirely
independent of general lexical overlap) gives mixed signals: Shakespeare has
extraordinarily high recall of two roles that cannot have been performed
simultaneously by the same player: Old Lorenzo-Knowell (esp. the F1 Old
Knowell), and Judge Clement (esp. the Q1 Clement); and indeed, these two roles
seem to alternate in their peaks of lexical "influence" on Shakespeare's
writing, which suggests that he may have alternated roles. (But Shakespearean
texts have also an irregularly high overlap with the Thorello-Kitely role both
before AND after 1598, which cannot be explained, except as a statistical
ERR: Egeon (I.i, V.i) and Dr. Pinch (IV.iv).
1H4: King Henry.
2H4: King Henry (and perhaps Rumor, but only briefly).
H5: Complicated: It looks as if Shakespeare played the French Messenger and
Exeter in the "bad"-Q version (in 1599, while also playing Exeter in a revival
of 1H6); in H5-F1, Shakespeare appears to have performed Bishop Ely and
Montjoy. But it looks also as if Shakespeare may sometimes have performed the
Chorus (less strongly marked, but still pronounced in its lexical influence on
late Shakespearean texts relative to other roles in the play). The Chorus-role
is easily doubled with Montjoy--but tripling with Ely raises a problem at
I.i.0, when the Chorus walks offstage and Ely walks on.
1H6: Exeter (in I.i, III.i, IV.i, V.i) and probably Mortimer (II.iv) in first
run and again in 1599; switching to Bedford in 1600 ff. after slight revisions,
principally in I.i. A problem: the same actor cannot easily play both Exeter
and Mortimer in the F1 version, given the Exeter entrance at III.i.0 following
the Mortimer exit at II.iv.212; so if SHAXICON's Exeter/Mortimer data are
correct, there has either been some material cut between II.iv and III.i, or
else Shakespeare was one fast dude when changing his duds (switching from a
dead Mortimer to a living Exeter in just 8 lines).
2H6: Suffolk (also Suffolk in the "bad" 2H6-Q, which appears certainly to
antedate the F1 version, as has been argued by Steve Urkowitz).
3H6: Warwick (Old Clifford in the "bad" 3H6-Q, which appears certainly to
antedate F1 version, as has been argued by Steve Urkowitz).
H8: Prologue and 1.Gentleman; or none (statistically uncertain, due to
insufficient post-H8 lexical sample).
HAM: Ghost, 1.Player, Mess-Gent. of 4.5 (and perhaps also role in the
Mousetrap, most probably Lucianus; and probably not, as per SNL, the
player-king); Mess-Gent partly folded into Horatio role in F1 version.
JC: Shakespeare role(s) a little uncertain, due to apparent revision and
shortening. Most probably, Decius; and, somewhat less probably, Flavius.
Note: Decius-Flavius doubling is not possible in the F1 version unless F1 has
been shortened from an earlier version. In F1, at I.ii.0, Flavius and Decius
enter as mutes; but the very text of JC I.ii offers some evidence that the text
has, indeed, been shortened at this point (e.g., in the same scene, at
I.ii.285, Casca reports that "Murellus and Flavius, for pulling scarfs off
Caesar's images, are put to silence"; but, if we may believe the F1 stage
direction at I.ii.0, Casca was on stage with Murellus and Flavius moments
earlier--from I.ii.0 to at least I.ii.214--and Casca hasn't heard boo about
Caesar's images in the interim). SHAXICON thus seems to confirm the view that
JC-F1 is a shortened text (albeit with some added bits, e.g., the second
account of Portia's death, which are indexed in SHAXICON under JC-b). I am
inclined to accept the assignments of both Decius and Flavius to Shakespeare,
but there is room for doubt.
JN: Cardinal Pandulph.
LLL: Ferdinand (possibly with one brief stint as Boyet).
LR: Albany. The Albany role reduced in (revised) F1 version, one of several
designated Shakespeare roles that appear to have been cut or reduced ca. 1612;
doubtful that Albany was subsequently performed by Shakespeare.
MAC: Shakespeare's most probable roles in this equivocating play are Duncan,
Lord, and Scots Doctor, but I wouldn't bet the farm on it, for the evidence is
itself equivocal. That MAC was revised ca. 1612 seems altogether likely from
the evidence of SHAXICON (principally in I.v.1-30, I.v.71-3, IV.iii all, and
V.ix.1-19; the Hecate material is independently indexed under MAC-c--III.v all;
IV.i.39-43, IV.i.125-32, date and provenance unclear). Simon Forman's
eye-witness account of MAC as acted in 1611 suggests that the ur-MAC had a
larger Duncan-role than in the F1 version. And it has recently been argued on
SHAKSPER that there was an Elizabethan MAC on which the 1606 version was based;
I find these theories of revision attractive, and wish that someone would prove
them true, since taken together they would provide a satisfactory explanation
for the irregularities in the SHAXICON data for MAC.
MND: prob. Theseus, but with very irregular figures, enormously high
Theseus-"influence" on the post-1594 poems, rather slight Theseus-"influence" on
the post-1594 plays (though still higher than for other MND characters).
MV: Somewhat conflicted results: almost certainly Antonio in all
productions; but Morocco is a second "remembered" role, especially as manifest
in the lexicon of the post-1594 poems and in the 1595-6 plays. Morocco tends to
register its strongest influence on Shakespeare's writing when Antonio doesn't,
and vice versa. No other role in the play comes close to these two parts in
lexical "influence" upon the poet's post-MV writing. Perhaps Shakespeare
alternated roles; he cannot easily have played both simultaneously, at least
not in the Q1 or F1 text.
OTH: Brabantio. The Brabantio role is reduced in the (acc. to SHAXICON,
revised) Q1 version; SHAXICON identifies a final "run" of OTH (1611-13), but it
is doubtful that Brabantio was performed by Shakespeare later than 1612.
PER: SHAXICON suggests that PER is a very early play (ur-PER), the palimpsest
of which is imperfectly represented by acts I-II of PER-Q. PER was clearly
revised in 1607 by Shakespeare (new or greatly re-written acts III-V). SHAXICON
offers no support for the view of the Oxford editors that PER-Q represents a
Wilkins-Shakespeare collaboration, yet it leaves open such a possibility
insofar as Wilkins could be shown to have tinkered some with acts I-II while
Shakespeare was rewriting all of acts III-V. (This could be tested by indexing
other texts by Wilkins.) Shakespeare appears to have acted both Antiochus and
(at least when doubling was needed) Simonides, and he may have performed or
read Gower's part from time to time, most notably ca. 1608/9 (cf. notes on
H5-F1, another script for which Shakespeare registers sporadically high recall
of the chorus-role, especially ca. 1608/9--perhaps the company was short-handed
in that year). Shakespeare probably performed Antiochus and Simonides both
before and after the 1607 revision, without taking on any wholly new or
additional role after the new acts (III-V) replaced those in the ur-PER.
R2: Gaunt (in I.i - I.iii, II.i), the Gardener (III.iv), the Lord (IV.i), and
probably also the Groom (V.v). Troublesome dating: SHAXICON seems to indicate
that R2 derives from an earlier play, and that R2 was revised immediately after
1H4 (but prior to publication of R2-Q1). This finding is at odds with all past
textual scholarship on the play, which has been nearly unanimous in viewing R2
as a text begun and completed ca. 1595.
R3: Clarence (in I.i, I.iv, and V.iii) and Scrivener (III.vi). Possibly also
Third Citizen (II.iii) in a late revival.
ROM: Chorus and Friar Lawrence (Chorus-role omitted in late revival, as per
SEJ (Jonson): Macro (I.i, II.iii, III.i, IV.ii); probably also (but less
well-marked) Sabinus (I.i, II.iii, III.i, IV.iii), with some accommodation for
a costume change after IV.ii (but Jonson reports in F1 that he has revised
Sejanus, which means that this problem at IV.iii.0 may not actually have come
up in the performed text).
SHR: Lord, and perhaps also Pedant.
TIM: Poet in TIM-a (representing ur-F1 version, the parts of TIM-F1
customarily ascribed to Shakespeare); no role apparent in TIM-b (widely
supposed to represent Middleton or late-Shakespearean revision). SHAXICON
suggests that TIM-F1 is a late, unfinished revision (ca. 1613) of a play first
acted in 1601. TIM-F1 appears not to be a collaborative text per se.
TIT: probably but not certainly Aaron (a role uncharacteristic of Shakespeare
and less strongly marked statistically than most other roles identified in this
TMP: no Shakespeare role apparent
TNK: no Shakespeare role apparent; insufficient post-TNK sample.
TNT: Antonio (later adding Valentine [I.i]).
TRO: perhaps none until 1609; then, Ulysses (a role that seems out of keeping
with the others designated by SHAXICON)
WIV: In WIV-F1, Ford, but only in two evidently brief runs. The Host in WIV-Q
(which, though a "bad" quarto, appears certainly to antedate the F1 version).
WT: Archidamus (I.i), Antigonus (II.i, II.iii, III.iii), and 3rd Gentleman
WHAT DO YOU NEED TO USE SHAXICON:
2. Disk space. In its present form, SHAXICON sucks up 40+ megs just for the raw
data, plus another 20 megs or so for the commentary, help files, and graphics;
plus another 20 megs or so for the software. But don't start erasing those
electronic games just yet in order to make room for it. The main database for
SHAXICON is now complete, purged of errors, and generally usable; but it's not
yet ready for prime time: SHAXICON now runs on ETC Word-Cruncher, which is
limited in its capabilities and requires way-too-much manual labor (keying in
lexical searches, etc.). We're now using Excel for the summary figures and
graphics, which is a big time-saver--but we're likely to change over, prior to
publication, to a slicker and more fully automated database-management system
so that SHAXICON is more user-friendly in ALL respects. I'm inquiring after
Oracle, 4D, and Fox. If anyone out there has suggestions, I'd be obliged to
In advance of publication we're drawing on the expertise of people in various
fields so that when it's finally distributed SHAXICON will be fully
intelligible even to those users without expertise in computers, statistics,
and/or textual scholarship. I'm shooting for 1996 publication, but cannot
guess what technical problems may arise in the interim. CD-ROM may be too slow
to be practicable, but disk-space may otherwise be a problem for many users.
I am eager to familiarize other scholars with SHAXICON, and will be available
next year to give a talk or seminar if there are interested parties in your
department. Next week I'll be in Santa Barbara, where I'll be presenting
SHAXICON at the ACH/ALLC conference. Hope to see you there.
Thanks for your interest.