The Shakespeare Conference: SHK 15.2101 Tuesday, 14 December 2004
Date: Monday, 13 Dec 2004 14:01:04 -0600
Subject: More about the Nameless Shakespeare
We have made some progress with The Nameless Shakespeare in the context
of our WordHoard project. You may now download the texts from
More usefully, you may order it from
http://www.natcorp.ox.ac.uk/babyinfo.html as part of the XAIRA CD-ROM, which
includes XAIRA, an XML-aware search engine, and the one-million-word
sample of the British National Corpus. Alternatively, you could sign up as
a beta tester for XAIRA (http://www.oucs.ox.ac.uk/rts/xaira/) and
download the texts from our site.
I would also like to tell you about a new feature of the current
interface to the Nameless Shakespeare at www.library.northwestern.edu.
If you click on any line in the text, you are taken to a transcription
of the relevant column in the Folio text, with the hit line marked in
red. This means that, for any reader of the modern text, information about
the orthography and punctuation of the Folio is only a couple of seconds
away. The transcriptions come to us courtesy of the Internet Shakespeare
The Nameless Shakespeare is a TEI-encoded, lemmatized, and
morphosyntactically tagged text of the plays and poems of Shakespeare.
It is based on a thorough revision of the Globe Shakespeare. It is a
modern-spelling edition that tries to preserve the morphological and
prosodic features of the Folio and Quarto source texts. The header
document to the text files describes the editorial and tagging
procedures in some detail.
The raw files of the Nameless Shakespeare are not meant to be 'human
readable' texts. Not much pleasure or wisdom can be got out of looking
at something like
<l part="N" id="sha-juc101001"><w wt="av" pos="av">Hence</w><c>!</c> <w
wt="n" m="sg" pos="n">home</w><c>,</c> <w wt="pnp" m="2pl"
pos="pnp">you</w> <w wt="aj" pos="aj">idle</w> <w le="creature" wt="n"
m="pl" pos="n">creatures</w> <w wt="v" m="pr" pos="v">get</w> <w
wt="pnp" m="2pl" pos="pnp">you</w> <w wt="av" pos="av">home</w><c>:</c></l>
which is the fully encoded first line of Julius Caesar. If you look
closely at this hideously verbose encoding you will notice that it
spells out in tedious detail some very primitive facts that every
minimally competent reader will bring to the task of decoding the words
on the page. This does little good in looking at the text word by word.
But with the right kind of search tool (such as XAIRA or the WordHoard
tools we are developing) this information can serve as the point of
departure for many stylistic inquiries.
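
To make this concrete, here is a minimal sketch in Python of how a
program might read such a line. It is not XAIRA or WordHoard, only an
illustration using the standard library. The sample is the line quoted
above, written as a well-formed fragment (the closing </l> is supplied
so that the parser accepts it). The function name tagged_words is
hypothetical, and so is the rule that a word without an le attribute is
its own lemma; that rule is inferred from this one line, not taken from
the header document.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The first line of Julius Caesar as quoted in the post, closed with </l>
# so that it parses as a well-formed fragment.
SAMPLE = (
    '<l part="N" id="sha-juc101001">'
    '<w wt="av" pos="av">Hence</w><c>!</c> '
    '<w wt="n" m="sg" pos="n">home</w><c>,</c> '
    '<w wt="pnp" m="2pl" pos="pnp">you</w> '
    '<w wt="aj" pos="aj">idle</w> '
    '<w le="creature" wt="n" m="pl" pos="n">creatures</w> '
    '<w wt="v" m="pr" pos="v">get</w> '
    '<w wt="pnp" m="2pl" pos="pnp">you</w> '
    '<w wt="av" pos="av">home</w><c>:</c>'
    '</l>'
)

def tagged_words(line_xml):
    """Return one dict per <w> element in a tagged <l> line."""
    line = ET.fromstring(line_xml)
    rows = []
    for w in line.iter("w"):
        form = w.text or ""
        rows.append({
            "line_id": line.get("id"),
            "form": form,
            # Assumption: when "le" is absent, the lowercased form is the lemma.
            "lemma": w.get("le", form.lower()),
            "pos": w.get("pos"),
            "morph": w.get("m", ""),
        })
    return rows

words = tagged_words(SAMPLE)

# Two simple queries that the tags make possible.
pos_counts = Counter(row["pos"] for row in words)
plural_nouns = [row["lemma"] for row in words
                if row["pos"] == "n" and row["morph"] == "pl"]

for row in words:
    print(row["form"], row["lemma"], row["pos"], row["morph"])
```

Run over a whole play rather than one line, counts of this kind
(parts of speech, plural nouns, second-person pronouns) are the raw
material for the stylistic questions I have in mind.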
The tagging of the Nameless Shakespeare was done automatically but went
through several rounds of manual error checking. I believe that there is
a residual error rate of 0.7%. This is practically negligible for any
quantitative inquiry. On the other hand, it means that in a play of
20,000 words something is wrong with about 150 tags. I would like to get
this error rate much closer to zero and will be grateful for any
corrections. There is an error report form at
If there are volunteers who are attracted by the thought of chasing errors,
I can provide them with Excel files that show a "verticalized" form of
the text, in which you read downward row by row and see the
morphosyntactic tag next to the word with a special column for marking
an error. Errors discovered in one play can be automatically related to
the same errors occurring elsewhere. Thus a volunteer who finds 150
errors in one play is likely to correct 300-500 errors across the
corpus. This is a boring but painless and effective way of doing a
little philological good in the world. If you're interested, please
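
For the technically curious, the verticalized form and the relating of
errors across plays can be sketched in a few lines of Python. This is
only an illustration, not the procedure actually used. The rows, line
ids, and the "po" tag are invented, the function names are
hypothetical, and matching on the pair of word form and tag is an
assumed rule. Such a match only yields candidates for review, since the
same form with the same tag may be correct in another context.

```python
import csv
import io

# Hypothetical verticalized rows: (line id, word form, tag).
# The ids and tags are invented for illustration, not taken from the corpus.
ROWS = [
    ("play-a-001", "get", "v"),
    ("play-a-001", "you", "pnp"),
    ("play-a-001", "home", "n"),   # suppose a volunteer marks this: should be "av"
    ("play-b-017", "go", "v"),
    ("play-b-017", "home", "n"),   # same form and tag in another play
    ("play-b-018", "my", "po"),
    ("play-b-018", "home", "n"),   # same form and tag, but possibly correct here
]

def verticalize(rows):
    """Write one word per row as CSV, with an empty column for marking errors."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["line_id", "form", "tag", "error"])
    for line_id, form, tag in rows:
        writer.writerow([line_id, form, tag, ""])
    return buf.getvalue()

def related_candidates(rows, corrections):
    """List every row whose (form, tag) pair matches a reported error.

    corrections maps (lowercased form, wrong tag) to the suggested tag.
    Matches are candidates for a human to review, not automatic fixes.
    """
    hits = []
    for row_number, (line_id, form, tag) in enumerate(rows, start=1):
        suggested = corrections.get((form.lower(), tag))
        if suggested is not None:
            hits.append((row_number, line_id, form, tag, suggested))
    return hits

sheet = verticalize(ROWS)
corrections = {("home", "n"): "av"}   # one error reported by a volunteer
candidates = related_candidates(ROWS, corrections)

print(sheet)
for hit in candidates:
    print(hit)
```

In this toy example one reported error surfaces three rows to look at,
which is the multiplying effect described above.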
Professor of English and Classics
Department of English
Evanston, Illinois 60208
S H A K S P E R: The Global Shakespeare Discussion List
The S H A K S P E R Web Site <http://www.shaksper.net>
DISCLAIMER: Although SHAKSPER is a moderated discussion list, the
opinions expressed on it are the sole property of the poster, and the
editor assumes no responsibility for them.