The Shakespeare Conference: SHK 15.1195 Monday, 7 June 2004
Date: Saturday, 5 Jun 2004 10:12:14 -0500
Subject: A New Digital Shakespeare from Northwestern University
May I draw the attention of Shakespeareans everywhere to a new
electronic Shakespeare, which is accessible from the Northwestern
University Library at www.library.northwestern.edu/shakespeare.
The Nameless Shakespeare, as it is provisionally called, is the product
of collaboration between the Perseus Project at Tufts University and
Northwestern faculty and staff in Academic Technologies and the Library.
The project is very much a work in progress and will become part of
WordHoard, a larger project at Northwestern, which has received funding
from the Mellon Foundation.
The aim of the Nameless Shakespeare is to create a freely available text
that fully supports the query potential of the digital surrogate. The
text is derived from a scanned version of the Globe Shakespeare but has
been thoroughly revised to create a text that is standardized in its
spelling but reflects as closely as possible the prosodic and
morphological properties of the folio or quarto copy texts. The text is
tagged in a TEI-conformant manner, and in addition to its own citation
scheme it carries references to the Hinman TLN numbers. It is fully
lemmatized and has been parsed with the CLAWS part-of-speech tagger
developed at Lancaster University and used for the British National
Corpus. In the course of this summer we will add a level of semantic
tagging to this text, using the USAS tagger, also developed at Lancaster.
The current interface for the Nameless Shakespeare is a stopgap measure
while we develop the new WordHoard interface, which will let users take
full advantage of this deeply tagged text. But clunky and inconsistent
as the current interface may be (especially in its delivery of complex
query results), it already lets you do what you cannot easily do through
any other site. You can, for instance, list the words spoken by Ophelia
in verse, the words that occur only in Hamlet and Lear, the adjectives
in the Comedy of Errors, and so forth.
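
To give a sense of what such queries involve, here is a minimal Python
sketch. It is not the Northwestern interface or its data model. The
Token record, its field names, the simplified word-class labels, and the
handful of sample records are all assumptions invented for illustration;
the real text carries CLAWS tags and its own citation scheme. The sketch
also assumes that "occur only in Hamlet and Lear" means words found in
both plays and in no other work.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    """One word occurrence. Hypothetical layout, not the project's schema."""
    work: str      # title of the play
    speaker: str   # character speaking the word
    lemma: str     # dictionary form of the word
    pos: str       # simplified word class (stand-in for a CLAWS tag)
    verse: bool    # True if the word occurs in a verse line


def words_spoken_in_verse(tokens: Iterable[Token], speaker: str) -> List[str]:
    """Distinct lemmas a given speaker utters in verse, sorted."""
    return sorted({t.lemma for t in tokens if t.speaker == speaker and t.verse})


def words_only_in(tokens: Iterable[Token], works: Iterable[str]) -> List[str]:
    """Lemmas found in every one of `works` and in no other work."""
    seen = defaultdict(set)
    for t in tokens:
        seen[t.lemma].add(t.work)
    wanted = set(works)
    return sorted(lemma for lemma, found in seen.items() if found == wanted)


def words_with_pos(tokens: Iterable[Token], work: str, pos: str) -> List[str]:
    """Distinct lemmas of one word class within one work, sorted."""
    return sorted({t.lemma for t in tokens if t.work == work and t.pos == pos})


# Invented sample records, NOT drawn from the actual tagged text.
SAMPLE = [
    Token("Hamlet", "Ophelia", "noble", "adj", True),
    Token("Hamlet", "Ophelia", "mind", "noun", True),
    Token("Hamlet", "Ophelia", "rosemary", "noun", False),
    Token("Hamlet", "Hamlet", "crawl", "verb", False),
    Token("King Lear", "Lear", "crawl", "verb", True),
    Token("King Lear", "Lear", "noble", "adj", True),
    Token("The Comedy of Errors", "Adriana", "jealous", "adj", True),
    Token("The Comedy of Errors", "Adriana", "mind", "noun", True),
    Token("The Comedy of Errors", "Dromio of Syracuse", "fat", "adj", False),
]

if __name__ == "__main__":
    print(words_spoken_in_verse(SAMPLE, "Ophelia"))
    print(words_only_in(SAMPLE, ["Hamlet", "King Lear"]))
    print(words_with_pos(SAMPLE, "The Comedy of Errors", "adj"))
```

Each query is a simple filter over per-word records, which is why the
lemmatization, part-of-speech tagging, and verse/prose marking described
above matter: without those fields on every word occurrence, none of the
three lists could be produced.
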
At the moment the text of the Nameless Shakespeare is accessible
only through the Northwestern interface. We expect to release the text
early in the fall of 2004 after we have added the level of semantic
tagging and corrected many remaining errors in the part-of-speech
tagging, especially in the assignment of grammatical words to such
categories as adverb, conjunction, or determiner.
We will be most interested in hearing from users of the Nameless
Shakespeare what they would like to see in the better interface we plan
to develop through WordHoard, and we will also be very grateful for
error reports. Automatic tagging of textual data has an error rate on
the order of 5%. Through manual corrections we have now reached a stage
where we believe the error rate hovers around 1%. But in a text of some
850,000 word occurrences that still means about 10,000 wrongly assigned
word occurrences. Error reports, even of individual errors, are very
useful in directing attention to systemic problems.
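
As a rough check on the scale of these figures, the arithmetic can be
sketched in a few lines of Python. The function names are hypothetical,
and the inputs are only the rounded numbers quoted above (850,000 word
occurrences, 5% and 1% error rates), so the results are estimates rather
than measured counts.

```python
def expected_errors(word_occurrences: int, error_rate: float) -> int:
    """Estimated number of wrongly tagged word occurrences."""
    return round(word_occurrences * error_rate)


def implied_error_rate(word_occurrences: int, errors: int) -> float:
    """Error rate implied by a given number of wrong assignments."""
    return errors / word_occurrences


WORD_OCCURRENCES = 850_000  # approximate size of the text, as quoted above

automatic = expected_errors(WORD_OCCURRENCES, 0.05)   # 42500
corrected = expected_errors(WORD_OCCURRENCES, 0.01)   # 8500
rate_for_10k = implied_error_rate(WORD_OCCURRENCES, 10_000)

if __name__ == "__main__":
    print(f"At 5%: about {automatic:,} errors")
    print(f"At 1%: about {corrected:,} errors")
    print(f"10,000 errors correspond to a rate of {rate_for_10k:.2%}")
```

At exactly 1% the estimate is 8,500 wrong assignments, and a count of
10,000 corresponds to a rate of roughly 1.2%, so both figures are
consistent with an error rate that "hovers around 1%".
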
Professor of English and Classics
S H A K S P E R: The Global Shakespeare Discussion List
The S H A K S P E R Web Site <http://www.shaksper.net>