August Seminar on E-Texts at Princeton

Subject: Humanities Computing Summer Seminar
Electronic Texts in the Humanities: Methods and Tools
August 9-21, 1992
Summer Seminar, Princeton University, New Jersey
Co-sponsored by the
Centre for Computing in the Humanities, University of Toronto
This first Summer Seminar of the Center for Electronic Texts in the
Humanities (CETH) will address a wide range of challenges and
opportunities that electronic texts and software offer to teachers and
scholars in the humanities.  Discussions on text creation, markup,
retrieval, presentation, and analysis will prepare the participant for
extensive hands-on experience with illustrative software packages,
such as MTAS, Micro-OCP, WordCruncher, Tact, Collate, Beowulf
Workstation, Perseus, and CD-Word.  Systems of markup, from ad hoc
schemes to the systematic approach of the Text Encoding Initiative,
will be surveyed and considered.  The focus of the Seminar will be
practical and methodological, concerned with the demonstrable benefits
of using electronic texts in teaching and research, the typical
problems one encounters and how to solve them, and the ways in which
software fits or can be adapted to methods common amongst the
humanities.  Participants will be given the opportunity to work on a
coherent project.  Those with projects already in progress or
preparation will be encouraged to bring them; texts and exercises will
be provided for those without a specific project in mind.
The seminar is intended for researchers, librarians and computer
center advisers who have basic computing experience, but little or no
experience of computers in a humanities research environment.  The
number of participants will be limited to 26.
Week 1, August 9-14, 1992
Sunday, August 9.  Registration
Monday, August 10. The electronic text
       a.m.  What electronic texts are and where to find them; survey of
             existing inventories, archives, and other current resources.
             History of computer-assisted text analysis in the
             humanities.  Introduction to simple concordancing with MTAS,
             including practical session.
       p.m.  Creating and capturing texts in electronic form; keyboard
             entry vs. optical scanning.   Demonstration of optical
             character-recognition technology.  Introduction to text
             encoding, surveying ad hoc methods, e.g. COCOA,
             WordCruncher, TLG beta code; problems of these methods.
             Systematic approach of the Text Encoding Initiative.
             Practical exercise in deciding what to encode in typical
Tuesday, August 11. Concordancing
       a.m.  A focussed look at computer-assisted concordance generation;
             types of concordances, their specific advantages and
             disadvantages.  Alphabetization, character sequences,
             sorting, and forms of presentation.  Introduction to
             Micro-OCP; practical session in its use.
       p.m.  Further work on concordancing with Micro-OCP.
Wednesday, August 12. The interactive concordance
       a.m.  Indexed, interactive retrieval vs. batch concordance
             generation.  Textual problems particularly suitable to an
             interactive system; the continuing use of concordances in
             hardcopy.  Preparation of text for indexed retrieval;
             differing roles of markup and external "rules"; kinds of
             displays; post-processing of displays.  Introduction to
             Tact.
       p.m.  Practical work using Tact: simple markup, compilation of a
             textual database, and methods of inquiry.
Thursday, August 13. Stylistics
       a.m.  Stylistic comparisons and authorship studies using
             concordance tools; basic statistics for lexical and
             stylistic analysis.  Case studies, e.g. Federalist Papers,
             Kenny on Aristotle, Burrows on Jane Austen.
       p.m.  Practical session using Micro-OCP and/or Tact for stylistic
Friday, August 14. Critical editions
       a.m.  Overview of tools for preparing critical editions.
             Constructing glossaries and material for commentary;
             application of Micro-OCP and/or Tact.
       p.m.  Collation; single-text vs. multiple-text methods.  Overview
             of software tools.  Introduction to Collate.
Week 2, August 17-21, 1992
Monday, August 17. Text analysis
       a.m.  Review of the previous week's work.  Discussion on the
             limitations of existing software.  Advanced analytical tools
             not commonly available, e.g. pattern recognizers,
             lemmatization systems, morphological analyzers, parsers;
             overview of these.
       p.m.  Simple, practical morphological analysis and lemmatization
             with Micro-OCP and/or Tact.
Tuesday, August 18. Developing and Extending Current Resources
       a.m.  How far do existing textual databases and software go
             towards satisfying the needs of teachers and scholars, e.g.
             WordCruncher (ETC) texts, Oxford Electronic Texts, the
             Thesaurus Linguae Graecae (TLG), the ARTFL database, the
             Dante Database?  How these are accessed and used.
       p.m.  The electronic dictionary; from machine-readable dictionary
             to computational lexicon.  What the New OED and other online
             dictionaries can do for the scholar.  Uses of lexical
             knowledge bases in text retrieval.  Building a simple online
             lexicon with Tact.
Wednesday, August 19. Hypertext
       a.m.  Hypertext and hypermedia: alternative or complementary
             approaches to text analysis and presentation? Overview of
             some ongoing hypertextual projects in the humanities:
             Beowulf Workstation, Perseus, CD-Word.  What essential role
             does hypertext play in these?  How might hypertext and
             concordancing methods be combined?
       p.m.  Practical session in building a hypertextual system, using
             HyperCard or Guide.  A brief look at Annota.
Thursday, August 20. Projects (1)
       a.m.  Illustration of how to tackle projects using one of the
             methods covered earlier in the seminar; beginning of
             practical work.
       p.m.  Practical work continued.
Friday, August 21. Projects (2)
       a.m.  Practical work continued.
       p.m.  Concluding discussion of methodologies and problems.  Do the
             results justify the amount of work involved? How is one's
             perspective on text changed by using automatic methods?
             What can one learn from the collision of these methods with
             intuitive perceptions? How can the machine better assist the
             educated imagination?
The Center for Electronic Texts in the Humanities was established in
October 1991 by Rutgers and Princeton Universities with external
support from the Mellon Foundation and the National Endowment for the
Humanities.  It is intended to become a national focus of interest in
the U.S. for those who are involved in the creation, dissemination and
use of electronic texts in the humanities, and it will act as a
national node on an international network of centers and projects
which are actively involved in the handling of electronic texts.
Developed from the international inventory of machine-readable texts
which was begun at Rutgers in 1983 and is held on RLIN, the Center is
now reviewing the records in the inventory and continues to catalog
new texts.  The acquisition and dissemination of text files to the
community is another important activity, concentrating on a selection
of good quality texts which can be made available over Internet with
suitable retrieval software and with appropriate copyright permission.
The Center also acts as a clearinghouse on information related to
electronic texts, directing enquirers to other sources of information.
The seminar will be taught by Willard McCarty and Susan Hockey, with
assistance from Hannah Kaufman, Toby Paff and Mary Sproule.
Willard McCarty has been active in humanities computing since 1977.
With its founding Director, Ian Lancashire, he helped to set up the
Centre for Computing in the Humanities, University of Toronto, of
which he is now the Assistant Director.  He was the founding editor of
Humanist, the principal electronic seminar for computing humanists,
and has edited several other publications in the field. He regularly
gives talks, papers, and lectures throughout North America and Europe.
McCarty took his Ph.D. in English literature in 1984; his current
literary research is in classical studies, especially the
_Metamorphoses_ of Ovid. In support of a forthcoming book, he has an
electronic edition of that poem underway for the text-retrieval
program Tact.
Susan Hockey is Director of the Center for Electronic Texts in the
Humanities.  Before moving to the USA in October 1991, she spent 16
years at Oxford University Computing Service where her most recent
position was Director of the Computers in Teaching Initiative Centre
for Textual Studies.  At Oxford she was responsible for various
humanities computing projects including the development of the Oxford
Concordance Program (OCP), an academic typesetting service for British
universities, and OCR scanning.  She has taught courses on humanities
computing for fifteen years and has given numerous guest lectures on
various aspects of computing in the humanities.  She is the author of
three books and numerous articles on humanities computing and has been
Chair of the Association for Literary and Linguistic Computing since
1984.  She is a member (currently Chair) of the Steering Committee of
the Text Encoding Initiative.
Hannah Kaufman, Toby Paff and Mary Sproule are all on the staff of
Computing and Information Technology's Information Services at
Princeton University.  Each of them has worked extensively with
humanities scholars.  Hannah Kaufman's special skills include the
design and use of full text and bibliographic databases; Toby Paff has
worked on designing fonts and analyzing non-Roman texts; and Mary
Sproule has extensive experience with critical editions and
instructional technology.
The seminar will include visiting talks in the evenings on specific
topics or research projects, as well as the role of the library in the
use of electronic texts.
The cost of participating in this Summer Seminar will be $850,
including tuition, meals and lodging at Princeton for the two weeks.
Students pay a reduced rate of $750.  Tuition, lunch and dinner only
will be $650.
Application Procedure
To apply for participation in this Summer Seminar, submit a statement
of interest of no longer than one page, indicating how participating
in the Seminar will affect your teaching, research or support, and
possibly that of your colleagues, in Humanities Computing in the
coming year.  Applications must be attached to a cover sheet
containing name, position, affiliation, postal and email addresses,
and phone and fax numbers, as available, as well as natural language
interest and computing experience.  Students must also include a
photocopy of a valid student ID.  The statement must be received by
the reviewing committee, consisting of members of the Center's
Governing Board, by May 15, 1992, at the address below.  Those who
have been selected to attend will be notified by June 1, 1992.
Payment will be requested at this time.
Summer Seminar 1992
Center for Electronic Texts in the Humanities
169 College Avenue
New Brunswick, NJ 08903
phone:  (908) 932-1384
fax:    (908) 932-1386
email:  ceth@zodiac (bitnet)

Tagging Electronic Shakespeare Texts

From: Luc Borot
Subject: TAGGING
Dear fellow SHAKSPEReans,
This note is the first Montpellier contribution to the discussion
on tagging on SHAKSPER. Number 42 of *Cahiers Elisabethains* (Oct.92)
will include a review by Patricia Dorval and myself of several e-editions
of Shakespeare and Milton, including those on our server, and of
Michael Best's MARVELLOUSissima (sorry for the coinage) Hypercard stack
*Shakespeare's Life and Times*, plus the captivating and hyper-simple
concordance programme *Gconc*.
The research I am currently doing for the Hobbes seminar of the CNRS in
Paris concerns reason of state in the 17th century. I have scanned the
text of two aphoristic treatises by James Harrington, the Machiavellian
republican of the 1650s, and I have applied Gconc to them. The tagging
I used for these two brief texts concerned only the text's subdivisions.
Gconc can cope with many different kinds of tagging for concordance
generation if the user takes good care over the parameters he enters in
the option windows. I am only just discovering the problems of this type
of textual study and edition, but my feeling is that we should ask
ourselves what we are tagging the texts for, whether for ourselves or
for our colleagues on the network. Are we tagging for concordance
analysis, or for later treatment with a word processor? (In the latter
case typographical data such as italics and caps are necessary if we
want our collaborators to recover the presentation of the original.)
In many cases (working from the two versions of the sonnets posted by
Hardy Cook, the Wells-Taylor Electronic Shakespeare by OUP, the Milton
published by Shakespeare on Disk, and my own scanned texts) I have
realised that concordances can be used more efficiently with a
modern-spelling edition. (Prof. Jean Fuzier, editor and translator of
the *Sonnets*, appreciated Hardy Cook's editions, and stressed that if a
philologist wanted to compare the uses of *than* and *then* in this
work, he would have to resort to a modern-spelling edition, though he
personally preferred to use an old-spelling one to analyse prosody and
rhetoric.)
Ken's tagging reminded me of the RTF (rich text format) code introduced
by Mac word processors in a specific recording format. The trouble is
that it will always hinder the work of the concordance software if the
latter is not formatted to cater for this type of code.
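
To make the point concrete, here is a minimal Python sketch of the
pre-processing needed when a concordance program does not understand
the markup itself. It assumes, purely for illustration, <angle-bracket>
reference tags and {braces} for italics, as discussed on this list; the
function names are hypothetical and belong to no existing program
(Gconc included).

```python
import re
from collections import Counter

# Assumed convention: anything between < and > is a reference tag.
TAG = re.compile(r"<[^<>]*>")

def strip_markup(line: str) -> str:
    """Drop <...> tags entirely; keep the words inside {italic} braces."""
    without_tags = TAG.sub(" ", line)
    return without_tags.replace("{", "").replace("}", "")

def word_counts(lines):
    """Count lower-cased word forms after the markup has been removed."""
    counts = Counter()
    for line in lines:
        words = re.findall(r"[A-Za-z']+", strip_markup(line).lower())
        counts.update(words)
    return counts
```

Without the stripping step, tag letters and speech prefixes would be
counted as if they were words of the text.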
That's all for the moment. As our work on the question progresses I'll
let you hear more about our discoveries and silly blunders.
Fare ye well.
Luc BOROT in Montpellier (France, not Vermont!...)

Modern PD Shakespeare Texts on the Network

Subject: 	FYI -- Shakespeare on-line
FYI -- just saw this in the VNEWS.
Best wishes.
A few days ago, Grady Ward posted an offer to send The
Unabridged Shakespeare to anyone who could send him disks or $10. Well,
I sent mine, and I've gotten the Works back already. Thanks, Grady.
Just 'cause Shakespeare is such a keen guy, I've put everything that he
did (that I have) up for anonymous ftp on
	terminator.cc.umich.edu []
Except for the poetry, I've split the works into directories and files
(there were just too many sonnets for me to go all the way).
Everything seems to be pretty religiously tabbed, so it's probably
going to be easy to do some nice formatting.
Anyway, enjoy,

Rs: Authorship, Performance Reviews

	Subj:  	RE: SHK 3.0069  Oxford and the Authorship Question
	Subj: 	Review of Stratford, ONT *Othello*
	Subj: 	RE: SHK 3.0070  Q: Performance Reviews
Subject: 3.0069 Oxford and the Authorship Question
Comment:  	RE: SHK 3.0069  Oxford and the Authorship Question
> Perhaps the real answer is that they
> are all by the recently, very recently, late Robert Maxwell;
> or perhaps Richard Ingrams wrote them with the help of
> Ian Hyslop.  I'm sure I could work out a case for any of
> these if I had the time.
> William Proctor Williams         TB0wpw1@NIU
I find myself in two minds over this ... as someone lucky enough to have lived
within theatre-going distance of the RSC at Stratford and having seen some
memorable performances/performers there over the past 25/30 years I have never
been too bothered about who wrote the words - be it a local yokel or a courtly
knob, the event of attendance and 'being there' was more than enough. However
age and study in other subjects has made me aware that exactly who wrote the
words is important on some (different perhaps) levels - so yes ... it is
important that some do not just accept the obvious (?) and tag them all as
Will c/o Stratford.  It is difficult enough to prove the facts behind what
happened yesterday to ever assume that we know what happened 3/400 years ago.
BUT I would love to see a proof that Ian Hyslop helped Richard Ingrams write
them... hell! I was watching them almost before he was born! Come on William
Proctor Williams - show me.
Simon Rae	    -	    User Services Officer,
Academic Computing Service, The Open University,
    Walton Hall, MILTON KEYNES, MK7 6AA, UK.
phone: 0908 652413	  		fax: 0908 653744
Subject: Review of Stratford, ONT *Othello*
Luis Gamez will find a review of the Stratford, ONT *Othello* embedded
in a review of the whole Stratford '87 season by Daniel J. Watermeier in
*SQ* 39, no. 2 (Summer 1988), pp. 229-30.
                               Ron Macdonald
Subject: 3.0070 Q: Performance Reviews
Comment: 	RE: SHK 3.0070  Q: Performance Reviews
For Lou Gamez, seeking reviews of 1987 Stratford (Ont.) *Othello* and London
*Coriolanus*: get in touch with June Schlueter, who is (I think) a member of
this SHAKSPER network, and who is also co-editor of *Shakespeare Bulletin*.
Hardly a production of ANYTHING Shakespearean is missed by that journal.
I'll bet you'll find at least one review of both plays there. Better yet,
subscribe! The *pony-express* (vs. e-mail) address is Department of English,
Lafayette College, Easton, PA 18042.

Tagging Electronic Texts

Subject: A Long Discussion of Tagging
WordCruncher, TACT, and Micro OCP are all text-analysis software.
WordCruncher and TACT both bill themselves as text-retrieval programs.
Micro OCP "makes concordances, indexes, and word lists from texts in
a variety of languages and alphabets."  On the simplest level, all
three locate words and word patterns in a text or corpus, but far
more sophisticated analysis is possible. Micro OCP, according to the
*User Manual*, "can be used for many text analysis applications
including the investigation of style, vocabulary distribution,
grammatical forms, rhyme schemes, text editing, and language
acquisition and teaching."
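
As a rough illustration of that simplest level (locating a word and
showing it in context), here is a keyword-in-context sketch in Python.
It is not how WordCruncher, TACT, or Micro OCP are implemented; the
function name and the `width` parameter are assumptions made for the
example.

```python
import re

def kwic(text: str, keyword: str, width: int = 30):
    """Return one keyword-in-context row per whole-word match."""
    pattern = re.compile(r"\b%s\b" % re.escape(keyword), re.IGNORECASE)
    rows = []
    for match in pattern.finditer(text):
        # Up to `width` characters of context on each side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up.
        rows.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return rows
```

Printing the rows one per line gives the familiar concordance layout,
with the keyword in a fixed column.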
I am most familiar with WordCruncher, and more familiar with TACT than
Micro OCP.  However, I have not prepared texts to be used with these
programs, relying instead on previously prepared texts: *The Riverside
Shakespeare* with WordCruncher, *Hamlet In-TACT* (the three *Hamlet*
texts that Ken Steele prepared and shared with participants in a SAA
seminar on Shakespearean computing several years ago), and the Oxford
Text Archive's collections offered to SHAKSPEReans last fall with Micro
OCP.
I don't mean this posting to be a tutorial in a subject about which I
know very little myself, but some details are necessary to continue our
discussion. Files prepared for WordCruncher seem to support "three levels
of Reference Codes that identify three levels of a standard outline."
TACT, according to *User's Guide*, employs <angle brackets> to identify
Text References as does Micro OCP: herein lies the issue of tagging our
PD Shakespeare texts.
With the Sonnets, I chose to post two versions: one with minimal tagging
(basically only the <it> tag, which I am now inclined to replace with
{braces}) and another fully tagged version with <angle brackets> that
include T: Title, L: Line, P: Pagination, and S: Sonnet Number.  The
minimally tagged version could easily be reformatted for TeX, while
either can be used with a word processor.  However, I suspect some
of us who want to work with these e-texts will want to use them with
one of the text-analysis programs.
Selecting what to tag in the sonnets was relatively easy; selecting
what to tag in the plays and deciding whether we would like to have
untagged as well as tagged versions is another matter.
Several months ago, Ken Steele proposed the following as possibilities
for tagging the plays:
	Play Title eg. <T Hamlet Q1>
	Act/Scene  eg. <A 3.2>
	Line [this should probably be added mechanically]
	Direction eg. <D Enter {Hamlet}.>
	Speech Prefix eg. <S {Ham.}>
	Font eg. {these words in italic}
	Language eg. <L Latin> ergo <L English>
	Some other things which might be added by an editor with sufficient
	resources and information are the following:
	Speaker (not always clear or the same as prefix)
	Verse/Prose (not always the same as the original ed.)
	Compositor Stints (usually a little theoretical)
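
A proposal like this can be tried out mechanically. The Python sketch
below reads tags of the form <X value>, using the single-letter codes
from the examples above (T, A, D, S, L). The mapping of letters to names
and the function name are illustrative assumptions, not part of the
proposal itself.

```python
import re

# One capital letter, whitespace, then the value, all inside < >.
TAG = re.compile(r"<([A-Z])\s+([^<>]*)>")

# Names chosen for this example only.
CATEGORIES = {
    "T": "title",
    "A": "act_scene",
    "D": "direction",
    "S": "speech_prefix",
    "L": "language",
}

def parse_tags(line: str):
    """Return (tags, text): tags is a list of (category, value) pairs,
    text is what remains of the line once the tags are removed."""
    tags = [(CATEGORIES.get(code, code), value.strip())
            for code, value in TAG.findall(line)]
    text = TAG.sub("", line).strip()
    return tags, text
```

Braces inside a tag, as in <S {Ham.}>, are left untouched in the value,
so the italic information is preserved for whoever needs it.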
With the sonnets, I tried to reproduce in ASCII as closely as possible
what I saw on the page.  Thus, I added spaces where I saw large gaps and
ran other elements together where I saw minimal spaces.  Because I was
trying to reproduce the "look" of the page, I included everything on the
page, including signatures, pages, and forms.  Thus, I would suggest that
pagination information be included in our tagging.
The Oxford Text Archive describes its encoding choices this way:
This file contains embedded markers for use by Oxford Concordance Program,
delimited by the characters < and >
The following categories of reference are included:
    T  : Play title
    C  : Compositor identifier
    P  : Signature
    A  : authorial attribution
    Y  : (occasionally) type of copy
    S  : Speaker prefix
    Z  : Act/scene prefix stage direction etc
    D  : embedded stage direction
Lines beginning with either a space or a star are text lines, one for each
line in the original text. Lines beginning with a * are justified lines.
Hinman's lineation for the folio is followed. Lines beginning with a reference
(i.e. <) are not included in the lineation.
The character # is used in some texts to distinguish homographs (e.g. Will and
Will); it is also used with the hyphen to indicate cases where hyphenation
is significant.
The characters { and } (curly braces) are used to enclose material in italics.
     Words containing tildes in the original texts have been expanded.
     Words hyphenated across a line boundary have been joined together and
     included at the end of the first line.  A "%" marks the hyphen in this
     case. If the second part of the hyphenated word was the only thing
     on that line, an underscore "_" on that line is used to indicate that
     it is a non-blank line in the original text.
Turnovers are joined together on the first line, and the character "|" is
used to mark this point.
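
The line conventions just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function names are hypothetical, the count produced is a simple running
count of text lines (not Hinman's through line numbers), and a "%" is
taken to mark a hyphen that can be dropped to recover the joined word.

```python
def classify(line: str) -> str:
    """Classify a line by its first character, per the description above."""
    if line.startswith("<"):
        return "reference"   # not counted in the lineation
    if line.startswith("*"):
        return "justified"   # a text line that was justified in the original
    if line.startswith(" "):
        return "text"
    return "other"

def lineation(lines):
    """Number text lines only, skipping reference lines."""
    numbered = []
    count = 0
    for line in lines:
        kind = classify(line)
        if kind in ("text", "justified"):
            count += 1
            # Drop the leading space or star that carries the line type.
            numbered.append((count, kind, line[1:]))
    return numbered

def join_hyphen(line: str) -> str:
    """Remove the % that marks a line-break hyphen (assumed usage)."""
    return line.replace("%", "")
```

A concordance built on the output of `lineation` would cite text lines
only, which is the behaviour the description asks for.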
I am not sure if we would like to go into this detail, but here is what
the OTA texts look like:
Sample from OTA *King Lear* F1:
      <T KL><L 1><Y Q><P qq2><C B>
1      <Z {Actus  Primus. Scoena Prima}.>
2      <D {Enter Kent, Gloucester, and Edmond}.>
3      <S {Kent}.>
4     *I thought the King had more affected the
5      Duke of {Albany}, then {Cornwall}.
6     *<S {Glou}.> It did alwayes seeme so to vs: But
7     *now in the diuision of the Kingdome, it ap-peares
8     *not which of the Dukes hee valewes
9     *most, for qualities are so weigh'd, that curiosity in nei-ther,
10     can make choise of eithers moity.
11     <S {Kent}.> Is not this your Son, my Lord?
12    *<S {Glou}.> His breeding Sir, hath bin at my charge. I haue
13    *so often blush'd to acknowledge him, that now I am
14     braz'd too't.
15     <S {Kent}.> I cannot conceiue you.
16    *<S {Glou}.> Sir, this yong Fellowes mother could; where-vpon
17    *she grew round womb'd, and had indeede (Sir) a
18    *Sonne for her Cradle, ere she had a husband for her bed.
19     Do you smell a fault?
20    *<S {Kent}.> I cannot wish the fault vndone, the issue of it,
21     being so proper.
22    *<S {Glou}.> But I haue a Sonne, Sir, by order of Law, some
23    *yeere elder then this; who, yet is no deerer in my ac-count,
24    *though this Knaue came somthing sawcily to the
25    *world before he was sent for: yet was his Mother fayre,
26    *there was good sport at his making, and the horson must
27    *be acknowledged. Doe you know this Noble Gentle-man,
28     {Edmond}?
29     <S {Edm}.> No, my Lord.
30     <S {Glou}.> My Lord of Kent:
31     Remember him heereafter, as my Honourable Friend.
32     <S {Edm}.> My seruices to your Lordship.
33     <S {Kent}.> I must loue you, and sue to know you better.
34     <S {Edm}.> Sir, I shall study deseruing.
35    *<S {Glou}.> He hath bin out nine yeares, and away he shall
36     againe. The King is comming.
37    *<D {Sennet. Enter King Lear, Cornwall, Albany, Gonerill, Re-gan},
38     {Cordelia, and attendants}.>
I apologize for the length of this posting, but I felt it was important
to get the issue of tagging the PD Shakespeare texts clearly in front of
us.  Your responses are sought.
					Hardy M. Cook
					Bowie State University

