Re: SHAXICON
The Shakespeare Conference: SHK 12.0121  Friday, 19 January 2001

[1]     From:   Gabriel Egan
        Date:   Thursday, 18 Jan 2001 18:08:25 -0000
        Subj:   Re: SHK 12.0102 Re: SHAXICON

[2]     From:   Gabriel Egan
        Date:   Friday, 19 Jan 2001 13:43:32 -0000
        Subj:   Announcing SHAXICAN for those who can't wait


[1]-----------------------------------------------------------------
From:           Gabriel Egan
Date:           Thursday, 18 Jan 2001 18:08:25 -0000
Subject: 12.0102 Re: Shaxicon
Comment:        Re: SHK 12.0102 Re: Shaxicon

David Kathman wrote

>Don was and is sincere in his desire to make
>SHAXICON widely available to all on the web.
>Unfortunately, he has been prevented from doing
>so by those twin bugbears of academics everywhere:
>lack of time and lack of money. Before SHAXICON
>can be put on the web, it will have to be transferred
>to a database format, and this would require a lot of
>money and/or time, both of which have been in short
>supply lately.  SHAXICON will eventually be made
>public (and believe me, nobody in the world wants
>that more than Don Foster), but it's hard to
>say right now when that will happen.

Let me make sure I understand SHAXICON right:

1) You make a list of all the characters in all the plays (using unique
identifiers for the multiple Claudios, Antonios, etc).

2) You make a list of all the rare words in all the plays (rare in the
sense that they are words Shakespeare rarely uses).

3) You count how many times each of the characters in (1) uses each of
the rare words in (2).

4) You take a sample text (one of the plays) and make a list of all its
words and how frequently they appear.

5) You check the list in (4) against each of the per-character lists in
(3) to see if there's a character whose rare words turn up much more
often in the sample play than they do in the Shakespeare canon as a whole.

6) If (5) yields a good match, that character was played by Shakespeare
shortly before he wrote that play. (Hence those rare words were
over-represented in the sample play: they were in Shakespeare's head
from his having recently memorized them for his part.)
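
One way to read steps (3) and (5) is sketched below in Perl. This is only an illustration of the summary above, not SHAXICON's actual method: the sub names, the `[character_id, text]` input shape, the simple tokenizer, and the ratio used as a score are all assumptions made for the example.

```perl
use strict;
use warnings;

# Step (3), as summarized above. Hypothetical input: $speeches is a list of
# [character_id, text] pairs and $rare is a hash whose keys are the rare words.
# Returns { character_id => { rare_word => count } }.
sub count_rare_by_character {
    my ($speeches, $rare) = @_;
    my %uses;
    for my $speech (@$speeches) {
        my ($who, $text) = @$speech;
        # Simple tokenizer (an assumption): runs of letters, allowing
        # internal apostrophes as in "o'er".
        for my $word (lc($text) =~ /[a-z]+(?:'[a-z]+)*/g) {
            $uses{$who}{$word}++ if $rare->{$word};
        }
    }
    return \%uses;
}

# Step (5), one possible reading: for each character, compare the rate at
# which that character's rare words occur in the sample play with the rate
# at which they occur in the canon as a whole. A ratio well above 1 would
# mean the words are over-represented in the sample.
sub over_representation {
    my ($uses, $sample_freq, $canon_freq) = @_;
    my ($sample_total, $canon_total) = (0, 0);
    $sample_total += $_ for values %$sample_freq;
    $canon_total  += $_ for values %$canon_freq;
    my %ratio;
    for my $who (keys %$uses) {
        my ($in_sample, $in_canon) = (0, 0);
        for my $word (keys %{ $uses->{$who} }) {
            $in_sample += $sample_freq->{$word} // 0;
            $in_canon  += $canon_freq->{$word}  // 0;
        }
        # Skip characters for whom no rate can be computed.
        next unless $in_canon && $sample_total && $canon_total;
        $ratio{$who} = ($in_sample / $sample_total) / ($in_canon / $canon_total);
    }
    return \%ratio;
}
```

Here `$sample_freq` would be the word list from step (4) and `$canon_freq` the same kind of list built over every play. How large a ratio counts as "a good match" is left open, as it is in the summary.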

Have I got it? If so, what's the big deal? The following Perl script
will, for a given etext of a play, throw out an alphabetized word list,
with the number of occurrences of each word.

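$/="";                                  # paragraph mode: read one blank-line-separated chunk at a time
$*=1;                                   # old multi-line matching flag; unsupported in modern Perl, so delete this line there
while (<>) {
  s/-\n//g;                             # rejoin words hyphenated across line breaks
  tr/A-Z/a-z/;                          # fold everything to lower case
  @words = split(/\W*\s+\W*/, $_);      # split on whitespace, dropping adjacent punctuation
  foreach $word (@words) {
    $wordcount{$word}++;                # tally each word
  }
}
foreach $word (sort keys(%wordcount)) { # print the tallies in alphabetical order
  printf "%20s %d\n", $word, $wordcount{$word};
}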

This script is on page 39 of Larry Wall and Randal L. Schwartz,
_Programming Perl_ (Sebastopol CA: O'Reilly, 1991), and it's meant to
introduce learners to the basics. I've run a Hamlet etext through this
script and put the results up on the web at
www.totus.org/scratch/hamfreq.txt (you have to scroll past all the
act.scene.linenumbers which have risen to the top because the list is
alphabetized).
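
The line references could be kept out of the listing altogether. The sketch below is a variant of the word counter that skips any token containing a digit, on the assumption (not checked against the actual etext) that the act.scene.line references are the only such tokens. The sub name `word_counts` and the file name in the usage comment are placeholders.

```perl
use strict;
use warnings;

# Count the words in a text, ignoring tokens that contain a digit.
# Assumption: such tokens are act.scene.line references like "1.2.129".
sub word_counts {
    my ($text) = @_;
    my %count;
    $text =~ s/-\n//g;                    # rejoin words hyphenated across lines
    for my $token (split /\s+/, lc $text) {
        next if $token =~ /\d/;           # skip line-number references
        $token =~ s/^\W+|\W+$//g;         # trim surrounding punctuation
        $count{$token}++ if length $token;
    }
    return \%count;
}

# Usage: perl wordfreq.pl hamlet.txt   (both names are placeholders)
if (@ARGV) {
    local $/;                             # slurp the whole input at once
    my $counts = word_counts(scalar <>);
    printf "%20s %d\n", $_, $counts->{$_} for sort keys %$counts;
}
```

Skipping every token with a digit is a blunt filter. If the etext contains numerals in the dialogue or stage directions, a pattern matching only the reference format would be safer.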

Surely there's more to SHAXICON since I seem to have done step (4),
admittedly the easiest step, in under 10 minutes (including publishing
the results). I'm ready for my close up, Mr Gleason.

Gabriel Egan

[2]-------------------------------------------------------------
From:           Gabriel Egan
Date:           Friday, 19 Jan 2001 13:43:32 -0000
Subject:        Announcing SHAXICAN for those who can't wait

Those who spend a long time on trains (which in the UK doesn't exclude
those taking short journeys) can find themselves wanting to 'work' but
not on their everyday matters.

For me tinkering with the programming language Perl fills this need, and
since Donald Foster's SHAXICON database is, as David Kathman informs us,
unlikely to be published soon, I've started to dabble in the same area.
So far I've written scripts which do the following:

* gather the names of all the characters in all the plays and assign
each a unique identifier;

* list alphabetically all the 'rare' words in the Shakespeare canon
(i.e. words Shakespeare used 12 times or fewer).

The scripts and the resulting listings are available on the web at
www.totus.org/SHAXICAN
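
For readers without access to those listings, the two tasks might look roughly like the sketch below. It is not the SHAXICAN code: the `play.name` identifier scheme, the sub names, and the `{ word => count }` input shape are assumptions for illustration. Only the threshold of 12 comes from the description above.

```perl
use strict;
use warnings;

# Build a unique identifier from a play abbreviation and a character name,
# so that same-named characters in different plays stay distinct.
# The "play.name" scheme is an assumption, not SHAXICAN's.
sub character_id {
    my ($play, $name) = @_;
    (my $slug = lc $name) =~ s/[^a-z0-9]+/_/g;   # normalize spaces and punctuation
    return lc($play) . '.' . $slug;
}

# Given canon-wide counts { word => occurrences }, return the alphabetized
# list of words used $max times or fewer (default 12, as in the post).
sub rare_words {
    my ($canon_freq, $max) = @_;
    $max = 12 unless defined $max;
    my @rare   = grep { $canon_freq->{$_} <= $max } keys %$canon_freq;
    my @sorted = sort @rare;
    return @sorted;
}
```

With this scheme `character_id('ADO', 'Claudio')` and `character_id('MM', 'Claudio')` give different identifiers for the two Claudios. The counts passed to `rare_words` would come from running a word counter over every play in the canon.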

If I've understood SHAXICON correctly, the above represent first-draft
completion of stages (1) and (2) of the 6 stages which are SHAXICON's
main operation. A script for stage (4) was included in my last
SHAKSPER posting which described the 6 stages. SHAKSPERians, especially
those fluent in Perl, are invited to inspect, criticize, and improve
upon SHAXICAN in order that this area of study may be progressed.

Of course, if I've entirely misunderstood SHAXICON, all the above is
monumental hubris...

Gabriel Egan
 

©2011 Hardy Cook. All rights reserved.