The Shakespeare Conference: SHK 30.091 Tuesday, 5 March 2019
Date: March 4, 2019 at 8:08:49 AM EST
Subject: Re: The Shakespeare Canon and NOS
Those readers who are following our discussion of the word adjacency networks method will, I hope, be interested if I try to explain for everyone’s benefit one of the points at issue between Gerald Downs and Gabriel Egan.
In their Shakespeare Quarterly article, Egan and his co-authors describe one principle of their method as follows:
When comparing the networks of two texts, the difference between their respective usage of the word “and” should matter more to us than their respective usage of the word “beneath,” simply because the word “and” appears more often in English writing.
Downs asked a very simple question: Why? I’d like to show, with an equally simple example, that the question is a pertinent one.
Imagine that you are browsing web pages to find information about some topic. You will likely move from page to page, following links from one website to another. If, while you are doing this, you find yourself being linked to some websites much more often than to others, then the chances are that the former are useful websites. That’s why people link to them a lot. Google uses this principle to decide which web pages to rank highest. We know why it does this: because it has to decide which web page should appear first in your search results, which second, which third, and so on.
In my simple example, let’s imagine a text that uses just three function words: “and”, “it”, “that”. It doesn’t use all possible phrases involving these words. It uses just the following four phrases: “and it”, “and that”, “it and”, “that and”.
It uses all these phrases an equal number of times. Notice that when the text uses “it” or “that”, it always follows the word by “and”. But when it uses “and”, it follows it half the time by “it” and half the time by “that”.
If you apply Google’s technique, as Egan has explained it, then you will discover that “and” gets a rank that’s twice the rank of “it” and “that”. The reason is that when you hop from function word to function word, two hops out of every four will take you to the word “and” whereas only one hop out of four will take you to “it” and only one hop out of four will take you to “that”. According to the formula that Egan and his co-authors use, this means that the phrases “and it” and “and that” are given twice the weight of the phrases “it and” and “that and”.
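For readers who would like to check the arithmetic, here is a minimal sketch in Python. It is my own illustration, not the code or the exact formula used by Egan and his co-authors. It assumes a plain hop-by-hop walk over the four phrases with no further adjustments, and it assumes that each phrase is weighted by the rank of its first word, which is how I read the description above. The names PHRASES, transition_probabilities and long_run_shares are invented for this example. Because the walk alternates strictly between “and” and the other two words, the sketch averages over many hops rather than looking at any single hop.

```python
from collections import Counter, defaultdict

# The four equally frequent phrases of the toy text, as (first word, second word).
PHRASES = [("and", "it"), ("and", "that"), ("it", "and"), ("that", "and")]


def transition_probabilities(phrases):
    """For each word, the probability of each word that follows it."""
    counts = defaultdict(Counter)
    for first, second in phrases:
        counts[first][second] += 1
    return {
        first: {second: n / sum(followers.values()) for second, n in followers.items()}
        for first, followers in counts.items()
    }


def long_run_shares(transitions, hops=1000):
    """Average share of hops that land on each word over a long walk."""
    words = sorted(transitions)
    current = {w: 1 / len(words) for w in words}  # start on any word with equal chance
    totals = {w: 0.0 for w in words}
    for _ in range(hops):
        following = {w: 0.0 for w in words}
        for source, share in current.items():
            for target, p in transitions[source].items():
                following[target] += share * p
        current = following
        for w in words:
            totals[w] += current[w]
    return {w: totals[w] / hops for w in words}


transitions = transition_probabilities(PHRASES)
ranks = long_run_shares(transitions)

# Assumption: a phrase is weighted by the rank of its first word.
phrase_weights = {f"{first} {second}": ranks[first] for first, second in PHRASES}

for word, rank in ranks.items():
    print(f"rank of {word!r}: {rank:.2f}")
for phrase, weight in phrase_weights.items():
    print(f"weight of {phrase!r}: {weight:.2f}")
```

On these assumptions, “and” comes out with about half of the hops and “it” and “that” with about a quarter each, so “and it” and “and that” carry twice the weight of “it and” and “that and”. The sketch confirms the arithmetic, but it does nothing to answer the “Why” question.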
Is it obvious to you why this is the right thing to do for authorship attribution purposes? Me neither. That’s why Gerald Downs’ question was a good one. It would not be surprising if Gabriel Egan couldn’t answer it, because we know from the publication history that all the formulae had been decided by his co-authors in their works published without him. It would have been better if he had just said candidly that he can’t answer the “Why” question. The answers he has given have about the same explanatory value as Antony’s description of the crocodile in Antony and Cleopatra: “It is shaped, sir, like itself, and it is as broad as it hath breadth; it is just so high as it is, and moves with it own organs. It lives by that which nourisheth it, and the elements once out of it, it transmigrates.”