PD Shakespeare: Tagging (Cont'd)
Shakespeare Electronic Conference, Vol. 3, No. 63. Monday, 16 Mar 1992.
 
 
(1)	Subj: 	Re: SHK 3.0062 PD Shakespeare Project: Tagging
	Date: 	Sun, 15 Mar 92 18:31:54 -0500
	From: 	Vint Cerf
 
(2)	Date: 	Sun, 15 Mar 1992 20:56:01 -0500
	From: 	"Tad Davis"
	Subj: 	RE: SHK 3.0062  PD Shakespeare Project: Tagging
 
(3)	Date: 	Mon, 16 Mar 1992 17:01:01 -0500
	From: 	"Tad Davis"
	Subj: 	Another text file format
 
 
(1)----------------------------------------------------------------------
Subject: 3.0062 PD Shakespeare Project: Tagging
Comment: 	Re: SHK 3.0062 PD Shakespeare Project: Tagging
Date: 		Sun, 15 Mar 92 18:31:54 -0500
From: 		Vint Cerf
 
I would urge caution in the use of TeX because it introduces
a good deal of syntactic fruit which might interfere with the
kinds of searches I understand you like to do. There IS a way
to produce a plain ASCII from TeX but that seems like extra
work. My recommendation would be to give serious thought to
allowing multiple forms of these objects/documents to exist
so that some would be used for text crunching and others for
various presentation purposes.
 
Vint
 
(2)----------------------------------------------------------------------
Date: 		Sun, 15 Mar 1992 20:56:01 -0500
From: 		"Tad Davis"
Subject: 3.0062  PD Shakespeare Project: Tagging
Comment: 	RE: SHK 3.0062  PD Shakespeare Project: Tagging
 
Just wanted to register my preference for using {braces} to mark italicized
text rather than <it>special markers<end-it>.
 
Tad Davis

 
 
(3)----------------------------------------------------------------------
Date: 		Mon, 16 Mar 1992 17:01:01 -0500
From: 		"Tad Davis"
Subject: 	Another text file format
 
Ken,
 
I've attached a description of "setext," which is a file format in modest
use in the mac world. I don't think the entire document is worthy of
distribution to the entire SHAKSPER list, but it might be worth
summarizing.
 
To get a copy of the document, send mail to:
 
	
	[address withheld in the archive]
 
 
with a single word in the subject line: "setext".
 
I like the suggestion of using ~the squiggle~ (whatever it's called) to
represent italics.
 
The major advantage of this type of format is that it does go easier on the
eyes than more heavily structured formats. The disadvantage is that
embedded characters that never occur in real life, in most business
communication -- like ** and ~ -- may occur in old documents and need to be
represented in some way.
 
Tad Davis

 
 
 
------- Forwarded message
 
Date: Mon, 16 Mar 92 16:11:01 -0500
From: [address withheld in the archive]
To: davist@a1.relay
Subject: setext_concepts.etx
 
 
# Message from   Ian Feldman, the Current Setext Oracle
# Date:          Thu, 5 Mar 92 19:56:00 +0100 (CET)
# Reply-to:      [address withheld in the archive] (Keepers of The Setext Flame[tm])
# X-new-address: no more mail to <not.bad.se> please
# Lines: 229
# Subject:       setext_concepts_Mar92.etx
 
 
  Thank you for your interest in the setext format. Enclosed is an
  advance sheet that will remain in effect until the first public
  release of the setext format package (originally planned for around
  March 1st, 1992, now delayed).
 
  If you recognize some of the arguments presented here then that is
  the price that you are paying for having been an early bird. ;-))
  Please note that my email address may change in the near future;
  consult the trailer of weekly issues of TidBITS for the most
  current one.
 
 
  What is setext
----------------
  As originally explained in TidBITS#100 and mentioned there from
  now on, that publication now comes "wrapped as a setext." The noun
  itself stands for both a method to wrap (format) texts according
  to specific layout rules and for a single _structure_enhanced_
  text. The latter is a text which has been formatted in such a
  fashion that it contains clues as to the typographical and logical
  structure of its source (word-processed) document(s), if any.
  Those clues, which I call "typotags," facilitate later automatic
  detection of that structure so it can be validated and extracted/
  processed/ transformed/ enhanced as needed, if needed.
 
  It follows that setexts, being nothing but pure text (albeit with
  a special layout), are eminently readable using ANY editor or
  word processor in existence today or tomorrow, and not only on
  the Macintosh either. ANY computer, any computer program that is
  capable of opening and reading text files can be used for reading
  setexts. By default all properly setext-ized files will have an
  ".etx" or ".ETX" suffix. This stands for an "emailable/ enhanced
  text", the ExtraTerrestrial overtones notwithstanding ;-))
 
  Unlike other forms of text encoding that use explicit, visible tag
  elements such as <this> and <\that>, the setext format relies
  solely on the presence of _implicit_ typotags, carefully chosen
  to be as visually unobtrusive as possible. The underlined word
  above is one such instance of the de facto "invisible" coding.
  Inserted typotags will at worst appear as mere "typos" in the text.
 
  Similarly, just to give an example, here is a short description
  of the four types of word emphasis typotags that setexts MAY
  contain, limited to one emphasis type ONLY per word or word group:
 
 -------------------  ----------------------------  --------------
       **aBoldWord**  **multiple bold words**       ; bold-tt
 _anUnderlinedWord_    _multiple underlined words_  ; underline-tt
     ~anItalicWord~    ~multiple italicised words~  ; italic-tt
          aHotWord_     multiple_hot_words_         ; hot-tt
 -----------------------------------------------------------------
 the 'hot-tt' is synonymous with the 'grouped' style of HyperCard
 
  Please note, however, that the <end> strings previously found in
  TidBITS #100-109 were not part of the format as such, but were
  added by Adam Engst for a specific setext-raterrestrial purpose.
 
 
  Why is setext
---------------
  Data formats like the RTF (Rich Text Format) and SGML (Standard
  Generalized Markup Language) have been designed for processing ONLY
  by software. Setext, on the other hand, has been _optimized_
  for reading directly by human eyes on what probably is still the
  lowest common denominator of today's computer hardware, an 80-
  character by 24-line terminal screen (or, in effect, any computer
  screen). It follows that the format is intended chiefly for
  smaller texts, those of a size that a human reader might find
  within her capacity of overview.
 
  I need to state explicitly that although TidBITS is currently the
  only setext publication in wide distribution, the setext format is
  NOT synonymous with the TidBITS layout. Many other distinctive
  layouts are possible. TidBITS is therefore just an _instance_ of
  the format, not THE setext format. More specifically, that also
  means that any of you thinking of writing a "TidBITS browser"
  should in reality be considering a "setext browser." Otherwise
  your program will in all probability be able to recognize only
  today's specifically-formatted TidBITS and no other future setext
  publications (which are in the making), including that of a future
  possibly changed or modified TidBITS.
 
 
  How come is setext
--------------------
  The idea of a common format for online-distributed publications
  has been growing in my mind since approximately 1986-87. It came
  into focus
  after I started corresponding with Adam C. Engst, following my
  April, 1990 criticism of the original TidBITS presented as a
  HyperCard stack. Gradually it ceased to be a redesign effort for
  the TidBITS and became instead a generic format for all kinds of
  electronic publications (which I affectionately call "the compu-
  rags" ;-)). I hit on the current "tagless" version of the format
  in the winter of 1990 and the first internal beta product -- a
  setext encoder for TidBITS -- saw the light of the day in July of
  1991. Later Adam wrote a setext-encoding Nisus macro for his
  personal use, the one he now uses to wrap the weekly issues of
  TidBITS (he isn't putting all those spaces and dashes in there
  entirely by hand! ;-))
 
  As can be seen from the above, setext is not some quickie project,
  thought up and finalized in a few afternoons. A lot of thought
  has gone into it and some of it has survived to the present day.
  Needless to say the format definition will be placed in the
  public domain and its use actively promoted by the many parties
  that have expressed an interest in adopting it for their own use.
 
 
  What for is setext
--------------------
  The setext (data) format is intended primarily for use by online-
  distributed periodic publications. It is particularly well-suited
  to all kinds of electronic digests and other types of repetitively
  disseminated text information. Despite its formal appearance as
  "mere stream of unenhanced ASCII characters on a computer screen"
  setext is rich enough and unambiguous enough to permit construction
  of fairly complex encoding engines for specific application purposes
  (also on top of the format) and to allow easy implementation of a
  countless number of front-end browsers/ decoders and other
  reading/ archiving-enhancement tools.
 
  While setext does, indeed, allow the preservation of a source
  text's structure it does not, by definition, guarantee the 100%
  ability to recreate it at the destination. Any word originally
  styled as **bold** may in effect end up as Yellow-On-Black or be
  set in a different font, or considered a candidate for a
  cumulative keywords list or be deemphasized at will. There are
  not now and never will be any rules to govern how decoded setexts
  should be presented at the receiving end. It will be up to each
  front-end's author to ensure that decoded (no-longer-)setexts are
  presented in a fashion that's agreeable to his/ her end users.
  There is plenty of sound advice and recommendations on how to
  achieve that but that's an entirely different matter.
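
  The point that presentation is left to each front end can be
  sketched as follows. This is only an illustration: the names
  `render_bold`, `ansi_yellow_on_black` and `shout`, and the bold
  pattern itself, are assumptions made for the example rather than
  anything defined by the format.

```python
import re

# Hypothetical pattern for the bold typotag, guessed from "**bold**".
BOLD = re.compile(r"\*\*(\S(?:[^*\n]*\S)?)\*\*")


def render_bold(text, style):
    """Replace each bold typotag with whatever the front end's style returns."""
    return BOLD.sub(lambda m: style(m.group(1)), text)


def ansi_yellow_on_black(word):
    """One possible presentation: yellow on black via ANSI escape codes."""
    return "\x1b[33;40m" + word + "\x1b[0m"


def shout(word):
    """Another possible presentation for a terminal without colour."""
    return word.upper()


if __name__ == "__main__":
    line = "Any word originally styled as **bold** may end up differently."
    print(render_bold(line, ansi_yellow_on_black))
    print(render_bold(line, shout))
```

  The decoder only finds the typotag; the `style` callable supplied
  by the front end decides what the reader actually sees.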
 
  Those principles also apply to decoding of a setext's logical,
  rather than merely its typographical, structure. The format does
  not rely on some large set of predefined, unambiguous, mutually-
  exclusive rules. Rather, it "knows of" just the barest set of
  typotags (currently 14), knows their symbolic purpose and what
  criteria to use when looking for and validating them in a setext.
  This approach differs some from the commonly heard programmers'
  wish for clearly-delimited data patterns that can be scanned for
  quickly and their position used as an offset to the text to be
  displayed.
 
  Setext has those patterns too but, since it relies primarily on
  de facto "invisible" elements that could also be part of the text
  itself, it must validate them first before proceeding with any
  enhancements. Writing a real setext decoder is therefore
  conceptually much closer to (though nowhere near as hard as)
  writing an SGML application than it is to writing a macro routine
  to munge some data in one predefined fashion. In spite of all
  that, setext tools should be easily implementable with, and no
  more complex than, typical HyperTalk, sed, awk and perl scripts.
  The barest minimum required for such an attempt is an intelligent
  search/ replace function in a programmable macro editor. Though
  yet to be proven, conceptually there is nothing in the format to
  prevent implementation of real-time setext browsers written in,
  say, some advanced pattern-matching macro language of a terminal
  emulator program.
 
 
  Where is setext
-----------------
  There are as yet no known setext tools in existence. I have a
  working prototype of a browser, which is not far from completion.
  I've also submitted a paging macro routine for rn (a popular
  newsreader under unix) to TidBITS (#110), which should ease
  jumping between the topics. I've also opened a mailing list for
  developers and future setext publishers:
  [address withheld in the archive]
  If you received this letter in your mail then you're already a
  subscriber of it. Otherwise please send me a short note, stating
  whether you're interested in writing a setext tool or merely just
  an interested observer/ future user and your Internet-accessible
  email address and I will put you on the list and/ or reply as soon
  as possible.
 
 
  When is setext
----------------
  Due to a varying work load and other distractions between the
  original announcement of the planned release and the actual date
  of it, the browser that I am writing is not yet ready. I do not
  intend to repeat the mistake of preannouncing it again. Instead
  please feel free to join the mailing list through which the rest
  of the specifications will be published. The full release will
  contain approximately 150K worth of setexts on setext along with
  a demo browser written in HyperCard (2.0) that will permit
  showing of the format's capabilities in a dynamic rather than
  the strictly textual and sequential fashion. Those of you who
  know me, know also of the high standards of coding that I try to
  adhere to.
 
  If you're among those that have already written a prototype
  that's based mainly on a reverse-engineered layout of the current
  TidBITS then you'd be well advised not to release it without prior
  validation of it by me. Please do not call your product a
  "setext browser" (or whatever) UNLESS it is truly capable of
  parsing all (future) setextized docs, not solely TidBITS.
 
 
  How is setext
---------------
  A lot can (and will) be said about it but there is one claim no
  other text encoding method can make: "there is a lot more of me
  than meets the eye" ;-))
 
 
  Who is setext
---------------
  The setext format and its underlying philosophy isBroughtToYouBy
  Ian Feldman. I live in Stockholm, Sweden, Europe.
  I used to work as and describe myself variously over the years
  but now simply content myself with being just a free Human Factors
  thinker and tinkerer.
 
 
.. last line contains a twodot-tt, a tag signifying the logical end of
.. text while those three lines are all suppress-typotagged ones, i.e.
.. can be suppressed (hidden) by a front-end application by default.
..
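
  A front end could act on these two tags roughly as sketched below.
  The sketch assumes that a suppress-typotagged line begins with two
  dots followed by a space, and that the twodot-tt is a line holding
  nothing but two dots; both details are inferred from the tag names
  and the trailer, and the function name `visible_lines` is made up
  for the example.

```python
def visible_lines(setext):
    """Return the lines a front end would show by default.

    Assumptions (not confirmed by the sheet): a line starting with
    ".. " is suppress-typotagged, and a line that is exactly ".."
    marks the logical end of the text.
    """
    shown = []
    for line in setext.splitlines():
        if line.rstrip() == "..":
            break  # twodot-tt: nothing after this belongs to the text
        if line.startswith(".. "):
            continue  # suppress-tt: hidden by default
        shown.append(line)
    return shown


if __name__ == "__main__":
    doc = "  Body text\n.. a hidden note\n  More text\n..\ntrailer added by a mailer"
    for line in visible_lines(doc):
        print(line)
```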
 
-----------------------------------------------------------------------
This information brought to you by the TidBITS Fileserver, conveniently
located near you at [address withheld in the archive]. To speak with a
human, send email to Adam C. Engst at [address withheld in the archive].
Enjoy!
 
 
------- End of Forwarded message
 

©2011 Hardy Cook. All rights reserved.