XML (eXtensible Markup Language)
The Shakespeare Conference: SHK 19.0308  Wednesday, 21 May 2008

[1] 	From:	Gabriel Egan
	Date:	Tuesday, 20 May 2008 17:37:14 +0100
	Subj:	Re: SHK 19.0306 XML (eXtensible Markup Language)

[2] 	From:	Michael Best
	Date:	Tuesday, 20 May 2008 16:17:17 -0700
	Subj:	Re: SHK 19.0306 XML (eXtensible Markup Language)


[1]-----------------------------------------------------------------
From:		Gabriel Egan
Date:		Tuesday, 20 May 2008 17:37:14 +0100
Subject: 19.0306 XML (eXtensible Markup Language)
Comment:	Re: SHK 19.0306 XML (eXtensible Markup Language)

Hardy Cook asks:

 >So my first question is do Office 2007's files in
 >XML save to "a format that will be reliably modified
 >to work on any future system"? In other words, is there
 >the same problem with Microsoft's XML standard
 >as there is with its HTML standard?

Yes, the problem is almost precisely the same. Microsoft's 
implementation of XML is crippled in most versions of its software. To 
understand how, it's necessary to know a little about XML. Taking HTML 
as the starting point, those who know HTML will agree that there are 
predefined 'tags' that one can put around elements in the text. Thus, a 
paragraph of text begins with a <p> tag and ends with a </p> tag, and an
italicized word begins with an <i> tag and ends with an </i> tag.

As well as defining the tags, the HTML standard defines certain rules 
about the tags and the relationships between them. For example, tags are
in general nested, one within another, like Russian dolls. If the word
'house' is to be both italicized and underlined, the tags must be paired
like this <i><u>house</u></i> and not overlapped like this
<i><u>house</i></u>.
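
The Russian-doll rule can be checked mechanically. What follows is only a minimal sketch in Python, not part of any HTML tool: the function name is_properly_nested is made up for illustration, and the sketch assumes every tag has a closing partner (it ignores tags such as <br> that stand alone). It remembers the tags that are still open and insists that each closing tag match the most recently opened one.

```python
import re

# Matches an opening or closing tag and captures the slash (if any) and the name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def is_properly_nested(markup: str) -> bool:
    """Return True if every tag is closed in the reverse order of opening.

    Illustrative assumption: every tag is paired; stand-alone tags are not handled.
    """
    open_tags = []  # stack of tag names still waiting for their closing tag
    for match in TAG.finditer(markup):
        is_closing, name = match.group(1), match.group(2).lower()
        if not is_closing:
            open_tags.append(name)
        elif not open_tags or open_tags.pop() != name:
            # A closing tag that does not match the innermost open tag.
            return False
    return not open_tags  # anything left open is also an error


print(is_properly_nested("<i><u>house</u></i>"))  # nested like Russian dolls
print(is_properly_nested("<i><u>house</i></u>"))  # overlapped
```

The first call reports the nested form as acceptable and the second rejects the overlapped form, because </i> arrives while <u> is still the innermost open tag.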

The definitions of the tags and the rules that govern their 
relationships are built into the HTML standard; indeed, that's all HTML
is: the standard.

XML works the same way, except that rather than us all agreeing on the 
tags and the rules beforehand, XML allows the user to define the tags 
and the rules.  Thus for any XML document there have to be two texts: 
the document itself and the 'schema' that defines the tags and the 
rules. Because XML is really just the standard for writing schemas, all 
sorts of disparate kinds of data can be represented in XML. Once you've 
defined the schema for, say, the representation of questions in a 
multiple-choice online quiz, you've created a new tagging standard 
rather like HTML, but one suited to your purpose. (Of course, this has 
already been done and the result is QML, or Question Markup Language. If 
your online quiz software is QML compliant, all quizzes written in 
conformance with QML will work on your system.  That's where the claim 
of inter-operability comes in whenever people extol the virtues of XML.)
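
To make the idea of "document plus schema" concrete, here is a toy sketch in Python. It is an illustration only: the tag names (quiz, question, prompt, choice) are invented for this example and are not taken from QML, and the "schema" is just a Python dictionary saying which child tags each tag may contain. Real schemas are written in dedicated languages such as DTD, XML Schema, or RELAX NG and can express far more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "schema": each tag maps to the set of child tags it may contain.
QUIZ_SCHEMA = {
    "quiz": {"question"},
    "question": {"prompt", "choice"},
    "prompt": set(),
    "choice": set(),
}


def conforms(element, schema):
    """Return True if this element and all its descendants obey the toy schema."""
    allowed = schema.get(element.tag)
    if allowed is None:
        return False  # the tag is not defined by the schema at all
    return all(
        child.tag in allowed and conforms(child, schema) for child in element
    )


# An invented document written with the invented tags.
QUIZ = """
<quiz>
  <question>
    <prompt>Who speaks the first line of Hamlet?</prompt>
    <choice correct="yes">Barnardo</choice>
    <choice>Horatio</choice>
  </question>
</quiz>
"""

good = ET.fromstring(QUIZ.strip())
bad = ET.fromstring("<quiz><question><hint>Act 1</hint></question></quiz>")

print(conforms(good, QUIZ_SCHEMA))  # uses only tags the schema permits
print(conforms(bad, QUIZ_SCHEMA))   # <hint> is not defined in the schema
```

Any program that knows the same schema can rely on the same structure, which is the sense in which a shared schema makes documents interchangeable.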

The problem with Microsoft's implementation of XML is that you don't get 
to write the schema of a Word document unless you buy the most expensive 
variant ('Enterprise' or 'Professional' edition) of the software. All 
ordinary users find that their '.docx' Word documents are written to a 
predetermined schema supplied by Microsoft called WordML. 
Unsurprisingly, it's execrable and works with nothing else: it was 
designed merely as an embodiment of the proprietary format Microsoft was 
already using for Word files (the '.doc' format). The point was to give 
the appearance that Microsoft had gone over to an Open Standards 
philosophy, while maintaining proprietary control.

Gabriel Egan

[2]-----------------------------------------------------------------
From:		Michael Best
Date:		Tuesday, 20 May 2008 16:17:17 -0700
Subject: 19.0306 XML (eXtensible Markup Language)
Comment:	Re: SHK 19.0306 XML (eXtensible Markup Language)

Hardy M. Cook wrote:

 >My second question is to Michael: How are files for the
 >Internet Shakespeare Editions encoded into XML? Are
 >they encoded manually or do you use a program to
 >perform the encoding? And if so, what is that program
 >or what is the process?

This is an excellent and deceptively simple question. As Hardy will 
realize, since he has been working on the poems for the Internet 
Shakespeare Editions, Shakespeare's texts are complex, and our aims 
ambitious. We aim to encode, in the old-spelling texts now on the site,
a great deal of information about both the semantic structure of the 
plays (how they are divided into acts, scenes, speeches, and so on), and 
about the physical structure of the books they were published in, with 
their division of pages, columns, and physical lines. Normal XML does 
not deal elegantly with this level of complexity, and has to privilege 
one of these structures. Our response has been to encode the plays and 
poems initially in an earlier, more flexible standard (SGML -- Standard 
Generalized Markup Language), from which we generate separate XML files 
for the different structures.
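
The difficulty, and the idea of generating one XML file per structure, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Internet Shakespeare Editions' software or its SGML encoding: the data, the tag names (scene, sp, page, l), and the function build_view are all invented. Each word carries both the speaker it belongs to and the printed line it falls on; because a speech can run across a line break and a single line can hold parts of two speeches, no one tree can nest both divisions, so the sketch writes a separate tree for each.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Invented placeholder data: (word, speaker, printed line number).
# Speech "A" runs over lines 1-2, and line 2 contains words from both speakers.
WORDS = [
    ("w1", "A", 1),
    ("w2", "A", 1),
    ("w3", "A", 2),
    ("w4", "B", 2),
    ("w5", "B", 3),
]


def build_view(words, root_tag, group_tag, attr, key_index):
    """Group consecutive words by one field and wrap each group in an element."""
    root = ET.Element(root_tag)
    for key, group in groupby(words, key=lambda w: w[key_index]):
        element = ET.SubElement(root, group_tag, {attr: str(key)})
        element.text = " ".join(w[0] for w in group)
    return root


# Semantic structure: the scene divided into speeches.
logical = build_view(WORDS, "scene", "sp", "who", 1)
# Physical structure: the page divided into printed lines.
physical = build_view(WORDS, "page", "l", "n", 2)

print(ET.tostring(logical, encoding="unicode"))
print(ET.tostring(physical, encoding="unicode"))
```

In the first output w3 sits inside speaker A's speech; in the second it shares a line with w4 from speaker B. Both views come from the same underlying record, which is the sense in which one source can feed several XML files.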

Unfortunately there is as yet no program that simplifies the process of 
encoding files of this kind. We have developed our own software to 
generate the XML files, and use Oxygen -- a powerful XML editor -- to 
work with them. Our general principle is to use Open Source software 
where possible, because it adheres more closely to accepted standards 
than much proprietary software. As Hardy comments, Microsoft software in 
general fails to follow the standards set by the ISO (International 
Organization for Standardization, www.iso.org); I have not looked deeply 
at their XML, but I do know that it is very difficult to work with. 
Perhaps others on the list will be able to respond more fully.

Cheers--
Michael
Coordinating Editor, Internet Shakespeare Editions
<http://internetshakespeare.uvic.ca/>
Department of English, University of Victoria
Victoria B.C. V8W 3W1, Canada.


_______________________________________________________________
S H A K S P E R: The Global Shakespeare Discussion List
Hardy M. Cook
 
The S H A K S P E R Web Site <http://www.shaksper.net>

DISCLAIMER: Although SHAKSPER is a moderated discussion list, the 
opinions expressed on it are the sole property of the poster, and the 
editor assumes no responsibility for them.
 

©2011 Hardy Cook. All rights reserved.