A Scenario from the Networked Digital Library of Theses and Dissertations:

The Life of an ETD from Creation to Dissemination

Neill A. Kipp

Electronic Thesis and Dissertation Initiative

Virginia Polytechnic Institute and State University

DRAFT: updated September 15, 1997


At Virginia Tech, we require that students who must submit theses and dissertations to our Graduate School do so electronically. Herein, we follow two Electronic Thesis or Dissertation (ETD) submissions from creation by Virginia Tech students to retrieval by patrons of the Networked Digital Library of Theses and Dissertations. In particular, we focus on ETD workflow with respect to standards (e.g., SGML, HyTime, TEI, JPEG, VRML, MARC, Z39.50) that we use (or have designed to use) in our acceptance process. (Please see Rosson and Carroll [1993] for more information about scenario-based design.)

Football weather. The air is crisp and clear in autumn in Blacksburg. The trees that crawl up the mountainsides have a blue hue as frost crystallizes the blades of grass in our valley between. Mallards pause at the campus pond for a breadcrumb and a chilly swim. Meanwhile, nearly five hundred graduate students are working all hours, preparing to graduate.

Emily has performed her research and has her results. Knowing that electronic submission is now required by Virginia Tech, she attends an ETD workshop (given several times per semester) to learn how to prepare and submit her dissertation electronically. At the workshop, she is presented with these options as acceptable submission formats: Adobe Portable Document Format (PDF), the Electronic Thesis and Dissertation Markup Language (ETD-ML), and, until user tools for its PDF conversion are more widely available, DVI as output from TeX or LaTeX.

Emily learns that Adobe PDF is the published but proprietary successor to Adobe PostScript, designed especially for electronic publishing. She learns that it is reasonably simple to generate from regular word processors, using tools called Adobe PDFWriter and Adobe Distiller, and that it is easy to create hyperlinks with Adobe Exchange. She also learns that the Virginia Tech New Media Center can help her with the Adobe tools when she is ready to use them.

Also at the workshop, Emily learns about ETD-ML: a special language written as an application of the Standard Generalized Markup Language (SGML), primarily for authors comfortable with HTML and Web publishing. Emily, however, is not familiar with Web publishing. Though it does intrigue her, she will leave all the "whatever-MLs" to the computer whizzes.

Emily and her entire department use Microsoft Word; therefore she chooses to prepare her ETD in Word, and to submit her ETD in Adobe PDF.

Emily has some 8x10-inch color photographs that directly support her research results. She goes to the New Media Center to scan them in. To get the resolution she wants, the picture files grow to be very large, even with JPEG compression! Furthermore, they crash her word processor when she tries to paste them in. She reviews the ETD Web site [http://etd.vt.edu] where she finds explicit details on how to use hyperlinks to connect her photographs as "external multimedia objects" to her main PDF file. She decides to place a 2-inch square "thumbnail" of each image in her document, each hyperlinked to a separate full-sized photograph.

Multimedia challenges met, Emily prepares her defense and defends successfully. Each of her committee members uses Acrobat Reader to page through her PDF file, follows the links to her photographs, smiles, nods knowingly, and signs the Virginia Tech ETD Approval Form. Good work, Emily.

Early the next morning, Emily launches a networked Web browser (as suggested in the workshop) and completes the Web page submission form (created by ETD workflow scripts; see below). She fills in her title, her name, degree, major, each committee member's name, and all the other data that appears on her title page.

Emily and her committee chair have agreed to allow Web access to her ETD. They could have chosen to restrict access to Virginia Tech only (if they were expecting to publish her results in a journal that prohibits prior publication), or they could have denied all access whatsoever (if they were expecting to apply for a patent). Emily selects "available immediately worldwide."

She pastes in her abstract (inserting HTML paragraph tags between paragraphs) and uploads each of her five files (one PDF file and four JPEG files) to the library ETD server. She completes the submission survey and receives notification that her ETD has arrived to be reviewed. Over lunch, she walks her signed ETD Approval Form to the Graduate School, then goes to the coffee house for a celebratory boysenberry cappuccino.

Electronic Thesis and Dissertation Markup Language (ETD-ML). Meanwhile, Raoul, a Master's degree student in chemistry (who sat next to Emily at the ETD workshop), has decided that ETD-ML is worth a try. ETD-ML, he has learned, is like HTML (with which he is very familiar) and is designed specifically for ETD submission. He downloads the support information, installs the formatting software, and prepares his ETD as a hypertext under ETD-ML.

Text Encoding Initiative (TEI). Raoul determines that ETD-ML is much like the famous TEI standard [XXX]: his roommate borrows his latex gloves to salvage decaying journals for the library. Raoul confirms (by contacting ETD technical support) that the two standards are similar enough to be mapped onto one another using a script similar to the one he downloaded that translates ETD-ML into HTML. Raoul is glad that ETD-ML is designed specifically for ETDs---he wants the software program to warn him if he has omitted any critical information in his ETD submission.

Raoul finishes his thesis and defends it successfully. Like Emily, he collects his committee's signatures. Raoul's major professor downloaded the ETD draft over his cellular modem: he was attending a polymer conference in California! Fortunately, he returned in time to sign the form. Raoul creates a single "tar-gz" file of his 240 ETD files: text, tables, equations, graphics, and virtual reality models of organic compounds (encoded as VRML files). He submits it to the library server, using the same electronic upload forms that Emily did.
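
Bundling a directory of files into one "tar-gz" archive, as Raoul does, might look like the following minimal Python sketch. The helper name bundle_etd and the directory layout are assumptions made for illustration; the scenario does not say which tool Raoul actually used.

```python
import tarfile
from pathlib import Path
from typing import List


def bundle_etd(source_dir: str, archive_path: str) -> List[str]:
    """Pack every file under source_dir into one gzip-compressed tar archive.

    Returns the sorted member paths, relative to source_dir.
    (Hypothetical helper; not part of the ETD software.)
    """
    root = Path(source_dir)
    members = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in members:
            # arcname stores a relative path so the archive unpacks cleanly.
            archive.add(str(root / name), arcname=name)
    return members
```

Writing the archive outside the source directory keeps it from being swept up as one of its own members.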

While submitting, Raoul notices that the steps in submission are managed by a series of Common Gateway Interface (CGI) scripts. These Perl scripts carry his data along while he completes his submission. He assumes (correctly) that the library is using an SGML-based workflow system underneath.

On the exit survey, he writes, "I could not have managed all these digital objects without ETD-ML. It saved me many headaches." [Note: this is an actual quote from a doctoral chemist who submitted the first ETD in ETD-ML.]

ETD Workflow using SGML. Emily's dissertation and Raoul's thesis are waiting on the library ETD server to be reviewed by the Graduate School. Each ETD is in its own directory on the server. In addition to the ETD files each student submitted, an SGML file that contains metadata and workflow tracking information accompanies each of their submissions. The workflow file contains all the metadata they each entered into the submission form, each item with its own named tag (SGML generic identifier). The workflow file conforms to an SGML document type (etdworkflow.dtd) created especially for facilitating the workflow process of ETDs to and from the Graduate School and University Library. From the carefully tagged information in the SGML metadata document, other documents may be generated automatically. In particular, we can generate TEI headers, MARC records, formatted bibliographic input to the Dienst distributed digital library server, and virtual card catalog input to the IBM Digital Library and OCLC digital library servers [TEI, MARC, Dienst, IBM, 19xx].
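
To illustrate how named tags allow other records to be generated automatically, here is a minimal Python sketch. It is not the actual workflow software, which the scenario describes as Perl scripts. The element names (etdworkflow, title, author, degree, abstract) are hypothetical, since the contents of etdworkflow.dtd are not shown here, and the mapping onto MARC tags 100, 245, 502, and 520 is an assumption about how such a conversion might be arranged.

```python
import re
from xml.sax.saxutils import escape, unescape

# Commonly used MARC bibliographic tags; this particular mapping is assumed.
MARC_TAGS = {"author": "100", "title": "245", "degree": "502", "abstract": "520"}


def to_workflow_sgml(metadata: dict) -> str:
    """Wrap each metadata value in an element named after its key."""
    lines = ["<etdworkflow>"]
    for name, value in metadata.items():
        lines.append(f"  <{name}>{escape(value)}</{name}>")
    lines.append("</etdworkflow>")
    return "\n".join(lines)


def read_field(document: str, name: str) -> str:
    """Return the text of one named element, or '' if it is absent."""
    match = re.search(rf"<{name}>(.*?)</{name}>", document, re.DOTALL)
    return unescape(match.group(1)) if match else ""


def to_marc_fields(document: str) -> list:
    """Derive (MARC tag, value) pairs from the tagged workflow document."""
    fields = []
    for name, tag in MARC_TAGS.items():
        value = read_field(document, name)
        if value:
            fields.append((tag, value))
    return sorted(fields)
```

Because every value sits under its own generic identifier, a second target format (a TEI header, for example) would need only another mapping table, not another round of data entry.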

The SGML workflow file also includes a unique, logical Uniform Resource Name (URN). The URN will be used as a constant document identifier so that an ETD can be moved from server to server without disturbing any published references to it [URN, 19xx].
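
The idea of a constant identifier can be sketched as a lookup table from URN to current location. The Python class below is a toy illustration only: the class name UrnResolver is hypothetical, and any URN strings and server addresses passed to it are whatever the caller supplies, not identifiers used by the project.

```python
class UrnResolver:
    """Toy resolver: maps a constant URN to the document's current URL."""

    def __init__(self) -> None:
        self._locations = {}

    def register(self, urn: str, url: str) -> None:
        """Record where a newly accepted document lives."""
        self._locations[urn] = url

    def move(self, urn: str, new_url: str) -> None:
        """Point an existing URN at a new server; the URN itself is unchanged."""
        if urn not in self._locations:
            raise KeyError(f"unknown URN: {urn}")
        self._locations[urn] = new_url

    def resolve(self, urn: str) -> str:
        """Return the current location for a published URN."""
        return self._locations[urn]
```

A citation that quotes only the URN keeps working after a move, because only the table entry changes.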

ETD Review. Gwen, the ETD reviewer in the Graduate School, arrives back from a quaint downtown cafe and sees two new ETDs have arrived to be reviewed. Gwen gives her password to the server; the server returns an "ETD reviewer page," and Gwen begins reviewing Emily's dissertation. Gwen can see the metadata that Emily entered into the submission form. Starting with "etd.pdf," Gwen follows each link from the reviewer page to the corresponding file that Emily has submitted. Gwen reviews the ETD, then checks that Emily has turned in all appropriate forms and has paid all the appropriate fees. Seeing that this is true, Gwen clicks on the "Approve" button on her browser. The system sends approval, by email, to Emily and her committee chair. With paper submission (only one year ago), this process took two weeks instead of two hours.

Had Raoul been watching when Gwen reviewed his thesis, he would have noticed that the ETD review process is also driven by Perl scripts that interact with the same SGML workflow file that was used when his ETD was submitted.

After notifying Emily, Raoul, and each of their committee chairs of the good news, the system updates the SGML workflow records to indicate that the ETD is approved and should be cataloged. At this point, the ETD is available for public download over the Web.
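
The stages an ETD passes through in this scenario (submitted, then approved, then cataloged) can be sketched as a small state table. This is an illustrative Python sketch, not the project's workflow scripts; the status strings and function names are assumptions, and is_public presumes the author chose worldwide access, as Emily did.

```python
# Allowed transitions, in the order the scenario describes them.
TRANSITIONS = {
    "submitted": "approved",   # Graduate School review
    "approved": "cataloged",   # library cataloging
}


def advance(status: str) -> str:
    """Return the next workflow status, or fail if there is none."""
    if status not in TRANSITIONS:
        raise ValueError(f"no further step after {status!r}")
    return TRANSITIONS[status]


def is_public(status: str) -> bool:
    """In the scenario, an ETD becomes downloadable once it is approved."""
    return status in ("approved", "cataloged")
```

Keeping the status in the workflow record lets the submission, review, and cataloging pages all read and update the same state.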

ETD Cataloging. Mary, the ETD cataloger, connects to the ETD Cataloging Web pages the next morning. After she gives her password, she sees that two new ETDs are ready to be cataloged. The numbered MARC fields appear in her Web browser (these fields are also generated from the ETD workflow document) and she pastes the metadata into her Z39.50 terminal. OCLC's servers accept her catalog record, and the Virginia Tech online catalog system (VTLS) will download the new record over the next weekend.

Also at this point, when the final system is implemented, the alphabetical list will be updated automatically, a Dienst bibliographic record will be generated and sent to the Dienst server, an IBM virtual catalog entry will be sent to the IBM Digital Library, and all relevant HyTime independent links will be created and added to the HyTime Engine server that manages intelligent collection browsing [HyTime 1997; Kipp, 1997].

The next day is another gorgeous day in the New River Valley. Emily forwards a copy of Gwen's acceptance note to her father in Northern Virginia. Her father finds his daughter's ETD by using his Web browser to access the NDLTD. He browses to Virginia Tech and searches for "Emily" on the library server. He prints Emily's title and abstract pages from his free copy of Adobe Acrobat Reader [http://www.adobe.com/] and shows them proudly around his office. (Meanwhile some chemists at Kodak Labs have found Raoul's thesis and have already contacted their recruiting department.)

Conclusion. Any university can join the Networked Digital Library of Theses and Dissertations and begin accepting Electronic Theses and Dissertations. Please see [http://www.ndltd.org/] for complete information. Please contact the project team by email at etd@ndltd.org.


Note: I will provide a bibliography, figures, deeper technical information (including SGML examples), NDLTD access statistics, the TEI DTD and ETD-ML and the conversion between them, and contact information when the final TEI submission is due in October.