Through our university and grant-funded fellowships, we often have students available to participate in faculty projects, and to do so at a high level. In the past, students have helped with developing document encoding practices, validating data against original sources, writing commentary on texts, and developing GIS prototypes. We've trained students for these tasks in a variety of ways: formal training sessions, ad hoc gatherings, and one-on-one instruction.
Of all the things we do, we're most proud of the contributions that students have made to our projects. They've brought a spirit of investigation and creativity to our efforts that in many ways is more important than the actual tasks they've performed.
We develop and manage data requirements in a number of different formats. Our particular expertise is in TEI-encoded texts. TEI (the Text Encoding Initiative) is an international standard for encoding texts in XML, thus making them machine-readable. To preserve, version, and control access to our documents, we use Subversion, a standard version control system most often used for program source code but well suited to our purposes, to which we've added hooks that provide pre-update XML validation. The Spenser, Bizet and RCLGA projects are especially TEI-intensive.
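As a rough illustration of the kind of check a validation hook might run, here is a minimal sketch using only Python's standard library. It tests well-formedness only, not validity against a TEI schema, and it is not our actual hook: the function name `check_well_formed` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if xml_text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# Hypothetical sample documents: one well-formed, one with an unclosed <p>.
good = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<text><body><p>Hello</p></body></text></TEI>"
)
bad = "<TEI><text><p>Unclosed paragraph</text></TEI>"

for label, doc in (("good", good), ("bad", bad)):
    ok, message = check_well_formed(doc)
    print(label, "passes" if ok else "fails: " + message)
```

A hook script built around a check like this would typically exit with a non-zero status when the check fails, so that the version control system rejects the change.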
We also design, implement and manage relational databases on OS X, Linux and Solaris platforms. Our database management system of choice is MySQL. We typically provide access to databases via a version of the Django web framework which we've customized for our purposes. We currently maintain relational databases for the Spenser, Material Culture in the German Novel, and Inventing the Federal Government projects.
We're also gaining experience in managing RDF (Resource Description Framework) datastores. RDF is particularly suited to problems which present an unclear set of data requirements, or which suggest that our understanding of the project's data requirements will evolve. We use RDFLib to manage the datastores, and have developed a web interface on top of the CherryPy HTTP framework.
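To show why a triple-based model copes well with evolving requirements, here is a small sketch that stands in for an RDF store using plain Python tuples. It is not RDFLib and not our datastore; the identifiers (`poem:1`, `person:A`, `writtenBy`, and so on) and the `query` helper are hypothetical.

```python
# Each fact is a (subject, predicate, object) triple. All identifiers are hypothetical.
triples = {
    ("poem:1", "writtenBy", "person:A"),
    ("poem:1", "commends", "person:B"),
}


def query(store, s=None, p=None, o=None):
    """Return the triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t
        for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# A new kind of fact is just another triple: there is no table to alter
# and no schema to migrate when the project's requirements change.
triples.add(("person:B", "printedBy", "person:C"))

print(query(triples, p="commends"))
print(query(triples, s="person:B"))
```

The point of the design is that a predicate nobody anticipated at the start of a project can be recorded the moment it turns up in the sources.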
Because we make such extensive use of open-source software, we're interested in giving back as opportunities present themselves. To date, we've published a couple of projects:
For the Bizet and Spenser projects, we developed a web server for delivering XML content, processed by a variety of XSL stylesheets, in a number of different interfaces.
For the Spenser and RCLGA projects, we developed a viewer for very large collections of high-resolution images. A sample is available here.
In addition, we have a number of other products in the pipeline, which we hope to publish or otherwise share as time permits:
As a result of supporting the Spenser project, we've slowly assembled the full toolchain necessary for preparing scholarly critical editions of early modern texts. Tools include our TEI data management components (see Data Management, above; see here for a video introduction for a TEI editor built on top of our XML server); an interface providing an electronic Hinman collator; a platform for recording differences discovered during collation and for using them to analyze the states of the text; web and image processing tools to enable crowd-sourced proofreading; and a process for linking texts that goes beyond the standard HTML anchor tag. Because the process is so heterogeneous, we doubt that we can release it as a package. However, it seems perfectly reasonable to offer it to other editorial projects on a software-as-a-service model.
We're gaining experience in performing named-entity recognition (NER) on inconsistently spelled (and perhaps badly transcribed) nineteenth-century court records. The documents are so inconsistent that the standard NER solutions fail. We've worked out a series of manual processes and simple scripts which generate reasonably accurate results (available here and here). Our friends in the Olin Library have been urging us to write up our experience, since they think it may be of interest to other projects.
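To give a flavor of what a simple script for inconsistent spellings can look like, here is a minimal sketch that matches a transcribed name against a list of known names using approximate string matching from Python's standard `difflib` module. This is an illustration under assumptions, not our actual process: the names in `KNOWN_NAMES`, the helper functions, and the 0.8 similarity cutoff are all hypothetical.

```python
import difflib

# Hypothetical authority list of names already identified by hand.
KNOWN_NAMES = ["Elizabeth Freeman", "John Harrison", "Margaret Sullivan"]


def normalize(name):
    """Lowercase, turn periods into spaces, and collapse runs of whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def match_name(raw, known=KNOWN_NAMES, cutoff=0.8):
    """Return the known name closest to raw, or None if nothing is similar enough."""
    lookup = {normalize(k): k for k in known}
    hits = difflib.get_close_matches(normalize(raw), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


print(match_name("Elizabth  Freman"))  # a misspelled variant of a known name
print(match_name("Court Clerk"))       # not a name on the list
```

A cutoff that is too low merges distinct people and one that is too high misses real variants, so the output of a script like this still needs manual review.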
Of course, nothing attracts interest like a hit movie (not to suggest that the HDW has one in the can). We recently did some interesting work examining the literary networks sketched by the commendatory poems included in early modern books (a prototype is available here, Firefox only for now). Think of it as "Facebook before electricity." Besides being timely, the project is noteworthy because it takes advantage of a very large corpus (>25,000 texts, or about 1/4 of everything published in English between 1550 and 1700) from the Text Creation Partnership.
The projects we support have led us into various specializations. Of particular note are the interlocking problems of reading on the web, distant reading, and reading across multiple texts or complex data sets (see, for example, our process for linking texts, or our interface for analyzing the states of the text).
We've developed a number of Spenser-related interfaces: sample interfaces for the NEH, which demonstrate our ideas for coordinating the presentation of texts, textual states, commentary, and images; an on-line edition of Brittain's Ida, a former student's senior thesis project, which demonstrates the same desire for the smooth presentation of multiple materials; and a tag cloud reader for comparing the marginal glosses and source texts used in another of the Spenser project's texts.
We're also engaged with the problem of examining large bodies of textual information while not losing sight of details and context (a kind of "zoomed distant reading", perhaps). Examples of this work include web pages for locating hypermetric lines in the 1590 Faerie Queene, and web pages for finding tagged nouns in a pair of nineteenth-century German novels and for seeing those nouns in context (available here and here).
From time to time, the HDW provides the university with services outside our normal portfolio. In the past, we've done video editing for the web and provided software for facilities planning. We're currently engaged in providing the Campus Police with the software necessary to manage the university's emergency preparedness.