Distant Reading for Close Analysis: Projects Building on EEBO-TCP

As scholarly citation of EEBO-TCP has increased in frequency (as described in an earlier blog post), so has the number of other projects that rely on EEBO-TCP for their datasets.

Some of these projects have built linguistic tools, like Lexicons of Early Modern English (LEME) (begun in 1986 but ever-evolving), which has used early dictionaries digitized by EEBO-TCP to populate its collection of words. Others create collections of texts: Verse Miscellanies Online was produced in partnership with EEBO-TCP, assembling a literary subcorpus beginning with Tottel’s Miscellany and enhancing the EEBO-TCP files with descriptive tagging and annotations. The University of Oxford Text Archive (OTA) has created (among other resources) a TCP catalog that includes not only EEBO, but also the eighteenth-century English print corpus ECCO and the early American corpus of texts known as Evans. Rather than searching in a “flat” Google-like field, users can filter by specific fields: title, availability, date, and term. OTA has assigned subject fields to every text in its catalog.

There are several corpus-analysis tools. Similar in mission to Early Print, Visualizing English Print (VEP) converts EEBO-TCP to plain text and assembles targeted subcorpora. It provides, for example, an Early Modern Science Collection, and it also allows users to create their own corpora. In addition, VEP provides visualization tools to support the analysis of those corpora.
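As a rough illustration of what converting an encoded text to plain text involves, the following Python sketch strips the markup from a tiny, invented TEI-style sample. It is not VEP's pipeline. The sample document, the `to_plain_text` helper, and the choice to keep only `<p>` elements from the body are all assumptions made for illustration, and real EEBO-TCP files use XML namespaces and a much richer set of elements.

```python
import xml.etree.ElementTree as ET

# Invented sample for illustration only; not taken from an EEBO-TCP file.
SAMPLE = """<TEI><teiHeader><title>A Sample</title></teiHeader>
<text><body>
<p>Of the <hi>motion</hi> of the
   heavens.</p>
<p>Second paragraph.</p>
</body></text></TEI>"""


def to_plain_text(xml_string):
    """Return the body paragraphs of a TEI-style document as plain text."""
    root = ET.fromstring(xml_string)
    body = root.find(".//body")
    if body is None:
        raise ValueError("no <body> element found")
    paragraphs = []
    for p in body.iter("p"):
        # itertext() yields the text inside <p>, including nested tags like <hi>.
        text = "".join(p.itertext())
        # Collapse line breaks and runs of spaces left over from the markup.
        paragraphs.append(" ".join(text.split()))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(to_plain_text(SAMPLE))
```

The header is skipped on purpose here, so that only the text of the work itself ends up in the plain-text output; a real conversion would need to decide what to do with notes, gaps, and page breaks as well.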

Distant Reading Early Modernity (DREaM) harnesses users’ queries to develop enhanced corpora. The project normalizes 44,000 fully transcribed texts and enriches their metadata, particularly publication records, by connecting the EEBO-TCP metadata to OCLC’s Linked Open Data. DREaM attaches varying levels of confidence to its data so that users can direct their attention advisedly. DREaM shares its enriched dataset and enables batch downloads of full-text corpora.

CQPweb is another corpus-analysis tool that provides users with a “Workbench”. The Workbench facilitates concordancing; the analysis of collocations; the preparation of distribution tables and charts and of frequency lists; and search by keywords or key tags. (EEBO-TCP is just one of many datasets accessible to this tool.) The Workbench can operate on even more metadata than EEBO-TCP offers, including part of speech, lemma, and semantic tags.
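For readers unfamiliar with these operations, a minimal Python sketch of two of them, a frequency list and a keyword-in-context concordance, might look like the following. This is not how CQPweb works internally. The function names, the simple tokenizer, and the sample sentence are assumptions invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens, top=5):
    """Return the `top` most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)


def concordance(tokens, keyword, window=3):
    """Return each occurrence of `keyword` with `window` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample sentence, for illustration only.
sample = "The king loved the queen and the queen loved the king"
tokens = tokenize(sample)

if __name__ == "__main__":
    print(frequency_list(tokens, top=1))
    for line in concordance(tokens, "queen", window=2):
        print(line)
```

A concordance of this kind is where distant and close reading meet: the frequency list points to a word worth attention, and the context lines send the reader back to individual passages.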

WordHoard, also attentive to philology, relies on an EEBO-TCP-based canon to analyze how words are used; entering a particular word returns the variants of that word alongside their parts of speech. In WordHoard, users can also analyze usage by attributes such as gender, for example how frequently a word is used by men vs. women.

Corpus-level inquiry can lead to more subject-specific projects. Alice Eardley has written about the utility, and the challenges, of using EEBO-TCP to look at early modern women’s writing (“Hester Pulter’s ‘Indivisibles’ and the Challenges of Annotating Early Modern Women’s Poetry”, Studies in English Literature, 1500-1900, Vol. 52, No. 1, The English Renaissance (Winter 2012), pp. 117-141), and indeed, the Women Writers Project and the RECIRC project both tackle these issues head-on while (as part of their work) mining EEBO-TCP not only for women’s writing, but also for commentary on women’s issues. Similarly, Heather Froehlich recently embarked on a project that drew on the Historical Thesaurus of the OED and EEBO-TCP to create a corpus of linguistic terms related to ‘whorishness and unchastity’, with the aim of expanding our semantic understanding of those concepts.

Matthew Steggle’s Digital Humanities and the Lost Dramas of Early Modern England: Ten Case Studies (Routledge, 2015) uses EEBO-TCP as its principal tool in analyzing the titles of ten lost plays, with the aim of clarifying their subject matter. Steggle’s case studies all pay close attention to the technique and procedure of such a project; he is transparent about his process throughout, and hopes to demonstrate some of the potential of digital humanities.

Other highly inclusive projects have used EEBO-TCP as a supplementary source of data. The Map of Early Modern London (MoEML) recreates a rather granular depiction of London at the time of Shakespeare, collecting details from a variety of sources including EEBO-TCP, where collected plays, poems, and pamphlets assume an intimate knowledge of the city’s topography. MoEML traces London’s “spatial imaginary,” plotting “people, historical documents, literary works, and recent critical research onto topography and the built environment”.

All of these projects provide users with layers of engagement. Interfaces that are used for research often also invite the user to contribute to the development of the project through correction, annotation, enrichment of metadata, and curation of subcorpora. Through these additional activities, the “distant reading” that the projects allow users to engage in also facilitates close reading opportunities, enriching the project and the user’s work at the same time.