EP Lab

The EarlyPrint Lab offers a range of tools for the computational exploration and analysis of English print culture before 1700.

We apply these tools to a corpus of more than 60,000 early English printed documents, roughly 1.65 billion words. We intend the Lab as a provocation, not a finished toolkit. By exposing the corpus of early printed texts at scale we hope to defamiliarize familiar texts and invite exploration of unfamiliar ones.

The tools and visualizations collected here offer perspectives on the corpus that invite users to probe early English discursive history in ways that complement the search capabilities of EEBO-TCP and the Oxford English Dictionary.

N-gram browser

We begin with an N-gram browser that lets one examine the changing frequencies of words and word forms over time. Like all the tools in the EP Lab, the browser allows one to search by original spelling, regularized spelling, or lemma (the dictionary headword form of a word), and to filter queries by date and by part-of-speech (POS) tag. The N-gram browser links to and pairs with the Linguistic Search tool.
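To illustrate what a frequency-over-time query measures, here is a minimal Python sketch. It is not the Lab's implementation: the token schema (`year`, `orig`, `reg`, `lemma`), the function name, and the sample tokens are all assumptions invented for illustration. It computes the share of each year's tokens that match a target form, and shows how the answer changes depending on whether one matches on original spelling or on lemma.

```python
from collections import Counter


def relative_frequency_by_year(tokens, target, key="lemma"):
    """Return {year: share of that year's tokens whose `key` field equals `target`}.

    `tokens` is an iterable of dicts with 'year', 'orig', 'reg', and 'lemma'
    keys. This schema is hypothetical, not the Lab's actual data format.
    """
    totals = Counter()  # all tokens seen per year
    hits = Counter()    # matching tokens per year
    for tok in tokens:
        totals[tok["year"]] += 1
        if tok[key] == target:
            hits[tok["year"]] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Invented sample tokens, for illustration only.
sample = [
    {"year": 1590, "orig": "loue", "reg": "love", "lemma": "love"},
    {"year": 1590, "orig": "loued", "reg": "loved", "lemma": "love"},
    {"year": 1590, "orig": "the", "reg": "the", "lemma": "the"},
    {"year": 1640, "orig": "love", "reg": "love", "lemma": "love"},
    {"year": 1640, "orig": "the", "reg": "the", "lemma": "the"},
    {"year": 1640, "orig": "king", "reg": "king", "lemma": "king"},
    {"year": 1640, "orig": "and", "reg": "and", "lemma": "and"},
]

by_lemma = relative_frequency_by_year(sample, "love", key="lemma")
by_orig = relative_frequency_by_year(sample, "love", key="orig")
```

In this toy sample, a search on the original spelling "love" finds nothing in 1590, because the word appears there only as "loue" and "loued", while a lemma search captures both forms. This is why the choice among original spelling, regularized spelling, and lemma matters for early modern texts.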

Linguistic search -- phase I and II

Linguistic search -- phase I

We offer a Linguistic Search tool which allows the researcher to search by individual word or by phrase. This tool allows search by original or regularized spelling, by lemma or by part-of-speech; searches can also be filtered by date, author, and title. The tool includes a query builder that will allow the curious researcher to construct searches that capture complex rhetorical patterns. We provide two instances of the Linguistic Search tool: the first instance queries a corpus of roughly 61,000 texts corresponding to EEBO-TCP Phases I and II; the second instance queries the smaller Phase I corpus. This smaller corpus is a more polished textual resource.

Phrase search

Phrase Search provides a simplified interface for researching phrases and short passages of text, by automatically creating search queries based on both a phrase's form and its content. It is designed as a starting point for generating complex queries in Linguistic Search.

Discovery engine

The Discovery Engine allows the researcher to select a text from the corpus and to find others that resemble it by one or another measure. Because ‘resemblance’ is a slippery concept, we implement several differently suggestive measures. If you like Text A, you might like . . .

Catalog

The Catalog offers information about individual texts in the corpus, including not only traditional catalog metadata (author, title, place of publication, etc.), but also formal metadata (word count, foreign language word count, paragraph count, line count, transcription error count). It also offers corpus-level views of document metadata. It enables the researcher to develop filtered groupings of the EarlyPrint corpus.

We also offer the first of what we expect to be a gallery of visualizations of the corpus.

TCP books per year

The first is a graph of TCP books per year, paired for reference with a background graph of the number of publications recorded in the English Short-Title Catalogue. It gives a sense of how English print publication changed over time and of the relative size and comparability of the EEBO-TCP “sample.”

Word count per document over time

The second is a scatterplot showing word count per document over time, producing, among other patterns, a striking visualization of the sharp increase in pamphlet production in the 1640s. But caveat philologe: some of that sharp increase is an artifact of the history of the archive, for George Thomason’s assiduousness in collecting Civil War pamphlets somewhat distorts EEBO as a sample of printed discourse.

Download and examine catalog metadata

Finally, we offer a convenient interface for those seeking to download and examine the catalog metadata either for the entire corpus or for a subset thereof.
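Once downloaded, the metadata can be filtered with a few lines of code. The Python sketch below assumes a CSV export with columns named `id`, `author`, `title`, `date`, and `word_count`. The actual download format and column names may differ, and the rows shown are invented placeholders, not real catalog records.

```python
import csv
import io

# Invented stand-in for a downloaded metadata file. The column names and
# rows are assumptions for illustration, not the Lab's actual export.
SAMPLE_CSV = """id,author,title,date,word_count
X0001,Example Author A,An Example Sermon,1611,12000
X0002,Example Author B,An Example Pamphlet,1642,3500
X0003,Example Author C,Another Example Pamphlet,1645,2800
"""


def filter_by_date(rows, start, end):
    """Keep rows whose 'date' falls within [start, end], inclusive."""
    return [row for row in rows if start <= int(row["date"]) <= end]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
subset = filter_by_date(rows, 1640, 1649)
subset_ids = [row["id"] for row in subset]
total_words = sum(int(row["word_count"]) for row in subset)
```

With a real download, one would replace `io.StringIO(SAMPLE_CSV)` with an opened file and adjust the column names to match the export.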

The EarlyPrint Lab was conceived by Anupam Basu during his tenure as Weil Fellow in Digital Humanities at Washington University; he aimed to render EEBO-TCP more tractable for quantitative historical analyses by virtue of some intensive reprocessing of the TCP texts and their metadata. Basu was assisted by John Ladd, Joseph Loewenstein, Douglas Knox, and Stephen Pentecost. The EarlyPrint Lab site is supported by the Humanities Digital Workshop at Washington University.