Digitizing 227 Years of English Print

EEBO: Early Modern English Scanned

Early English Books Online (EEBO) is a resource for studying early English print as well as early modern British culture. The EEBO database provides “images of virtually every work printed in England, Ireland, Scotland, Wales and British North America and works in English printed elsewhere from 1473–1700” (About EEBO). EEBO began in the 1930s as a microfilm project, when the young publisher Eugene B. Power founded University Microfilms (now ProQuest) and began photographing and microfilming English books published before 1701. A rival company carried out a similar project on printed materials of the eighteenth century, based on “The Eighteenth Century Short Title Catalogue.” Over twenty years ago, ProQuest started to digitize its microfilms of early print and make them available, creating the database that we now know as EEBO.

Although the black-and-white microfilms that form the basis of EEBO date back some eighty years and the images are not without imperfections, the database has, since its launch, greatly eased access to early modern English printed texts.

The EEBO database reproduces titles from four collections:

  1. Pollard & Redgrave’s Short-Title Catalogue (1475-1640). First published in 1927, with a second edition released between 1976 and 1991, STC catalogues works printed in the British Isles and their colonies, or elsewhere in the world, provided they were in English or another British language.
  2. Wing’s Short-Title Catalogue (1641-1700). This catalogue was compiled by Donald Wing between 1945 and 1951. A revised edition appeared between 1972 and 1998.
  3. Thomason Tracts (1640-1661). A collection of pamphlets, broadsides, books, and other types of writing mostly printed in London from 1640 to 1661. Curated by George Thomason, the 22,000 items in the collection represent about 80 percent of what was published in England in the period.
  4. Early English Books Tract Supplement. This collection comprises 16th- and 17th-century broadsides and pamphlets collected as “scrapbooks” or tract volumes categorized by such criteria as dates or topics. Mainly from the British Library, the tract volumes make it possible for the reader “to see the materials in the same order as they would when leafing through the original volume.” (About EEBO)

Gaps and Asymmetries

To say that EEBO fully represents British print culture would be misleading, if not outright wrong. EEBO does not contain every book printed in English within the time-frame of the collection. Some books may have gone missing before the microfilming project began. Nor does EEBO include all the extant copies of an edition and, as scholars of early modern English are aware, different copies of the same edition may vary significantly from one another. Wing’s catalogue leaves out periodicals, which STC includes, and its metadata is less comprehensive than STC’s. Also, because STC excludes non-English materials, large numbers of Latin books imported into England from the fifteenth century on are not part of the catalogue. That said, the database is not monolingual either, as it “covers more than 30 languages from Algonquin to Welsh.” (There are a few books in Latin and Persian with minimal or no English in them.)

Read more:

“The Use and Misuse of Early English Books Online,” by Ian Gadd.

History of Early English Books Online

Some EEBO Numbers

More than 132,000 titles.
More than 17 million scanned pages.
Currently scanning 100,000 more pages.
More than 30 languages:
9496 records in Latin
742 records in Romance (other)
619 records in French
309 records in Ancient Greek
282 records in Modern Greek
204 records in Welsh
172 records in Dutch
146 records in Middle French
138 records in Italian
111 records in Hebrew
88 records in German
82 records in Scots
60 records in Spanish
36 records in Arabic
19 records in Gaelic (Irish)
13 records in Algonquin
10 records in Aramaic
10 records in North American Indian (other)
9 records in Gaelic (Scots)
8 records in Persian
7 records in Portuguese
6 records in Syriac
5 records in Newari
3 records in Old French
2 records in Pahlavi
2 records in Polish
2 records in Turkish
1 record in Chinese
1 record in Ethiopic
1 record in Lithuanian
1 record in Malay

What is TCP?: Early Modern, Machine-friendly

The Text Creation Partnership (TCP) is an academic endeavor aiming to create standardized, machine-readable, searchable texts of early English print. The partnership aims to “transcribe and mark up the text from the millions of page images in ProQuest’s Early English Books Online, Gale Cengage’s Eighteenth Century Collections Online (ECCO), and Readex’s Evans Early American Imprints.” (About TCP)

Because they are machine-readable and tagged, TCP texts make quoting and searching of the EEBO materials considerably easier. The component parts of TCP texts are distinguishable by form (drama, prose, verse) and function (TOC, dedication). The TCP search tools have been designed to facilitate working with early modern spelling irregularities. Simple, Boolean (with AND, OR, NOT operators), proximity (search terms within a certain distance of one another), and citation search tools are available.
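To make the difference between Boolean and proximity searching concrete, here is a minimal sketch in Python. It is not the TCP search implementation. The function names, the sample line, and the crude u/v and i/j spelling normalization are all assumptions made for illustration.

```python
import re

def normalize(word):
    """Crude, assumed normalization: fold v into u and j into i, lowercase."""
    return word.lower().replace("v", "u").replace("j", "i")

def tokenize(text):
    """Split text into normalized word tokens."""
    return [normalize(w) for w in re.findall(r"[A-Za-z]+", text)]

def boolean_and(tokens, *terms):
    """True when every (normalized) term occurs somewhere in the tokens."""
    present = set(tokens)
    return all(normalize(term) in present for term in terms)

def proximity(tokens, first, second, max_distance):
    """True when the two terms occur within max_distance tokens of each other."""
    first, second = normalize(first), normalize(second)
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second]
    return any(abs(i - j) <= max_distance
               for i in first_positions for j in second_positions)

# Illustrative sample line in early modern spelling (not taken from a TCP file).
sample = "Loue is not loue which alters when it alteration findes"
tokens = tokenize(sample)

print(boolean_and(tokens, "love", "alters"))   # both terms present anywhere
print(proximity(tokens, "love", "alters", 2))  # terms at most 2 tokens apart
print(proximity(tokens, "love", "findes", 3))  # terms further apart than 3
```

A Boolean AND only asks whether both terms occur somewhere in the text, while a proximity search also constrains how far apart they may be. Because the query is normalized the same way as the text, searching for “love” matches the original spelling “loue”.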

Eligibility for digitization and encoding of a title depends on whether the author’s name appears in the New Cambridge Bibliography of English Literature (NCBEL). If anonymous, a work may still be selected if its title appears in NCBEL. Because NCBEL contains both canonical and less lauded titles, using it as a guideline results in a more variegated collection.

The Text Creation Partnership is funded by more than 150 libraries that own the outcome of the work. All of the TCP’s production, however, will ultimately be in the public domain.

Read more: “EEBO and EEBO-TCP: A Brief Introduction,” by Joseph Loewenstein.

EEBO-TCP Partnership

The collaboration between the University of Michigan, Oxford University, and ProQuest began in 1999 with the aim of creating TEI-compliant SGML/XML texts from 25,000 EEBO books. The project came in two phases.

The Text Encoding Initiative (TEI) is a set of open-source guidelines for “encoding machine-readable texts in the humanities and social sciences” to represent structural and conceptual features of texts. TEI guidelines are developed and maintained by the TEI Consortium, an international organization; the initiative itself dates from 1987.

Read more on TEI: “Text Encoding Initiative”

Standard Generalized Markup Language (SGML) is a standard for defining a markup language or tag set (HTML is an example of a markup language). SGML itself is not a document language but a description of how to specify one.

Read more on SGML: “SGML (Standard Generalized Markup Language)” and “Standard Generalized Markup Language”

Extensible Markup Language (XML) “is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.” Like HTML, XML uses markup to describe the formal structure of a page or a file.
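As a rough illustration of what structural markup makes possible, the sketch below parses a small hand-written fragment with Python's standard `xml.etree.ElementTree` module. The fragment is an assumption for demonstration only: it borrows common TEI element names (`text`, `front`, `body`, `div`, `l`, `p`) but is not copied from an actual EEBO-TCP file, and real TCP files are considerably richer.

```python
import xml.etree.ElementTree as ET

# Hand-written, TEI-like fragment (illustrative; not an actual TCP transcription).
sample = """
<text>
  <front>
    <div type="dedication">
      <p>To the right honourable reader.</p>
    </div>
  </front>
  <body>
    <div type="poem">
      <l>Loue is not loue</l>
      <l>Which alters when it alteration findes</l>
    </div>
    <div type="prose">
      <p>Of the vse of bookes.</p>
    </div>
  </body>
</text>
""".strip()

root = ET.fromstring(sample)

# Index each division by its "type" attribute (its form or function).
divisions = {div.get("type"): div for div in root.iter("div")}

# Pull out only the verse lines, ignoring the dedication and the prose.
verse_lines = [line.text for line in divisions["poem"].iter("l")]

print(sorted(divisions))
print(verse_lines)
```

Because each part of the text is labeled, a program can restrict a query to verse, or skip front matter such as dedications, without guessing from layout.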

Read more on XML: “XML” and “XML (Extensible Markup Language)”

EEBO-TCP Phase 1: The first phase of the EEBO-TCP work ran from 2000 to 2009 and resulted in the successful conversion of 25,363 selected texts from the EEBO corpus. “Since January 2015 these EEBO-TCP Phase I texts became freely available on the websites of the University of Michigan Library and the Bodleian Libraries at the University of Oxford.” (About EEBO)

EEBO-TCP Phase 2: EEBO-TCP Phase 2 began in 2010, seeking “to convert each unique first edition in EEBO: around 45,000 books on top of the 25,000 completed in Phase I” (About TCP-EEBO). Starting July 1, 2015, ProQuest will have the rights to distribute EEBO-TCP Phase II texts for five years. Until the texts pass into the public domain in 2020, they are accessible to subscribing users and partner libraries.