EEBO and EEBO-TCP: A Brief Introduction

Joseph Loewenstein and Alireza Taheri Araghi

EEBO

Early English Books Online (EEBO) is one of the great resources for the study of early modern British culture. It aims to provide digital images of one copy each of the surviving books and broadsides printed in the British Isles and British North America between 1473 and 1700, and of the English-language books printed in Europe during the same period. EEBO began in the 1930s as a microfilm project, when the young publisher Eugene B. Power founded University Microfilms (now ProQuest) and began photographing and microfilming English books published before 1701. Over twenty years ago, ProQuest started to digitize its microfilms of early print and make them available, creating the database that we now know as EEBO.

Contents

EEBO is a composite. Its scans reproduce four microfilm collections produced by University Microfilms. The first two are:

Early English Books I, 1475-1640.
Early English Books II, 1641-1700.

While EEBO’s digital collection reproduces most of the microfilms of Early English Books I (93%) and virtually all of Early English Books II, these two microfilm collections are themselves incomplete. They were meant to represent the volumes catalogued, respectively, in A. W. Pollard and G. R. Redgrave’s Short-title catalogue of books printed in England, Scotland and Ireland, and of English books printed abroad, 1475-1640 (STC I; first published in 1927, with a second edition released between 1976 and 1991; 26,500 titles) and Donald G. Wing’s Short-title catalogue of books printed in England, Scotland, Ireland, Wales, and British America, and of English books printed in other countries, 1641-1700 (STC II; compiled by Donald Wing between 1945 and 1951, with a revised edition appearing between 1972 and 1998; 90,000 titles). The microfilming is still ongoing, however, and is projected to take a few more years to complete.
The two STCs catalogue works printed in the British Isles, their colonies, or elsewhere in the world, provided they were in English or another British language.

EEBO reproduces virtually all of two other microfilm collections:

Thomason Tracts (1640-1661). A collection of pamphlets, broadsides, books, and other types of writing, mostly printed in London from 1640 to 1661. Curated by George Thomason, the 22,000 items in the collection represent about 80 percent of what was published in England in the period.

Early English Books Tract Supplement. This collection comprises 16th- and 17th-century broadsides and pamphlets collected as “scrapbooks” or tract volumes categorized by such criteria as dates or topics. Mainly from the British Library, the tract volumes make it possible for the reader “to see the materials in the same order as they would when leafing through the original volume” (About EEBO).

Gaps and Asymmetries

To say that EEBO fully represents British print culture would be misleading. EEBO does not contain every book printed in English within the time-frame of the collection. The two STCs catalogue only surviving works, and many printed books no longer survive. As a result, these catalogues may distort the aggregate profile of what was actually printed. Moreover, EEBO does not include all the extant copies of an edition and, as scholars of early modern English print are aware, different copies of the same edition may vary significantly from one another. Wing’s catalogue leaves out periodicals, which STC I includes. Also, because the STCs exclude non-English materials printed abroad, large numbers of Latin books imported into England are not part of the catalogues.
(That said, the database is not monolingual either, as it “covers more than 30 languages from Algonquin to Welsh.”) Ian Gadd has written wisely on “The Use and Misuse of Early English Books Online,” as has Diana Kichuk in “Metamorphosis: Remediation in Early English Books Online (EEBO),” Literary and Linguistic Computing 22, no. 3 (September 1, 2007): 291–303. Kichuk and Gadd make it clear why EEBO should be regarded as a very, very large sample of the output of the early modern press.

Some EEBO Numbers

More than 132,000 titles.
More than 17 million pages (currently filming 100,000 more).
More than 30 languages:

9496 records in Latin
742 records in Romance (other)
619 records in French
309 records in Ancient Greek
282 records in Modern Greek
204 records in Welsh
72 records in Dutch
146 records in Middle French
138 records in Italian
111 records in Hebrew
88 records in German
82 records in Scots
60 records in Spanish
36 records in Arabic
19 records in Gaelic (Irish)
13 records in Algonquin
10 records in Aramaic
10 records in North American Indian (other)
9 records in Gaelic (Scots)
8 records in Persian
7 records in Portuguese
6 records in Syriac
5 records in Newari
3 records in Old French
2 records in Pahlavi
2 records in Polish
2 records in Turkish
1 record in Chinese
1 record in Ethiopic
1 record in Lithuanian
1 record in Malay

EEBO-TCP

EEBO is a commercial product, available for institutional purchase or license. A number of subscribing institutions funded a project to transcribe its texts through a further subscription to the Text Creation Partnership (EEBO-TCP). The TCP was conceived in 1999 as a partnership among the University of Michigan Library, the Bodleian Libraries at the University of Oxford, ProQuest, and the Council on Library and Information Resources. It aimed to create standardized, machine-readable, searchable texts of early English print by transcribing and marking up the text from the millions of page images in ProQuest’s Early English Books Online.
The first phase of TCP transcription began in 1999 and completed its target of 25,000 transcriptions by 2009; the second phase began in 2009 and was roughly half complete by March of 2014. On January 1, 2015, the TCP Phase I transcriptions were made freely available; in July of 2020, the Phase II transcriptions were made similarly available.

The Phase I EEBO transcriptions were the first of the TCP undertakings. In 2005, satisfied that they had proved the viability of the model, the board and staff of the Text Creation Partnership decided to approach the publishers of two other major scholarly databases, Eighteenth Century Collections Online (ECCO) and Evans Early American Imprints. Of the 150,000 titles in ECCO, over 2,200 have been transcribed, and these transcriptions are freely available. Of the Evans database, which comprises 40,000 titles (roughly two-thirds of the books, pamphlets, and broadsides printed between 1640 and 1800 in the territory that eventually became the United States), 6,000 titles have been transcribed. The Evans TCP transcriptions were made freely available in June 2014.

Machine-readable and therefore Scholar-friendly

For researchers with access to it, ProQuest’s EEBO-TCP substantially extends the utility of EEBO, for the TCP texts may be easily quoted and variously searched. The TCP texts have been tagged, so that many of the component parts of the transcribed texts are distinguishable by form (verse, prose, drama) and function (dedication, table of contents). EEBO search tools were designed to mitigate the difficulties of working with early modern spelling irregularities. Simple, Boolean (with AND, OR, NOT operators), proximity (search terms within a certain distance of one another), and citation search tools are available. The results of keyword searches are displayed in context, and it is reasonably easy to move from the keyword-in-context (KWIC) results of a search to the relevant page image.
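As a rough illustration of the proximity and keyword-in-context (KWIC) searches described above, the following Python sketch runs both over a short string. It is not the implementation used by EEBO or the TCP: the function names (tokenize, kwic, near) and the sample line are hypothetical, and the sketch makes no attempt at the handling of spelling variation that the real search tools provide.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def near(tokens, a, b, distance=5):
    """True if some occurrence of a lies within `distance` tokens of b."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


# Hypothetical sample in early modern spelling, for illustration only.
sample = "Loue is not loue which alters when it alteration findes"
tokens = tokenize(sample)

for left, word, right in kwic(tokens, "loue", width=2):
    print(f"{left:>12} [{word}] {right}")

print(near(tokens, "loue", "alters", distance=2))
print(near(tokens, "loue", "findes", distance=2))
```

Because the match is on exact tokens, a search for "love" finds nothing in this sample. That is the kind of difficulty with early modern spelling that the EEBO search tools were designed to mitigate.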
The TCP site maintains an equally instructive and freely accessible search interface, although it does not provide access to the relevant page images.

Caveat Explorator

If EEBO distorts the total print output, by its limitation to surviving works and its slight under-representation of reprints, the TCP is even more selective. The TCP transcriptions represent only a portion – to date, roughly 40% – of the (still incomplete) microfilm collections represented, as scans, in EEBO. Although the TCP is in many ways comprehensive, providing a transcription of one edition of nearly every surviving work produced in the first 227 years of English print, it is not a random sample of early print. The microfilm collections from which the transcriptions have been taken include images from single copies only. When a copy is somehow deficient, because of missing pages or poor inking, the protocols for TCP transcription require that the deficiencies go unremedied. And single copies are almost inevitably deficient in another sense: since early modern printing practices allowed for the correction of apparent transmissional errors without the destruction of “misprinted” sheets, single copies cannot represent all the states that make up the highly variable flow of printed output.