In this experimental scatter plot, each point represents a distinct EarlyPrint text. Point positions are derived from LDA topic models (created using MALLET) and laid out using LargeVis. LargeVis attempts to position points according to their similarity to one another, so that similar texts gather into clusters.
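To make "similarity" concrete, here is a minimal sketch of the kind of comparison that underlies such a layout. LargeVis starts by finding each point's nearest neighbors; the brute-force search and the choice of cosine similarity below are illustrative stand-ins, not this project's actual pipeline, and the text names and topic proportions are made up.

```python
import math

# Hypothetical topic proportions: one row per text, one value per topic.
# These numbers are invented for illustration; real rows would come from the topic model.
doc_topics = {
    "sermon_a": [0.70, 0.20, 0.10],
    "sermon_b": [0.65, 0.25, 0.10],
    "play_a": [0.05, 0.15, 0.80],
    "play_b": [0.10, 0.10, 0.80],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two topic vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(name, vectors, k=1):
    """Return the k texts most similar to `name`, by brute-force comparison."""
    target = vectors[name]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in vectors.items()
        if other != name
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


# Each text's single closest neighbor; a layout tool would pull such pairs together.
neighbors = {name: nearest_neighbors(name, doc_topics) for name in doc_topics}
print(neighbors)
```

In this toy data the two sermons are each other's nearest neighbors, as are the two plays, which is the sort of grouping that shows up as a cluster in the plot.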

The topic modeling data used here is the same data that drives our Discovery Engine. Importantly, LargeVis is designed to show clusters in the data, but the data may also cluster or sort in ways that LargeVis does not show. The topic models may be good at grouping together certain genres or themes, but less good at showing other known categories of text. This scatter plot is meant as an illustration of certain clusters in the corpus, not as a final authority on all possible text clusters.

You can use the menu on the left to highlight texts whose Library of Congress subject headings contain specific terms. Note that there is considerable overlap between the subject headings and the topic modeling clusters. This suggests that human catalogers and topic modeling are turning up some of the same general categories. If you choose groups of related subject terms (e.g., "Bible," "Sermons," and "Church of England"), you'll see that there are large communities and "islands" for common categories of texts in the corpus. There are islands of political texts, religious texts, poetry, drama, and more. We encourage you to use this as an exploratory tool, a first step toward visualizing topics in the EarlyPrint corpus.

Designed and implemented by John Ladd and Steve Pentecost.

N.B. The colors used in this graph are colorblind safe, but there are more distinct subject terms than available colors, so colors will repeat.
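Why colors repeat can be shown with a short sketch. This is only an assumption about how such an assignment might work, not the site's actual code: the palette below is the widely used Okabe-Ito colorblind-safe set (the palette used here is not stated), and `assign_colors` and the term names are hypothetical.

```python
# Okabe-Ito colorblind-safe palette (an assumption; the site's palette is not stated).
PALETTE = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]


def assign_colors(terms, palette=PALETTE):
    """Give each term a color, wrapping around when terms outnumber colors."""
    return {term: palette[i % len(palette)] for i, term in enumerate(terms)}


# Ten hypothetical subject terms but only eight colors: the ninth term reuses the first color.
terms = [f"term_{i}" for i in range(10)]
colors = assign_colors(terms)
print(colors["term_0"], colors["term_8"])
```

With more terms than palette entries, two highlighted terms can share a color, so check the menu selection rather than relying on color alone.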

Topic Model Scatter Plot
Click a point to get more information, or use the search box above.

Scroll to zoom. Click and drag to pan.