Text Mining Scarce Sources
Tuesday, April 27th, 2010
I’d like to discuss text mining. I’m currently looking at narratives of conversion offered by fourteen laymen and -women to join a church in East Windsor, Connecticut, in 1700-02. This project raises questions about text mining that I haven’t seen addressed elsewhere. First, how can text mining help scholars deal with the problem of scarce, rather than abundant, sources? Most projects that I’ve seen use text mining to plow through huge volumes of text. But there are only fourteen narratives from East Windsor (about 30 printed pages), and at most a couple hundred narratives from seventeenth-century New England. How can text mining support close readings of the scarce documents that scholars of earlier eras work with? Second, how can text mining be adapted to documents that employ a vocabulary that is at once allusive and precise? Nearly every word in these dense narratives is a biblical or theological allusion, and those allusions are crucial to their meaning. At the same time, they use a very precise vocabulary. (For example, the term “saving faith” means “the type of faith that saves” rather than the more obvious “faith, which by necessity saves.”) How can text mining bring out the richness of this vocabulary? Though my project is focused on early American religious history, I think the questions it raises could contribute to the larger discussion of text mining.
Since I write for my own blog and for a group blog on the history of American religion, I’d also like to discuss the value and danger of blogging for graduate students and early-career scholars.