Text Mining Scarce Sources
April 27th, 2010 | Lincoln Mullen
I’d like to discuss text mining. I’m currently looking at the conversion narratives that fourteen laymen and -women offered in order to join a church in East Windsor, Connecticut, in 1700-02. This project raises questions about text mining that I haven’t seen addressed elsewhere.
First, how can text mining help scholars deal with the problem of scarce, rather than abundant, sources? Most projects that I’ve seen use text mining to plow through huge volumes of text. But there are only fourteen narratives from East Windsor (about 30 printed pages), and at most a couple hundred narratives from seventeenth-century New England. How can text mining support close readings of the scarce documents that scholars of earlier eras work with?
Second, how can text mining be adapted to documents that employ a vocabulary that is at once allusive and precise? Nearly every word in these dense narratives is a biblical or theological allusion, and those allusions are crucial to their meaning. At the same time, the narratives use a very precise vocabulary. (For example, the term “saving faith” means “the type of faith that saves” rather than the more obvious “faith, which by necessity saves.”) How can text mining bring out the richness of this vocabulary?
Though my project is focused on early American religious history, I think the questions it raises could contribute to the larger discussion of text mining.
Since I write for my own blog and for a group blog on the history of American religion, I’d also like to discuss the value and danger of blogging for graduate students and early-career scholars.
April 27th, 2010 at 7:31 pm
Check out www.historying.org for an example of text mining a single resource.
And here is what I learned about text mining from my (Great Lakes) THATCamp text mining session:
mininghumanities.com/2010/04/24/a-text-miners-revalation-how-historians-use-text/
May 18th, 2010 at 8:52 pm
[…] hope that some of the abovementioned things connect to other THATcampers’ ideas, e.g. Lincoln Mullen’s post on mining scarce sources and Bill Ferster’s post on teaching using […]
May 20th, 2010 at 1:53 pm
I’ve done some thinking (with my colleague Doug Winiarski, who has written about these admission narratives) and a tiny bit of experimentation (using Mallet as well) on such “relations.” The direction I’ve thought about and just started to pursue is to see what textual features distinguish relations of women from those of men, relations written before the Great Awakening from those written after it, or relations written in one town from those written in another. Given how generic these texts can be in length, structure, diction, imagery, and biblical citation, this seems to me an interesting exercise: can text mining techniques reveal what are likely to be extraordinarily subtle, even minuscule, differences that sex, place, and time made?
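One way to picture that kind of comparison is a minimal sketch in plain Python. This is not Mallet and not necessarily the approach tried above. It assumes the relations are already transcribed and split into two groups, and it scores each word by a smoothed log-ratio of its relative frequency in one group against the other. The function names, the smoothing value, and the toy strings are all hypothetical stand-ins; the strings are invented and are not quotations from any relation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_ratio_scores(group_a, group_b, alpha=0.5):
    """Score each word by log(freq in A / freq in B), with additive smoothing.

    Positive scores lean toward group A, negative scores toward group B.
    alpha is an assumed smoothing constant; with corpora this small the
    choice matters and should be varied.
    """
    counts_a = Counter(w for doc in group_a for w in tokenize(doc))
    counts_b = Counter(w for doc in group_b for w in tokenize(doc))
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    return {
        w: math.log(((counts_a[w] + alpha) / total_a)
                    / ((counts_b[w] + alpha) / total_b))
        for w in vocab
    }


# Invented toy strings standing in for two groups of relations.
group_a = ["grace grace mercy", "grace sin"]
group_b = ["law law duty", "law sin"]

scores = log_ratio_scores(group_a, group_b)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word:>6}  {scores[word]:+.2f}")
```

With only a handful of short documents per group, scores like these are suggestions for where to reread closely, not findings in themselves: a single unusual narrative can dominate its group.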
May 22nd, 2010 at 7:44 am
Great—I was hoping someone would be more familiar with Winiarski’s work than I am. The Haverhill relations that Winiarski has written about and the East Windsor relations that I’m looking at differ in ways that should make for an interesting comparison in text mining methods.