Text Mining Scarce Sources

April 27th, 2010

I’d like to discuss text mining. I’m currently looking at narratives of conversion offered by fourteen laymen and -women to join a church in East Windsor, Connecticut, in 1700-02. This project raises questions about text mining that I haven’t seen addressed elsewhere. First, how can text mining help scholars deal with the problem of scarce, rather than abundant, sources? Most projects that I’ve seen use text mining to plow through huge volumes of text. But there are only fourteen narratives from East Windsor (about 30 printed pages), and at most a couple hundred narratives from seventeenth-century New England. How can text mining provide close readings of the scarce documents that scholars of earlier eras work with? Second, how can text mining be adapted to documents that employ a vocabulary that is at once allusive and precise? Nearly every word in these dense narratives is a biblical or theological allusion, and those allusions are crucial to their meaning. At the same time, they use a very precise vocabulary. (For example, the term “saving faith” means “the type of faith that saves” rather than the more obvious “faith, which by necessity saves.”) How can text mining bring out the richness of this vocabulary? Though my project is focused on early American religious history, I think the questions it raises could contribute to the larger discussion of text mining.

Since I write for my own blog and for a group blog on the history of American religion, I’d also like to discuss the value and danger of blogging for graduate students and early-career scholars.

4 Responses to “Text Mining Scarce Sources”

  1. Aditi Muralidharan Says:

    Check out www.historying.org for an example of text mining one single resource.

    And what I learned about text mining from my (Great Lakes) THATCamp text mining session


  2. THATCamp 2010 » Blog Archive Says:

    […] hope that some of the abovementioned things connect to other THATcampers’ ideas, e.g. Lincoln Mullen’s post on mining scarce sources and Bill Ferster’s post on teaching using […]

  3. Rob Nelson Says:

    I’ve done some thinking (with my colleague Doug Winiarski, who has written about these admission narratives) and a tiny bit of experimentation (using Mallet as well) on such “relations.” The direction I’ve thought about and just started to pursue is to see what textual features distinguish relations of women from those of men, relations written before the Great Awakening from those written after it, or relations written in one town from those written in another. Given how generic these texts can be in length, structure, diction, imagery, and biblical citation, this seems to me an interesting exercise to see if text mining techniques can reveal what are likely to be extraordinarily subtle, even minuscule, differences that sex and place and time made.

  4. Lincoln Mullen Says:

    Great—I was hoping someone would be more familiar with Winiarski’s work than I am. The Haverhill relations that Winiarski has written about and the East Windsor relations that I’m looking at differ in ways that should make for an interesting comparison in text mining methods.
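
The comparison Rob Nelson describes in the thread above (finding the textual features that distinguish one group of relations from another) can be roughed out in a few lines. This is only a minimal sketch, not the Mallet workflow he mentions: it ranks words by a smoothed log-ratio of their relative frequencies in two groups of texts. The function names, the add-one smoothing, and the sample strings are all assumptions made for illustration; the strings are invented placeholders, not quotations from any relation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def distinguishing_words(group_a, group_b, top_n=5):
    """Rank words by the smoothed log-ratio of their relative frequency
    in group_a versus group_b.

    Positive scores lean toward group_a, negative scores toward group_b.
    Add-one smoothing (an assumption of this sketch) keeps a word that is
    absent from one group from producing a division by zero.
    """
    counts_a = Counter(tok for doc in group_a for tok in tokenize(doc))
    counts_b = Counter(tok for doc in group_b for tok in tokenize(doc))
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + len(vocab)
    total_b = sum(counts_b.values()) + len(vocab)
    scores = {
        word: math.log((counts_a[word] + 1) / total_a)
        - math.log((counts_b[word] + 1) / total_b)
        for word in vocab
    }
    # Sort by descending score; break ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    leans_a = ranked[:top_n]
    leans_b = ranked[-top_n:][::-1]  # most negative score first
    return leans_a, leans_b


# Invented placeholder texts standing in for two groups of relations.
group_a = ["grace grace mercy sin", "grace faith sin"]
group_b = ["wrath sin faith", "wrath wrath mercy sin"]

leans_a, leans_b = distinguishing_words(group_a, group_b, top_n=2)
print("leans toward group A:", leans_a)
print("leans toward group B:", leans_b)
```

With only fourteen narratives, scores like these would rest on very small counts, so they are better read as pointers back to passages worth rereading closely than as findings in themselves.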

