Visualizing text: theory and practice

May 18th, 2010

Bad, bad me: of course I’ve been putting off writing up my ideas and thoughts for THATcamp until almost the last possible moment. Waiting so long has one definite advantage though: I get to point to some of the interesting suggestions that have already been posted here and (hopefully) add to them.

I’d like to both discuss and do text visualization. Charts, maps, infographics and other forms of visualization are becoming increasingly popular as we are faced with large quantities of textual data from a variety of sources. For linguists and literary scholars, visualizing texts can (among other things) uncover things about language as such (corpus linguistics) and about individual texts and their authors (narratology, stylometrics, authorship attribution). For a wide range of other disciplines, the interest lies in what visualization lets us infer beyond the text itself (social change, the spread of cultural memes).

What can we potentially visualize? This may seem to be a naive question, but I believe that only by trying out virtually everything we can think of (distribution of letters, words, word classes, n-grams, paragraphs, …; patterning of narrative strands, structure of dialog, occurrence of specific rhetorical devices; references to places, people, points in time…; emotive expressions, abstract verbs, dream sequences… you name it) can we reach conclusions about what (if anything!) these things might mean.
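To make the "try everything" idea a bit more concrete, here is a minimal sketch of how a few of these distributions (letters, words, n-grams) could be counted before any plotting happens. It uses only the Python standard library; the function names `tokenize`, `ngrams` and `profile`, the sample sentence and the very crude tokenizer are all my own assumptions for illustration, not a recommended method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def profile(text, n=2):
    """Count letters, words and word n-grams in a text."""
    tokens = tokenize(text)
    return {
        "letters": Counter(ch for ch in text.lower() if ch.isalpha()),
        "words": Counter(tokens),
        "ngrams": Counter(ngrams(tokens, n)),
    }


# Hypothetical sample text, made up for this example.
sample = "The whale, the white whale! The sea and the whale."
counts = profile(sample)
print(counts["words"].most_common(3))
print(counts["ngrams"].most_common(2))
```

Counts like these are only raw material: the same `Counter` objects could feed a bar plot, a word cloud or something less conventional, and deciding which is exactly the open question.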

How can we visualize text? If we consider for a moment how we mostly visualize text today it quickly becomes apparent that there is much more we could be doing. Bar plots, line graphs and pie charts are largely instruments for quantification, yet very often quantitative relations between elements aren’t our only concern when studying text. Word clouds add plasticity, yet they eliminate the sequential patterning of a text and thus do not represent its rhetorical development from beginning to end. Trees and maps are interesting in this regard, but by and large we hardly utilize the full potential of visualization as a form of analysis, for example by using lines, shapes, color (!) and beyond that, movement (video) in a way that suits the kind of data we are dealing with.
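As one small illustration of a view that keeps the sequence of a text instead of discarding it, the sketch below draws a plain-text "dispersion strip": each row shows where a word occurs from the beginning (left) to the end (right) of the text. The function name `dispersion`, the `width` parameter and the sample text are assumptions made up for this example; NLTK provides a graphical dispersion plot along similar lines, but this version sticks to the standard library.

```python
import re


def dispersion(text, target, width=40):
    """Mark where `target` occurs across the text, from beginning (left) to end (right)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    strip = ["."] * width
    for position, token in enumerate(tokens):
        if token == target.lower():
            # Map the token's position in the text onto a column of the strip.
            column = position * width // len(tokens)
            strip[column] = "|"
    return "".join(strip)


# Hypothetical ten-word "text", made up for this example.
text = "sea sea ship ship ship storm ship calm calm sea"
for word in ("sea", "ship", "calm"):
    print(f"{word:>5} {dispersion(text, word, width=10)}")
```

Unlike a word cloud, this shows that "sea" frames the text at both ends while "ship" clusters in the middle, which is the kind of development from beginning to end that pure frequency counts hide.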

What tools can we use to do visualization? I’m very interested in Processing and have played with it, and I have worked more extensively with R and NLTK/Python. Tools for rendering data, such as Google Chart Tools, igraph and RGraph, are also interesting. Non-statistical tools are an option too: freehand drawing tools and web-based services like Many Eyes. Visualization doesn’t need to be restricted to computation/statistics. Stefanie Posavec’s trees are a dynamic mix of automation and manual annotation and demonstrate that visualizations are rhetorically powerful interpretations themselves.

I hope that some of the abovementioned things connect to other THATcampers’ ideas, e.g. Lincoln Mullen’s post on mining scarce sources and Bill Ferster’s post on teaching using visualization.

Don’t get me started on the potential for teaching. Ultimately, translating a text into another form is a unique kind of critical engagement: you’re uncovering, interpreting and making an argument all at once, both to the text in question and to yourself.

Anyway — anything from discussing theoretical issues of visualization to sharing code snippets would fit into this session and I’m looking forward to hearing other campers’ thoughts and experiences on the subject.


5 Responses to “Visualizing text: theory and practice”

  1. Visualizing text: theory and practice Says:

    […] Note: I’ve also posted this on […]

  2. John Murray Says:

    Visualizations don’t seem to be living up to their potential, I agree.

    I initially considered using Processing as a framework for my own visualization project, but eventually decided on Adobe Flash for prototyping, as it’s already focused on interactive motion graphics. Of course, I’m pretty neutral about platforms — and am in fact more interested in how the development process can be thought through to take more advantage of the strengths of different languages or programs without the current significant barriers of knowledge or technology.

    How does one integrate subjective and textual data? I think this question, along with the others you mentioned, would be very interesting discussion topics.

  3. briancroxall Says:

    Thanks for this handy list of tools that I’ll have to look into. I would be very interested in this conversation and can certainly talk some about geospatial and temporal visualization.

  4. coffee001 Says:

    Thanks for the responses, guys!

    @John: there’s a growing number of tools out there, both toolkits (i.e. programming languages such as Flash, PHP, Python, R, Processing etc.) and ready-made solutions that don’t require programming but are consequently more limited in their capabilities. When it comes to the type of visualizations we can currently do, I feel like we have a chicken-and-egg problem: we need to combine an understanding of humanities data with a solid grasp of visualization (and, if we want to do quantitative stuff, statistics) to build meaningful visualizations.
    And yes, combining subjective tagging with structural characteristics is a challenge. Kinda ties in with Hugh Cayless’s proposal for markup:

    @Brian: I’d definitely like to hear more about geospatial and temporal vis.

  5. Code and brief instruction for graphing Twitter with R Says:

    […] a Twitter graph (=who you are following and who is following you) that I briefly showed at the session on visualizing text today at THATCamp and that I wanted to share. My comments in the code are very basic and there is […]

