White Paper: Combining Text-Mining and Geo-Visualization

By Andrew J. Torget and Rada Mihalcea, University of North Texas,
Jon Christensen and Geoff McGhee, Stanford University

In September 2010, the University of North Texas, in partnership with Stanford University, was awarded a National Endowment for the Humanities Level II Digital Humanities Start-Up Grant (Award #HD-51188-10) to develop a series of experimental models that combine text-mining with geospatial mapping in order to unlock the research potential of large-scale collections of historical newspapers. Using a sample of approximately 230,000 pages of historical newspapers from the Chronicling America digital newspaper database, we developed two interactive visualizations of the language content of these collections as they spread across both time and space. The first measures the quantity and quality of the digitized content; the second measures several of the most widely used large-scale language pattern metrics in natural language processing. This white paper documents those experiments and their outcomes, as well as our recommendations for future work.

Download the full report:
Mapping Texts: Combining Text-Mining and Geo-Visualization to Unlock the Research Potential of Historical Newspapers (PDF)


Topic Modeling on Historical Newspapers

By Tze-I Yang, Andrew J. Torget, and Rada Mihalcea, University of North Texas

In this paper, we explore the task of automatic text processing applied to collections of historical newspapers, with the aim of assisting historical research. In particular, in this first stage of our project, we experiment with the use of topical models as a means to identify potential issues of interest for historians.

Tze-I Yang, Andrew J. Torget, and Rada Mihalcea, “Topic Modeling on Historical Newspapers,” Proceedings of the Association for Computational Linguistics Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (ACL LATECH 2011), June 2011, pp. 96-104.

Download the paper:
Topic Modeling on Historical Newspapers (PDF)

 
