Session 11

Friday 14:00 - 15:30

High Tor 4

Chair: Katherine Rogers

A Question of Style: individual voices and corporate identity in the Edinburgh Review, 1814-1820

Francesca Benatti, David King

Open University

Keywords: literary studies, computational linguistics, OCR correction

This poster (size A0) presents our project, A Question of Style: individual voices and corporate identity in the Edinburgh Review, 1814-1820, which is funded by a Research Society for Victorian Periodicals Field Development Grant running until October 2017.

We want to assess the assumption that early nineteenth-century periodicals succeeded in creating, through a “transauthorial discourse”, a unified corporate voice that hid individual authors behind an impersonal public text (Klancher 1987).

We are creating a sample corpus of approximately 500,000 words, comprising 325,000 words from the Edinburgh Review and 175,000 from its competitor, the Quarterly Review, for a total of about 80 articles. To assist our OCR correction, metadata creation and textual markup, we are developing a suite of Python scripts, based on our previous work with post-OCR correction (King 2013) and semi-automated TEI markup (Willis et al. 2010).

We employ methods from periodical studies, book history, computational linguistics and computational stylistics to “operationalise” our definition of style in order to select features that can be measured empirically, transforming concepts into a set of operations (Moretti 2013). We will focus on features at the level of words and sentences, such as vocabulary richness; the length of articles, of sentences and of quotations from the text under review; the distribution of parts of speech; and the distinctive vocabulary of each journal, each author and each type of review (literature, travel, politics etc.). Our methods include term frequency-inverse document frequency (TF-IDF), Burrows’ Delta and Zeta methods, Moretti’s Most Distinctive Words method, and Principal Component Analysis.
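As an illustration of one of the methods named above, Burrows’ Delta can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's own scripts: the function names `relative_freqs` and `burrows_delta`, the toy texts, whitespace tokenisation, and the use of the population standard deviation are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one tokenised text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocab]


def burrows_delta(texts, n_words=3):
    """Map each pair of text labels to its Delta distance.

    `texts` maps a label to a list of tokens (hypothetical input format).
    Delta is the mean absolute difference between the z-scored relative
    frequencies of the corpus's `n_words` most frequent words.
    """
    # The most frequent words across the whole corpus form the feature set.
    corpus_counts = Counter(tok for toks in texts.values() for tok in toks)
    vocab = [word for word, _ in corpus_counts.most_common(n_words)]

    freqs = {label: relative_freqs(toks, vocab) for label, toks in texts.items()}

    # Standardise each word's frequency against its mean and standard
    # deviation across all texts (population std is an assumption here).
    z = {label: [] for label in texts}
    for i in range(len(vocab)):
        column = [freqs[label][i] for label in texts]
        mu, sigma = mean(column), pstdev(column)
        for label in texts:
            z[label].append((freqs[label][i] - mu) / sigma if sigma else 0.0)

    labels = list(texts)
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for idx, a in enumerate(labels)
        for b in labels[idx + 1:]
    }


# Toy data, invented for the example: A and B are identical, C differs.
toy_texts = {
    "A": "the cat and the dog and the bird".split(),
    "B": "the cat and the dog and the bird".split(),
    "C": "of of of the of of and of".split(),
}
distances = burrows_delta(toy_texts)
```

A lower Delta indicates more similar usage of the most frequent words. In practice the feature set would be far larger than three words (often the top 100 or more) and the texts would be full articles.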

Finally, we will qualitatively describe the results of this stylistic analysis and evaluate them within the context of both literary scholarship on nineteenth-century periodicals and computational linguistics scholarship, using our literary and historical interpretation to generate critical knowledge out of our measurements.

Works Cited

King, David. “Digging in the library.” Invited lecture presented at Biodiversity Informatics Horizons 2013, Rome. September 2013.

Klancher, Jon P. The Making of English Reading Audiences, 1790-1832. University of Wisconsin Press, 1987.

Moretti, Franco. “‘Operationalizing’: or, the function of measurement in modern literary theory.” Stanford Literary Lab, Pamphlet 6. December 2013.

Willis, Alistair, David King, David Morse, Anton Dil, Chris Lyal, and Dave Roberts. “From XML to XML: The why and how of making the biodiversity literature accessible to researchers.” Language Resources and Evaluation Conference (LREC), Valletta. May 2010.

Distorted Projections: Spatial Imaginaries and Desired Trajectories in Christina Stead’s For Love Alone

Anouk Lang

University of Edinburgh

“In the part of the world Teresa came from, winter is in July, spring brides marry in September, and Christmas is consummated with roast beef, suckling pig, and brandy-laced plum pudding at 100 degrees in the shade, near the tall pine-tree loaded with gifts and tinsel as in the old country, and old carols have rung out all through the night.”

From its opening lines, Christina Stead’s 1945 novel For Love Alone establishes a sense of being “out of place”, signalling to readers that the geography into which they are about to be immersed is distorted and unstable. As the narrative unfolds, the coming of age of the protagonist Teresa is marked by her longing to escape from parochial, provincial Sydney to the great metropolis of culture, London. But this is a trajectory whose fulfilment proves very different to its imagined anticipation, and it serves as a fictional rendition of the spatial and cultural displacement felt by many Australian writers of the twentieth century caught between the cultural authority of English publishers and literary standards, and the imperative to contribute to the project of building a national literature that could emerge out of the shadow of its European and English progenitors.

In this paper, I take the idea of distorted projection and use it to think about the spatial imaginaries at work in Stead’s text, which GIS software makes it possible to grasp in visual form, rather than being restricted to a linear format through the progression of a narrative. I consider both the value and limitations of the process of automated geoparsing (using the Edinburgh Geoparser), and suggest additional coding categories to better account for the different ways that place is invoked, and the effects of these spatial references. Miriam Posner has recently charged the digital humanities with using technologies that commonly “enshrin[e] Western European, Cartesian models of space” (http://miriamposner.com/blog/money-and-time/). Rising to her challenge that postcolonially-inflected digital humanities work should interrogate rather than reinforce the hegemony of projections that privilege a view of the world from a dominant perspective, I use GIS tools to grasp Teresa’s relationship to the physical geography of Sydney and the spatial imaginary of London, and thereby to make a case that GIS mapping can help to show how the text is carrying out an analogical act of distorted projection.

Chinese Text Project: A Digital Library of Pre-Modern Chinese Literature

Donald Sturgeon

Harvard University

Since its creation in 2005 as an online search tool for a handful of classical Chinese texts, the Chinese Text Project (http://ctext.org) has gradually grown to become the largest and most widely used digital library of pre-modern Chinese texts, as well as a platform for exploring the application of new digital methods to the study of pre-modern Chinese literature. This paper discusses how several unique aspects of the project have contributed to its success. Firstly, it demonstrates how simplifying assumptions that hold for domain-specific OCR (Optical Character Recognition) of historical works have reduced the complexity of the task and thus increased recognition accuracy. Secondly, it shows how crowd-sourced proofreading and editing using a publicly accessible version-controlled wiki system has made it possible to leverage a large and distributed audience and user base, including many volunteers located outside of traditional academia, to improve the quality of digital content and enable the creation of accurate transcriptions of previously untranscribed texts and editions. Finally, it explores how the implementation of open APIs (Application Programming Interfaces) has greatly expanded the utility of the library as a whole, facilitating open and decentralized integration with other projects, as well as leading to entirely new applications in digital humanities teaching and research.

DHC 2016