Session 9 — Adding Value: Challenging Practical and Philosophical Assumptions in the Digitisation of Historical Sources

Friday 14:00 - 15:30

High Tor 2

Chair: Michael Pidd

Re-Curating and Re-Imagining the Digitised Archive in the Classroom

Adam Crymble

University of Hertfordshire

History students increasingly turn to the Internet to find textual primary sources to quote in their essays. Those digitised sources take many forms, with the digitiser deciding which attributes of the original were most important to capture in the virtual environment. Some decide to photograph the original; others opt for a transcription; others still choose to structure elements in the text. No matter the approach, students will tend to think of the digitised copy as good enough to use, because their use almost invariably is to quote what they read. Students rarely pause to consider how the decisions of digitisers affect the types of historical conclusions we can pursue.

This paper reflects on a class assignment at the University of Hertfordshire that challenged history students to open up new research possibilities by re-curating and re-imagining that which had already been digitised. This assignment is an alternative to the traditional essay and instead asks students to view the digital archive as something that can be constantly revised. They reorganise, correct, categorise, link, and mark up already-digitised records to ‘add value’ and build something that is more than both the original and the digitised copy. This paper discusses two iterations of this project in which students added value to the Alumni Oxonienses, 1500-1714, originally digitised by British History Online, and the Chelsea Hospital Examinations, 1800-1815, digitised by The National Archives.

Students were given direction in what is possible in a digital environment, but what value they chose to add was entirely up to them, resulting in diverse and creative new datasets that open up new possibilities for historical research. The new datasets are remarkably high-quality examples of scholarship far beyond what we tend to ask of undergraduates. The students thus become co-creators in our pursuit of historical knowledge, challenging the master-apprentice model of higher education.

Historical TEI: Developing a Portfolio of Common Practice

Melodee Beals

Loughborough University

There is no standard, universally accepted practice for the digitisation of historical sources. As with all historical scholarship, there are no rules, only guidelines that we struggle and strain against, bending them to our individual wills. Even within the subdomain of textual sources, methods vary widely, constrained by financial and technological realities and by the ways the digitiser envisioned the resource being used. The creation of machine-readable texts, for example, has developed along several different pathways, including simple transcriptions (TXT, RTF, DOCX), tabular representations (CSV/TSV/XLSX), and structured (XML) or linked (RDF) datasets. From one perspective, these represent a continuum of detail, from the bare textual content of plain text to the richly documented Resource Description Framework. Yet the choice of a particular format reflects not only the digitisers' technical skill but also the way in which they conceive the data, specifically the often implied hierarchies to which the data belongs.

This paper will explore these hierarchies, at document and corpus level, and their repercussions on the digitisation process. In particular, it will examine the underlying ontological assumptions made by encoding models such as the Text Encoding Initiative (TEI) and the Dublin Core (DC) Metadata Initiative and how these correlate with the abstract and practical hierarchies used by historians in their archival work. Through this exploration, it will identify the core shared practices employed by historians in the development of machine-readable transcriptions and discuss the extent to which existing frameworks meet the general analytical needs of historians as researchers and teachers. It will conclude with recommendations for a transferable set of practices and vocabularies for the encoding of historical sources, one that allows for widespread comprehension and reuse as well as flexibility and specificity when working with varied genres.

London Lives Petitions Project: Remixing and Remaking Digital Histories

Sharon Howard

University of Sheffield

Digitisation projects have created a wealth of online historical primary sources, resources which vary widely in scope and ambition, and in the sophistication and usability of their user interfaces. But even the most carefully designed is likely to frustrate the research needs of many historians who will use it. Every project is governed by material and conceptual considerations which inevitably shape and constrain the final resource. But primary sources are multivalent, capable of answering a wide range of questions depending on research priorities and methods.

In this paper, I discuss the challenges and benefits of re-using already digitised primary source data, using the London Lives Petitions Project as my main case study. London Lives, 1690-1800 (www.londonlives.org) contains possibly the largest digitised collection of local petitions in existence (about 10,000, as it turns out), but they have been difficult to access and use because of the size and variety of the series of documents through which they are dispersed and the limitations of the existing data structure. My aim has been to extract those petitions from the London Lives XML data and create a text corpus with associated metadata which will enable in-depth research. I will examine how decisions made in the original digitisation project may help or hinder this task, and outline some of the key tools and techniques used in taking apart and reshaping an existing dataset to create a new resource.

DHC 2016