Session 8

Friday 09:30 - 11:00

High Tor 3

Chair: Seth Mehl

The Spatial Poetics of Artefacts

  • Kate Simpson

University of Sheffield

In this paper I will explore how artefacts and texts used together can evidence the unarticulated narratives of African women in the expeditions of David Livingstone (1813-1873). I will show how reading available manuscripts, documents and artefacts as palimpsests, seeing them as variable renderings of a specific location, makes it possible to present alternate narratives of exploration. Using two items acquired in 1859 by David Livingstone, specifically a lip ring from a Mang'anja woman and a bracelet from Kafue in present-day Zambia, I will show how digital humanities techniques and mapping tools can illuminate women whose narratives have been obscured in the historical record.

Little is known about these items. Did women remove them to give or sell? Were they taken without consent? Were they made specifically as souvenirs to sell to passing trade? For years they have sat in a museum display case in Scotland, relics of a different space. They are part of the weight of material taken back to the UK and re-made as specimens and artefacts that show the supposed alterity of Africa and its people.

It is in engaging with these items digitally, against the grain of the traditional repository, that it becomes possible to uncover the appropriation of local knowledge, the interactive processes of intellectual production, and other material-based contributions that are embedded in white European male-authored accounts of 19th-century exploration and appropriation in Africa. In this paper I will reappraise the collected artefact to provoke the identification of previously obfuscated voices in the historical data.

Addressing Inherent Biases in Information Retrieval Systems of Digital Archives: A Multidisciplinary Study in Digital Archives of Holocaust Victims and Perpetrators

  • Seul Lee

UCLA School of Education & Information Studies

Diakopoulos (2018)1 stated that since algorithms rely on a quantified version of reality that only incorporates what is measurable as data, they can overlook much of the social context that would otherwise be essential in rendering an accurate decision. In particular, when we study the historiography of a certain event, it is also important to acknowledge that data in digital archives are not evenly distributed across all of the actual categories. Since most datasets in scholarly digital archives are missing, hidden, less illuminated, or blurred, while some are highlighted, identifying prevailing algorithmic biases or other social contexts such as media tendencies or research trends can be challenging. Consequently, information retrieval results rendered by machine-learning algorithms incorporate these biases. The inherent bias in scholarly digital archives can both directly and indirectly influence future trends of scholarly communication by affecting the results that users extract through their queries. Thus, class-imbalanced datasets that are skewed differently towards majority groups, or datasets missing from digital archives, can affect the accuracy and performance of information retrieval. I therefore propose a practical guideline and digital archive design for digital librarians and digital humanities scholars to improve the accuracy and performance of information retrieval in digital archives. Imbalanced classes in digital scholarly libraries and multimedia archives can be optimized with background information about the nature of the datasets and the corresponding archive characteristics. Digital archival studies that combine more sophisticated machine-learning algorithms with detailed explanations of the social context of a certain event can improve the trustworthiness of the resulting knowledge in digital libraries and media archives.

Keywords: Research trends analysis, Digital archival research, Class-imbalance learning

1 Diakopoulos, N. (2018). The Algorithms Beat. Data Journalism Handbook. Eds. Liliana Bounegru and Jonathan Gray.

SCWAReD: Scholar-Curated Worksets from the HathiTrust Research Center

  • Ryan Dubnicek
  • Jade Harrison
  • Isabella Magni
  • John A. Walsh
  • Maryemma Graham
  • J. Stephen Downie
  • Glen Layne-Worthey

University of Illinois Urbana-Champaign

Keywords:

digital libraries, datasets, text analysis

Abstract:

The Scholar-Curated Worksets for Analysis, Reuse & Dissemination (SCWAReD) project, generously supported by the Mellon Foundation, is producing a suite of curated worksets of materials from the HathiTrust Digital Library. SCWAReD aims to address inequities in both library collections and digital humanities research by identifying and remediating gaps within HathiTrust, and by using computationally assisted efforts to recover content that is already part of the HathiTrust Digital Library but that may be difficult to discover with traditional metadata, in a traditional catalog, from within a massive digital collection.

SCWAReD’s flagship collaboration is with the Black Books Interactive Project, part of the History of Black Writing, founded in 1983 at the University of Mississippi by SCWAReD Co-PI Maryemma Graham and hosted since 1998 at the University of Kansas. Four more projects, concurrently in development, were selected to create curated worksets: “Mining the Native American Authored Works in HathiTrust for Insights,” directed by Kun Lu, Raina Heaton, and Raymond Orr (University of Oklahoma); “The Black Fantastic: Curated Vocabularies, Artifact Analysis and Identification,” directed by Clarissa West-White (Bethune-Cookman University) and Seretha Williams (Augusta University); “Creating Period-Specific Worksets for Latin American Fiction,” directed by José Eduardo González (University of Nebraska, Lincoln); and “The National Negro Health Digital Project: Recovering and Restoring a Black Public Health Corpus,” directed by Kim Gallon (Purdue University). In each partnership, project teams bring content expertise, research questions, and curation experience, while HTRC assists with HathiTrust collection access and provides research tools and environments, as well as methodological and technical expertise in text & data mining. Each workset will be accompanied by a scholarly introduction, documented derived datasets, and project reports. These comprehensive research packages will be hosted by HTRC and disseminated for re-use in research and teaching.

Our presentation will provide an overview of SCWAReD and preliminary analysis results from its collaborative projects.