From Lockdown to Jupyter: Creating Exploratory Notebooks for Cultural Heritage Datasets

Keywords: Libraries, datasets, Jupyter Notebooks

Abstract:

The National Library of Scotland’s Digital Scholarship Service has been releasing collections as data on its data-delivery platform, the Data Foundry (https://data.nls.uk/), since September 2019. Datasets are released as large, bulk downloads, enabling those with programming experience to analyse the collections at scale.

During the COVID-19 lockdown, the Service experienced significantly higher traffic as library users increasingly made use of online resources. To ensure that as many users as possible could explore the datasets on the Data Foundry, the Library invested in a Digital Research Intern post, with a remit to provide introductory analysis of the Data Foundry collections using Jupyter Notebooks.

The goal of these Notebooks was to provide users with an initial overview of the datasets and some analysis of the data as a starting point for their research, as well as to enable those with little or no coding skills to begin accessing the Library’s collections at scale. The Notebooks use Python, a widely used programming language for data analysis, in conjunction with the Natural Language Toolkit (NLTK) to demonstrate text mining methods on digitized collections and bibliographic metadata.
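As a rough illustration of the kind of introductory text mining described above, the following minimal sketch counts the most frequent words in a short passage. It is not taken from the Notebooks: the sample text, the stopword list, and the function names `tokenize` and `top_words` are all hypothetical, and the sketch uses only the Python standard library in place of the Natural Language Toolkit so that it runs without additional installs.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for a digitized page;
# it is not drawn from any Data Foundry dataset.
SAMPLE_TEXT = (
    "The library holds maps of Scotland. "
    "The maps show the coast of Scotland and the islands."
)

# A tiny illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"the", "of", "and", "a", "in", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, n=3):
    """Return the n most frequent non-stopword tokens with their counts."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    for word, count in top_words(SAMPLE_TEXT):
        print(f"{word}: {count}")
```

A frequency count of this sort is a common first step in exploring a text collection; a toolkit such as NLTK adds fuller tokenizers and stopword lists on top of the same basic idea.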

This paper provides a case study of the project, explaining the Library’s work to date in releasing datasets on the Data Foundry; the reasoning behind providing Jupyter Notebooks; the Notebooks themselves, the types of analysis they contain, and the challenges faced in creating them; and the publication and impact of the Notebooks.