Re-modelling legacy datasets: how to retrofit a data model

The Archivio Storico Ricordi is a particularly rich and diverse dataset, containing two hundred years of material relating both to music and to the business of music, in a variety of formats – musical scores, photographs, correspondence, business documents. However, the approach to digitising the archive has been similarly varied, both in the formats and methods used and in how material has been prioritised for digitisation.

Software solutions that evolve over years represent one of the biggest sustainability issues facing those engaged in the digital humanities. It is almost impossible to maintain accurate documentation, and over time the rationale behind technically sound decisions may become less clear. The process of moving data to a more modern architecture can be an invaluable opportunity to retrospectively model your dataset and to analyse errors that may have crept into your data during its evolution.
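
As a minimal sketch of the kind of retrospective error analysis a migration makes possible, the query below flags records whose free-text references do not resolve to any known entity. The table and column names (letters, persons, sender_name) are hypothetical illustrations, not drawn from the Ricordi dataset itself.

    -- Hypothetical example: find correspondence records whose sender
    -- does not match any entry in a persons authority table.
    -- Implicit links like these often go unvalidated in flat-file systems.
    SELECT l.letter_id, l.sender_name
    FROM letters AS l
    LEFT JOIN persons AS p
      ON p.display_name = l.sender_name
    WHERE p.person_id IS NULL;

Checks of this sort, run once the data sits in a relational store, surface silently broken references that may have accumulated over years of ad hoc data entry.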

This paper discusses the benefits and challenges of this process for the Archivio Storico Ricordi, working in conjunction with the University of Sheffield's Digital Humanities Institute. The collaboration brought together the Archivio's expert knowledge of the archive's data with the DHI's expertise in data modelling, particularly ontological approaches. The paper describes how the move from FileMaker to a MySQL database led to improvements in the quality of the dataset and facilitated the explicit modelling of entities and relationships that had previously been implicit and not always validated. It also addresses the challenges of carrying out this process remotely, across two countries. What can you do when your data isn't quite 'as described on the tin'?
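
One way such a move makes implicit relationships explicit and enforceable, sketched below with hypothetical table names rather than the actual Ricordi schema, is to promote them to foreign-key constraints, so that the database itself rejects dangling references that FileMaker would have accepted.

    -- Minimal sketch (hypothetical schema): an explicit, validated
    -- relationship between letters and the persons who wrote them.
    CREATE TABLE persons (
      person_id    INT PRIMARY KEY AUTO_INCREMENT,
      display_name VARCHAR(255) NOT NULL
    );

    CREATE TABLE letters (
      letter_id INT PRIMARY KEY AUTO_INCREMENT,
      sender_id INT NOT NULL,
      -- MySQL now refuses any sender_id that does not
      -- correspond to an existing person record.
      FOREIGN KEY (sender_id) REFERENCES persons (person_id)
    ) ENGINE=InnoDB;

Note that in MySQL the InnoDB storage engine is required for foreign-key constraints to be enforced; with such constraints in place, the validation that once depended on careful data entry becomes a guarantee of the model itself.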