Text-mining, geo-coding and mapping historic smells

We live in the era of big data. By 2002, digital data storage had overtaken analogue storage. The digitisation of humanities source material has already produced large datasets, and this wealth of digital information offers unprecedented opportunities to approach humanities research questions in ways that were not previously possible.

The Medical Officer of Health (MOH) reports were published annually by the Medical Officers of Health employed by local authorities. They provided vital statistics and a general statement on the health of the population in each borough. In 2012 the Wellcome Library digitised 5,500 MOH reports for London, spanning the years 1848 to 1972.

Although there were attempts at standardisation, the reports reflect each MOH's interests, idiosyncrasies and particular strengths. Because the reports are not standardised, creating a geo-coded dataset of smell-related vocabulary is a challenging task. We will present the results of the first phase of our research: text-mining and geo-coding unstructured text by means of a novel geoparser.
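The general idea of geoparsing smell vocabulary in unstructured text can be illustrated with a minimal sketch. It is not the project's geoparser: the smell terms, the two-entry gazetteer, the rough coordinates and the function name geoparse are all hypothetical, and simple sentence-level co-occurrence stands in for whatever matching the actual tool performs.

```python
import re

# Hypothetical smell vocabulary; the project's real term list is not given here.
SMELL_TERMS = ["stench", "odour", "effluvia", "smell"]

# Hypothetical mini-gazetteer: place name -> (latitude, longitude).
# Coordinates are rough, illustrative values only.
GAZETTEER = {
    "Bermondsey": (51.50, -0.06),
    "Hackney": (51.55, -0.06),
}

# Match any smell term, allowing a plural "s"; case-insensitive.
SMELL_RE = re.compile(r"\b(" + "|".join(SMELL_TERMS) + r")s?\b", re.IGNORECASE)
# Match any gazetteer place name exactly as spelled.
PLACE_RE = re.compile(r"\b(" + "|".join(map(re.escape, GAZETTEER)) + r")\b")


def geoparse(text):
    """Return one record per (place, smell) pair found in the same sentence."""
    records = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        smells = sorted({m.group(1).lower() for m in SMELL_RE.finditer(sentence)})
        places = sorted(set(PLACE_RE.findall(sentence)))
        for place in places:
            lat, lon = GAZETTEER[place]
            for smell in smells:
                records.append(
                    {"place": place, "smell": smell, "lat": lat, "lon": lon}
                )
    return records


if __name__ == "__main__":
    sample = (
        "Complaints were made of an offensive stench near the tanneries in "
        "Bermondsey. The weather was mild. Odours from a drain in Hackney "
        "were reported."
    )
    for record in geoparse(sample):
        print(record)
```

A sketch like this would miss historic spelling variants, OCR errors and ambiguous place names, which is part of what makes non-standardised reports difficult to geo-code.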

Data-mining the OCR'd text of the MOH reports for London will, for the first time, produce models that facilitate new kinds of humanities research. Analysing the reports (the second phase of the project) will reveal intimate narratives of the everyday experiences of 19th- and 20th-century Londoners through the 'smellscape'. It will also enable us to run comparisons and to assess whether smells are linked to the socio-economic identity of areas in London.

This project builds upon a recent project by Daniele Quercia and his team on mapping smells using social media. The MOH smell data will be made available via their existing website (http://www.goodcitylife.org/), which offers an opportunity to engage with the public. Text-mining and geo-coding techniques add value to humanities research by demonstrating how new knowledge and insights arise from the use of digital applications.