Linguistic DNA of Modern Western Thought


The aim of this project is to understand the evolution of early modern thought by modelling the semantic and conceptual changes which occurred in English discourse (c.1500-c.1800). The project will use information extraction techniques and data visualisation to identify lexical patterns in over 250,000 texts.

Project Status:

In Progress


Arts and Humanities Research Council (AHRC)


University of Sheffield
University of Glasgow
University of Sussex


data visualisation, early modern period, early printed books, history of ideas, large datasets, linguistics, online resource, text data mining



Project Description

The Linguistic DNA of Modern Western Thought: Paradigmatic Terms in English, 1500-1800.

The aim of this project is to understand the evolution of early modern thought by modelling the semantic and conceptual changes which occurred in English discourse (c.1500-c.1800). It will do so through the following objectives:

1. Using information extraction techniques, identify lexical patterns within approximately 37 million pages, using 48,327 re-keyed texts from Early English Books Online (EEBO) and approximately 205,000 OCR-ed texts from Gale Cengage’s Eighteenth Century Collections Online (ECCO). The total dataset comprises over 250,000 texts.

2. Using data visualisation, evaluate the accuracy of the information extraction techniques against the project’s research questions. These research questions fall into three Research Themes:

  • Contexts of Semantic Change will explore the historical and discursive circumstances of concept development.
  • Lexical Families and Conceptual Fields will explore the linguistic characteristics of concepts and their constituent keywords.
  • Lexicalisation Pressure will explore the characteristics of word formation and vocabulary size within conceptual fields.

3. Develop and share knowledge about the project’s methodologies via two workshops that will be focussed on computer-assisted language analysis, language change and data visualisation.

4. Present the results of the Research Themes as a series of published outputs: a volume of essays on paradigmatic terms between 1500 and 1800; a co-authored book on Language and Conceptual Change; refereed articles; and an online, open-source collection of essays on technical methodologies.

5. Make the resulting database of lexical patterns, plus the search and visualisation features, available to the wider community for their own research purposes. This will be in the form of a public website, as well as a Web API and data download feature that will enable the entire dataset to be shared and re-used.

6. Demonstrate the wider applicability of the information extraction and concept modelling techniques by developing a demonstrator for a modern body of scholarship (e.g. JSTOR) and hosting two Knowledge Modelling Impact Workshops.

Project Team

  • Professor Susan Fitzmaurice (Principal Investigator – University of Sheffield)
  • Dr Justyna Robinson (Co-Investigator – University of Sussex)
  • Dr Marc Alexander (Co-Investigator – University of Glasgow)
  • Dr Fraser Dallachy (Research Associate – University of Glasgow)
  • Brian Aitken (Digital Humanities Research Officer – University of Glasgow)
  • Dr Iona Hine (Research Associate – University of Sheffield)
  • Dr Seth Mehl (Research Associate – University of Sheffield)
  • Katherine Rogers (Digital Humanities Developer – University of Sheffield)
  • Matthew Groves (Digital Humanities Developer – University of Sheffield)
  • Michael Pidd (Co-Investigator – University of Sheffield)