Scrutiny: A Firefox Extension for Entity Recognition within Research Data

Summary:

The project developed a Firefox extension called Scrutiny, which scans web pages selected by individual users and highlights entities that it judges will interest them.

Project Status:

Completed

Funders:

Jisc

Partners:

University of Sheffield
PlayGen Limited
University of Hertfordshire

Subjects:

Firefox add-on, history, software development, text and image analysis

Technologies:

C++, XML


Project Description

This project was a collaboration between two historians (the directors of the Old Bailey, Central Criminal Courts and Plebeian Lives projects), the humanities informatics team at the Humanities Research Institute (HRI), and the serious games company PlayGen Limited, which provided additional programming support as part of an ongoing knowledge exchange relationship with the HRI.

The project developed a Firefox extension called Scrutiny, which scans web pages selected by individual users and highlights entities that it judges will interest them. Its primary purpose is to increase the speed and efficiency with which researchers, both inside and outside higher education, can locate potentially relevant information within large data objects such as journal articles or full-text datasets. In doing so it addresses the problem of information overload and improves research productivity.

Users can train Scrutiny to identify entities relevant to their field of research in two ways: by loading pre-defined, subject-specific ‘entity recognition files’, and by refining Scrutiny’s understanding of their personal interests through an iterative process of accepting or discarding the suggestions it presents. Scrutiny was developed using natural language processing techniques, including ‘named entity recognition’ based on a Bayesian learning methodology. Here an entity could be the name of a person, a place, an artefact, a term or a phrase, depending on the subject of study. For example, the test datasets used by this project focused on eighteenth- and nineteenth-century criminal justice, so the ‘entity’ identified might be a crime, a verdict or a sentence, or a cluster of less well-defined types of behaviour recorded in depositions and criminal evidence.
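The description above does not document Scrutiny’s internal classifier, so the following is only a minimal sketch of how accept/discard feedback could drive a Bayesian relevance model. It assumes a two-class naive Bayes classifier with Laplace smoothing over a bag of string features per candidate entity (for instance the entity text and nearby words). The class `RelevanceModel` and its methods `feedback`, `logOdds` and `shouldHighlight` are hypothetical names, not Scrutiny’s API, and the sketch covers only the relevance-learning step, not entity detection itself.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: two-class naive Bayes model ("relevant" vs
// "irrelevant") trained from a user's accept/discard decisions.
// The feature scheme (bag of strings per candidate) is an assumption.
class RelevanceModel {
public:
    explicit RelevanceModel(double alpha = 1.0) : alpha_(alpha) {
        assert(alpha > 0.0 && "Laplace smoothing parameter must be positive");
    }

    // Record one decision: accepted == true means the user kept the
    // suggestion, false means the user discarded it.
    void feedback(const std::vector<std::string>& features, bool accepted) {
        Class& c = accepted ? relevant_ : irrelevant_;
        c.docs += 1.0;
        for (const auto& f : features) {
            c.counts[f] += 1.0;
            c.total += 1.0;
            vocab_.insert(f);
        }
    }

    // Log-odds that a candidate is relevant: log P(rel|x) - log P(irr|x).
    double logOdds(const std::vector<std::string>& features) const {
        const double totalDocs = relevant_.docs + irrelevant_.docs;
        // Smoothed class priors.
        double lo = std::log((relevant_.docs + alpha_) / (totalDocs + 2.0 * alpha_)) -
                    std::log((irrelevant_.docs + alpha_) / (totalDocs + 2.0 * alpha_));
        // +1 reserves a slot for unseen features and keeps denominators > 0.
        const double v = static_cast<double>(vocab_.size()) + 1.0;
        for (const auto& f : features) {
            lo += std::log(likelihood(relevant_, f, v)) -
                  std::log(likelihood(irrelevant_, f, v));
        }
        return lo;
    }

    // Highlight only when relevance is strictly more likely.
    bool shouldHighlight(const std::vector<std::string>& features) const {
        return logOdds(features) > 0.0;
    }

private:
    struct Class {
        double docs = 0.0;   // number of decisions in this class
        double total = 0.0;  // total feature occurrences in this class
        std::map<std::string, double> counts;
    };

    // Laplace-smoothed estimate of P(feature | class).
    double likelihood(const Class& c, const std::string& f, double v) const {
        auto it = c.counts.find(f);
        const double n = (it == c.counts.end()) ? 0.0 : it->second;
        return (n + alpha_) / (c.total + alpha_ * v);
    }

    double alpha_;
    Class relevant_;
    Class irrelevant_;
    std::set<std::string> vocab_;
};
```

For example, after a user accepts suggestions with features such as "verdict" and "guilty" and discards one with "weather", a later candidate containing "guilty" scores positive log-odds and is highlighted, while one containing "weather" is not. Each accept or discard only updates counts, which suits the incremental, per-user refinement the project describes.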

Scrutiny is available to download free of charge from the project wiki, the IE Demonstrator blog and Mozilla’s add-on repository. All source code and documentation have been released as open source for further refinement and enhancement by the developer community.

Duration: 1st June 2009 – 20th November 2009

Project Team

  • Prof. Tim Hitchcock (University of Hertfordshire)
  • Prof. Robert Shoemaker (University of Sheffield)
  • PlayGen Limited (http://www.playgen.com)
  • Michael Pidd (Digital Director – University of Sheffield)