Session 7

Friday 09:30 - 11:00

High Tor 2

Chair: Jamie McLaughlin

Semi-automatic multilevel annotation of vagueness in historical texts

  • Cristina Vertan

University of Hamburg

Document annotation is a central method in many digital humanities projects. Depending on the complexity of the information to be annotated, it can be performed automatically or manually. Current annotation systems have a major drawback when applied to documents from the humanities: one can state only whether a certain feature or piece of information is present in the material, a typical 1/0 discrimination. Although it is true that the “continuum” of any humanities object gets lost through digitization, such an annotation strategy not only discretizes the initial object quite dramatically but can also lead to wrong interpretations. The annotation of historical facts and documents, in particular, must consider and integrate degrees of vagueness.

In the project HerCoRe – Hermeneutic and Computer-based Analysis of Reliability, Consistency and Vagueness in Historical Texts (funded by the Volkswagen Foundation), we investigate:

  • the influence of linguistic vagueness (e.g. vague adjectives, non-intersective adjectives, hedges, subjunctives) on the historical facts presented,

  • the degree to which the author deliberately makes use of vague expressions, and

  • how reliable expressions of “certainty” are when compared with the quoted historical sources.

We distinguish between several vagueness layers: linguistic, editorial (markers introduced by editors/translators) and factual (omissions, wrong quotations).

The objects of our analysis are the texts of Dimitrie Cantemir, Prince of Moldavia, whose eighteenth-century works about the Ottoman Empire and Moldavia were already translated at that time into many languages, including English, German and French.

We propose a framework for vagueness annotation, which allows the user:

  • to navigate at any time between different annotation levels,

  • to mark up discontinuous assertions, and

  • to introduce concurrent mark-up.

Linguistic vagueness is annotated semi-automatically, editorial vagueness is partially inferred from the text, and factual vagueness is annotated manually. A fuzzy knowledge base also allows the annotation of named entities (persons, geographical information, dates).
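As a rough illustration of how such multilevel, non-binary annotation might be represented (the layer names, labels, example sentence and fuzzy degrees below are invented for the sketch and are not the HerCoRe data model):

    # Illustrative sketch (not the HerCoRe system): stand-off annotations with
    # fuzzy degrees instead of a binary present/absent decision.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class VaguenessAnnotation:
        layer: str                     # "linguistic", "editorial" or "factual"
        spans: List[Tuple[int, int]]   # discontinuous assertions: several (start, end) offsets
        label: str                     # e.g. "hedge", "vague_quantifier", "unverified_claim"
        degree: float                  # fuzzy membership in [0.0, 1.0] rather than 1/0

    text = "The prince, it is said, ruled for roughly twenty years."

    annotations = [
        # concurrent mark-up: annotations from different layers may overlap
        VaguenessAnnotation("linguistic", [(12, 22)], "hedge", 0.8),              # "it is said"
        VaguenessAnnotation("linguistic", [(34, 41)], "vague_quantifier", 0.6),   # "roughly"
        VaguenessAnnotation("factual", [(12, 22), (42, 54)], "unverified_claim", 0.5),
    ]

    # navigating between levels: filter the stand-off annotations by layer
    for a in [a for a in annotations if a.layer == "linguistic"]:
        print(a.label, a.degree, [text[s:e] for s, e in a.spans])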

The Emergence of Titling in the Nineteenth-Century French Art World: A Quantitative Analysis

  • Mike Bowman

Birkbeck College

It is now taken for granted that paintings have titles, and practices around the giving, presenting and interpretation of titles have become institutionalised within the art world. However, that was not always the case. Until the end of the eighteenth century, as Ruth Yeazell has argued, the language used to refer to paintings was predominantly descriptive and classificatory (Yeazell 2015, p. 39). This paper uses a quantitative analysis of the entries made by artists in the catalogues of the Paris Salon and of critical reviews of those exhibitions to show how titling emerged in the nineteenth-century French art world.

Drawing on digital archives, I constructed a database from a representative sample of catalogue entries covering the decades from the 1790s to the 1870s, together with a sample of critical writing. Trends in various statistical measures of these data, such as the frequency of occurrence of key terms and the distribution of word length, are interpreted as direct and indirect evidence of the emergence of titling, as the functions associated with titling became dominant and other uses of the entries declined.
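A minimal sketch of this kind of measurement, assuming a hypothetical handful of catalogue entries and key terms (the sample data and term list below are invented stand-ins for the database described in the abstract):

    # Illustrative sketch only: sample entries and key terms are hypothetical.
    from collections import Counter, defaultdict

    # each record: (decade, text of the Salon catalogue entry)
    entries = [
        ("1790s", "Portrait d'un homme tenant une lettre"),
        ("1830s", "Paysage; effet du soir"),
        ("1870s", "Impression, soleil levant"),
    ]

    key_terms = ["portrait", "paysage", "étude"]   # hypothetical descriptive/classificatory terms

    term_freq = defaultdict(Counter)      # decade -> key-term counts
    word_lengths = defaultdict(Counter)   # decade -> distribution of word length (in characters)

    for decade, entry in entries:
        words = [w.strip(",;.") for w in entry.lower().split()]
        for w in words:
            word_lengths[decade][len(w)] += 1
        for term in key_terms:
            term_freq[decade][term] += words.count(term)

    for decade in sorted(term_freq):
        print(decade, dict(term_freq[decade]), dict(word_lengths[decade]))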

To conceive of a catalogue entry as a title is more than re-describing a piece of text, and my analysis shows how the adoption of titling involved a wholesale shift in attitude towards the use of catalogue entries and towards the status of the objects they named. It can be seen as a key component of the commercialisation of the French art world and the commodification of the art object.

The use of databases and quantitative analysis is uncommon in art history, as is the ‘distant’ perspective it allows. My work shows how these techniques can bring new ways of seeing and new kinds of knowledge into art history and can challenge the readings and preconceptions other scholars have brought to the nineteenth-century French art world.

How language technology can assist legal scholarly research

  • Wim Peters,
  • Louisa Parkes,
  • Mitchell Lennan

University of Strathclyde, Glasgow

Within legal AI, natural language processing (NLP) techniques provide valuable text- and knowledge-base-derived information for a wide variety of legal analyses. NLP techniques provide a bridge between the linguistic surface structure and the underlying conceptual content. Identifying, extracting and formalizing legal knowledge remains a highly knowledge- and labour-intensive task, creating a significant bottleneck between the semantic content of the source material, expressed in natural language, and computer-based, automatic use of that content. Increasingly, NLP techniques are applied to assist knowledge acquisition from text.

In this paper, we concentrate on a close-reading setting that involves deep text interpretation by legal scholars in order to address their research questions.

The inclusion of NLP techniques into legal interpretation workflows customised to scholars’ research needs entails a set of automated analysis tasks that assist legal scholars, and a feedback procedure between automated results and manual legal interpretation. This integration of manual and automatic analysis of textual material aims at maximizing the acquisition and exploration of the conceptual structure of the legal domains. The NLP results are therefore not a one-step solution, but provide textually derived information and structured knowledge in an incremental and flexible way. This knowledge can then be used for further exploration and interpretation by experts, and eventually, if required, formally modelled in the form of an ontology.

NLP methodologies use techniques such as linguistic analysis, named entity recognition, term extraction and relation extraction. The extracted information is associated with the source texts through text metadata in the form of annotations. For the creation and presentation of these annotations we make use of state-of-the-art tools such as GATE (http://www.gate.ac.uk). Together with information from external resources such as term banks and ontologies, this will result in an integrated knowledge structure that makes semantic content explicit and accessible for manual expert interpretation and evaluation. This knowledge base will be built semi-automatically through a collaborative effort involving language technology and legal expertise for interpretation and modelling.
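As a rough illustration of the kind of stand-off annotation such a pipeline produces: the project itself uses GATE, so the spaCy calls, model name and example sentence below are assumptions made purely for the sketch, not the authors' implementation.

    # Illustrative stand-in for the GATE-based pipeline; assumes spaCy and the
    # small English model (en_core_web_sm) are installed. The text is invented.
    import spacy

    nlp = spacy.load("en_core_web_sm")

    text = ("The Aarhus Convention was adopted in 1998 and grants the public "
            "rights of access to environmental information.")
    doc = nlp(text)

    annotations = []

    # named entity recognition: entities become annotations over character offsets
    for ent in doc.ents:
        annotations.append({"type": "NamedEntity", "label": ent.label_,
                            "start": ent.start_char, "end": ent.end_char, "text": ent.text})

    # crude term extraction: noun chunks as candidate domain terms
    for chunk in doc.noun_chunks:
        annotations.append({"type": "TermCandidate", "label": "NP",
                            "start": chunk.start_char, "end": chunk.end_char, "text": chunk.text})

    # the resulting metadata can then be reviewed, corrected and linked to
    # external term banks or ontologies by the legal scholar
    for a in annotations:
        print(a)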

To illustrate this workflow, we will present a case study on the integration of NLP tasks into scholarly legal research.