Fuzzy Dating and Ambiguous Courting: Accounting for varying metadata precision in historical semantic development

Accurate dating of the emergence and development of concepts is hampered by two real-world concerns: how rapidly a newly coined word makes its way into transmitted material, and whether the material that would enable the most accurate dating has survived. As part of the AHRC-funded Linguistic DNA project, the ‘Lexicalization Pressure’ research stream faces multiple sources of potential dating fuzziness. This paper discusses methods for taking account of this fuzziness when investigating the data, using the noun ‘court’ as an example.

When identifying the date at which a new word, or a new sense of a word, emerges, there is a necessary degree of uncertainty in the results. For lexicographical resources, dating accuracy is unavoidably affected by the fact that a word is rarely coined in print, so its first usage is usually lost to posterity. In addition, the earliest known citation may occur in texts which are themselves of uncertain date, or in dictionary entries, which rarely give a sense of how recent a word’s coinage is. In text corpora, dating fuzziness is inherent in the use of historical collections such as Early English Books Online (EEBO-TCP). First, the date of composition and/or printing may be uncertain, especially for earlier texts. Second, even when these dates are known, an extensive lag between a text’s composition and its printing raises important questions about whether the language of the text can genuinely be said to be ‘current’ at the time of printing.

Such problems have long been considered by lexicographers and others working with historical language. Whereas much painstaking work has been done by individual researchers, the Linguistic DNA (LDNA) project must work with the metadata available to it, with minimal manual intervention. This paper discusses approaches developed by the ‘Lexicalization Pressure’ subtheme of the LDNA project to account for these uncertainties. These include identifying semantic fields in which lexicographical data is restricted, and adjusting the parameters used when employing dating metadata in LDNA processor outputs to study semantic field development.
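
One way to picture this kind of parameter adjustment is to treat each text's date as a range rather than a single year, and to widen that range by an assumed composition-to-printing lag. The Python sketch below is purely illustrative and is not the LDNA processor's method: the names `DateRange`, `widen`, `first_attestation_window` and the `max_lag` parameter are hypothetical, and the years are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DateRange:
    """A fuzzy date: the text is dated somewhere between two years (inclusive)."""
    earliest: int
    latest: int


def widen(printed: DateRange, max_lag: int) -> DateRange:
    """Push the earliest bound back by max_lag years.

    max_lag is a hypothetical parameter: an assumed upper limit on the gap
    between a text's composition and its printing.
    """
    return DateRange(printed.earliest - max_lag, printed.latest)


def first_attestation_window(ranges: Iterable[DateRange], max_lag: int = 0) -> DateRange:
    """Return the window within which the earliest surviving use must fall.

    The earliest use cannot predate the smallest lower bound, and it must
    have occurred by the smallest upper bound.
    """
    widened = [widen(r, max_lag) for r in ranges]
    return DateRange(
        earliest=min(r.earliest for r in widened),
        latest=min(r.latest for r in widened),
    )


# Invented years for three texts containing the same word sense.
texts = [DateRange(1580, 1590), DateRange(1585, 1585), DateRange(1601, 1601)]
window = first_attestation_window(texts, max_lag=10)
print(window.earliest, window.latest)
```

Raising or lowering `max_lag` changes only the lower bound of the window, which makes it easy to see how sensitive a claimed emergence date is to the assumed lag.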