Quantitative analysis and textual interpretation in Caxton

Is it possible to isolate the language of an individual within printed material from the fifteenth century? How can specialised digital approaches (such as vector space modelling) improve our understanding of a book's material production? This paper discusses the theoretical and practical challenges of compiling and interpreting quantitative data when studying the work of compositors (typesetters) working for the premier English publisher, William Caxton (c.1422–c.1491). The study contributes to a wider scholarly debate about spelling variation in early modern English and the constraints of a compositor's work.

The data was selected according to two criteria: first, a transcription of the text had to exist already (courtesy of EEBO-TCP); second, the points at which one compositor's work gives way to another's had to be determinable from bibliographical evidence. For each compositorial section, a wordlist was created comprising variant spellings and their frequencies. These frequencies form vectors that are then used in statistical similarity testing, in order to show how similar the spelling systems of different compositors are.
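As an illustration of the kind of comparison described above, the following is a minimal sketch rather than the procedure used in the study. It assumes cosine similarity as the similarity measure (the specific test is not named here), and the function names and token lists are hypothetical, invented examples rather than transcriptions from any Caxton edition.

```python
from collections import Counter
from math import sqrt


def spelling_vector(tokens):
    """Build a frequency vector (spelling form -> count) for one section."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse frequency vectors.

    Assumption: cosine similarity stands in for whichever statistical
    similarity test the study actually applies.
    """
    shared = set(a) & set(b)
    dot = sum(a[form] * b[form] for form in shared)
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy sections: invented tokens, not real transcriptions.
section_a = "the kynge sayd that the knyght was good".split()
section_b = "the king said that the knight was good".split()
section_c = "the kynge sayd that the knyght was goode".split()

vec_a = spelling_vector(section_a)
vec_b = spelling_vector(section_b)
vec_c = spelling_vector(section_c)

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
print(f"A vs B: {sim_ab:.2f}")
print(f"A vs C: {sim_ac:.2f}")
```

In this toy case, sections A and C share more spelling forms than A and B and therefore score as more similar. Whether such a score reflects a shared compositor, a shared exemplar, or some other cause is precisely the interpretive question at issue.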

The paper then examines how the results of statistical analyses can be interpreted in relation to the text itself as a material artefact. In doing so, I consider the challenges of drawing conclusions about language use from a series of vectors. Since spelling variation has myriad causes, I discuss the value of statistical analysis in negotiating that variation. Finally, I consider the extent to which digital methods are viable in pursuing research questions such as the one in this case study.