Session 2

Thursday 14:00 - 15:30

High Tor 3

Chair: Isabella Magni

Implementing Linked Ancient World Data: recommendations from research with users and producers

  • Sarah Middle

National Museum of Scotland

Keywords: Linked Data, Ancient World, user research

Linked Data technologies have the potential to transform Humanities research; however, in many disciplines their implementation remains at an experimental stage, with little research into how the results are consumed by end users. Even so, an increasing number of Linked Data tools and resources are being produced for Ancient World research, many of which are usable by researchers with minimal technical experience.

 

My study into the use and production of Linked Ancient World Data took place between 2018 and 2019 and included a survey of Ancient World researchers, followed by a series of interviews with selected participants. Initial questions related to the use and production of digital tools and resources more generally, before focusing on Linked Data. Participants’ experiences provided valuable insights into how Linked Data tools and resources are perceived by end users at all levels of technical expertise, as well as how such tools and resources might be developed in future.

In this paper, I will start by outlining my survey and interview methodology, before turning my focus to the results of this study. Findings will be presented as a series of recommendations for best practice in Linked Ancient World Data implementation. Topics will include identifying user goals, promoting usability (and reusability), working towards data accuracy and completeness, integrating effective documentation, ensuring sustainability, and disseminating the end product. Recommendations will include steps to potentially improve existing tools and resources as well as points to consider when starting a new project. In addition to insights from research participants, I will incorporate examples of best practice from available tools and resources.

While my focus on Linked Ancient World Data may seem relatively narrow, it provides a case study in the application of technological solutions to Humanities research questions. I therefore hope my findings might be applicable across the breadth of the Digital Humanities.

Mimesis and the importance of female characters: A comparative social network analysis of Dutch literary fiction, 1960s vs 2010s

  • Roel Smeets

Radboud University Nijmegen

Mimesis is among the oldest and most fundamental concepts of literary theory. Since Plato’s introduction of the term in the Republic it has continued to exert influence over theories of artistic representation. As Derrida wrote: ‘the whole history of the interpretation of the arts and letters has moved and been transformed within the diverse logical possibilities opened up by the concept of mimesis’ (cited in Potolsky 2006: 2). At first glance, the idea that literature imitates life makes sense as authors often seem to write about the world around them. However, the history of literary theory has witnessed a diverse range of attitudes towards this seemingly clear idea. While both Plato and Aristotle take their cue from the belief that art mirrors reality, they draw different conclusions as to the moral aspects of artistic representation. For Plato, the imitative nature of literature is a reason to ban poets and artists from the perfect city. As a mere copy of a copy, literature is illusory and deceptive. By contrast, Aristotle sees artistic imitation as perfectly ‘natural, rational and educational’ and even ‘beneficial’ (Potolsky 2006: 46). It does not merely copy the real; it has the potential to reveal universal truths and produce cathartic effects in human beings.

This paper contributes to the longstanding discussion on the imitative dimension of literary representation by approaching it from a computational and statistical perspective. More specifically, it explores the potential of social network analysis for studying the ways in which societal dynamics are realistically reflected in products of literary fiction. Using data-driven methods it thus draws on a tradition of literary criticism that prevailed between the 1930s and 1950s and that has recently been revitalized by scholars working with character network analysis (e.g. Smeets forthcoming, Selisker 2015, Labatut & Bost 2019). Often working with relatively large corpora, this tradition systematically studied how societal trends (female employment, national norms and values, divorce) were reflected in fiction (e.g. Inglis 1938, Berelson & Salter 1946, Barnett & Gruen 1948, Albrecht 1956). This paper explores the two general hypotheses that were often used as a point of departure in this research tradition: 1. literature reflects societal trends (the reflection theory), 2. literature shapes or incites societal trends (the social control theory). In doing so, it pays special attention to abstract, elusive notions such as ‘real’, ‘fictional’ and ‘reflection’ in light of discussions on mimesis. What does it mean for a societal phenomenon to be reflected, mirrored, echoed or reproduced in the interactions between fictional characters?

As a case study, it aims to trace the influence of second- and third-wave feminism on the position of female characters in Dutch-language fiction. In light of the changing position of women in society, do we observe a shift in the position of female characters over a period of approximately 50 years? In order to answer this question, it compares earlier results on the centrality of female characters in a corpus of 170 Dutch novels published in 2012 (Smeets et al. 2019, Smeets forthcoming) with a similar analysis of the centrality of female characters in a corpus of 160 novels published in the period 1961-1965, which is just before second-wave feminism took off in the Netherlands.

The methodological point of departure is the approach to character network analysis as developed in my books Character Constellations (2021) and Actual Fictions (2022). Based on the co-occurrence of characters on the sentence level (Smeets et al. 2019), it semi-automatically extracts social networks of fictional characters from each of the novels in the corpora.[1] This approach is semi-automatic in the sense that it uses automatic named entity recognition to detect characters and their name variants (‘Frits’, ‘Frits van Egters’, ‘Van Egters’), after which errors are manually corrected for each individual novel by two student assistants. Gender resolution of these characters is also semi-automatic: based on lists of the most popular Dutch male and female names in the Meertens voornamenbank, it is automatically estimated whether a character falls into one of three gender categories (male, female, gender neutral), after which errors are again manually corrected.[2] Subsequently, a range of network centrality metrics is used to explore how important, dominant, or influential female characters are in the fictional networks extracted from the corpus. Do female characters in Dutch literature become more or less central between the 1960s and 2012? How can we interpret the answer to that question in light of the rise and peak of Dutch second- and third-wave feminism? Does it provide an argument for the reflection or for the social control theory? In order to disentangle the complexities of these questions, the statistical patterns are evaluated through a close reading of one novel from the corpus.
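The extraction and centrality steps described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the toy sentences, the alias table, the gender labels and all function names are assumptions made for illustration, and plain degree centrality stands in for the wider range of centrality metrics mentioned in the abstract.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical name variants, assumed to be already manually corrected,
# each mapped to a single character id.
ALIASES = {
    "Frits van Egters": "Frits",
    "Van Egters": "Frits",
    "Frits": "Frits",
    "Anna": "Anna",
    "Joop": "Joop",
}
# Hypothetical gender labels after manual correction.
GENDER = {"Frits": "male", "Anna": "female", "Joop": "male"}


def characters_in(sentence):
    """Return the set of character ids mentioned in one sentence.

    Naive substring matching; a real pipeline would rely on NER output.
    """
    found = set()
    remaining = sentence
    # Try longer variants first so 'Frits van Egters' is consumed as a whole.
    for variant in sorted(ALIASES, key=len, reverse=True):
        if variant in remaining:
            found.add(ALIASES[variant])
            remaining = remaining.replace(variant, " ")
    return found


def cooccurrence_network(sentences):
    """Edge weight = number of sentences in which two characters co-occur."""
    weights = defaultdict(int)
    for sentence in sentences:
        for a, b in combinations(sorted(characters_in(sentence)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree_centrality(weights, nodes):
    """Fraction of the other characters each character is linked to."""
    neighbours = {n: set() for n in nodes}
    for a, b in weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    denom = max(len(nodes) - 1, 1)
    return {n: len(neighbours[n]) / denom for n in nodes}


sentences = [
    "Frits van Egters spoke to Anna.",
    "Anna and Joop left early.",
    "Van Egters stayed behind.",
    "Frits saw Anna again.",
]
network = cooccurrence_network(sentences)
centrality = degree_centrality(network, sorted(GENDER))

# Mean centrality per gender category, the kind of figure compared across corpora.
by_gender = defaultdict(list)
for name, score in centrality.items():
    by_gender[GENDER[name]].append(score)
mean_by_gender = {g: mean(scores) for g, scores in by_gender.items()}
print(mean_by_gender)
```

Comparing such per-gender averages between the 1960s corpus and the 2012 corpus is one simple way to operationalise the question of whether female characters become more or less central.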

References

Albrecht, M. "Does literature reflect common values?" American Sociological Review 21.6 (1956): 722-729.

Barnett, J. & R. Gruen. "Recent American Divorce Novels, 1938-1945: A Study in the Sociology of Literature." Social Forces (1948): 322-327.

Berelson, B. & P. Salter. "Majority and minority Americans: An analysis of magazine fiction." Public Opinion Quarterly 10.2 (1946): 168-190.

Inglis, Ruth A. "An objective approach to the relationship between fiction and society." American Sociological Review 3.4 (1938): 526-533.

Labatut, V. & X. Bost. "Extraction and Analysis of Fictional Character Networks: A Survey." ACM Computing Surveys 52.5 (2019): 89. https://doi.org/10.1145/3344548

Potolsky, M. Mimesis. The New Critical Idiom. New York/London (Routledge): 2006.

Smeets, R., E. Sanders, and A. van den Bosch. "Character Centrality in Present-Day Dutch Literary Fiction." Digital Humanities Benelux Journal 1 (2019): 71-90.

Smeets, R. Character Constellations. Representations of Social Groups in Present-Day Dutch Literary Fiction. Leuven (Leuven University Press): 2021.

Smeets, R. Actual Fictions. Literary Representation and Character Network Analysis. Cambridge (Cambridge University Press): 2022.

Volker, B. & R. Smeets. "Imagined social structures: Mirrors or alternatives? A comparison between networks of characters in contemporary Dutch literature and networks of the population in the Netherlands." Poetics (2019): 101379.

 

[2] See https://github.com/roelsmeets/actual-fictions for the scripts mentioned here.

A Quantitative Analysis of Digital Scholarly Editions

  • Michael Kurzmeier
  • James O’Sullivan
  • Órla Murphy
  • Michael Pidd
  • Bridgette Wessels

University College Cork

Digital scholarly editions are key resources for arts and humanities research, and predate in various forms the concepts of digital humanities or humanities computing (Sula and Hill 2019). While individual projects are remembered for their contribution to the field, few comprehensive data sources exist to show the development of the field. This paper is both an analysis of the sources from which to write a history of digital scholarly editing, and an overview of the state and development of the field using quantitative methods.

Digital editions are positioned between drawing from archived material, and being an archive themselves (Dillen 2019, 266). In addition, digital editions are also web resources in need of archiving, lest they fall subject to link rot and soon disappear from the web, whether for lack of a persistent identifier or lack of maintenance. For digital editions past and present, two main data sources are available. Patrick Sahle lists around 700 editions in a curated catalog (Sahle, n.d.), while the Catalogue of Digital Editions features about 320 digital editions in a database (Franzini 2012). The two sources have different criteria for inclusion, overlap in content and differ in granularity, yet these are the sources from which a history of digital scholarly editions will mostly draw. Analysis of these sources will present them in their scope, aim and usability for research, while highlighting underrepresented areas of data collection on digital scholarly editions.

A quantitative analysis of the two sources combined will then provide data-driven insight into the development of digital scholarly editions since the 1970s. As a first step, the analysis will focus on the number of projects and their average duration over time to produce an overview of the field. In a second step, long-term cycles such as the adoption of TEI-XML and open access standards will be analysed. Preservation and availability of all editions listed in both data sources will show the loss rate affecting digital scholarly editions and lead back to a discussion of the current state and history of the field based on the work currently being undertaken within the C21 Editions project.
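Once the two catalogues are merged, the first-step measures named above (number of projects, average duration, loss rate) reduce to simple aggregations. The sketch below is only an illustration under stated assumptions: every record, field name and the "online" flag is invented, titles are used as a naive merge key, and nothing here reproduces the schema or content of either catalogue.

```python
from collections import Counter
from statistics import mean

# Invented toy records standing in for two overlapping catalogues.
catalogue_a = [
    {"title": "Edition A", "start": 1998, "end": 2004, "online": False},
    {"title": "Edition B", "start": 2005, "end": 2015, "online": True},
    {"title": "Edition C", "start": 2008, "end": 2010, "online": True},
]
catalogue_b = [
    {"title": "edition b", "start": 2005, "end": 2015, "online": True},
    {"title": "Edition D", "start": 2012, "end": 2018, "online": False},
]

# Merge on a case-insensitive title key so overlapping entries count once.
merged = {e["title"].casefold(): e for e in catalogue_a + catalogue_b}
editions = list(merged.values())


def decade(year):
    """Map a year to the first year of its decade, e.g. 2008 -> 2000."""
    return year // 10 * 10


# Number of projects started per decade.
projects_per_decade = Counter(decade(e["start"]) for e in editions)
# Average project duration in years.
avg_duration = mean(e["end"] - e["start"] for e in editions)
# Share of listed editions that are no longer available online.
loss_rate = sum(not e["online"] for e in editions) / len(editions)

print(dict(projects_per_decade), avg_duration, loss_rate)
```

In practice, matching entries across catalogues would need a more robust key than the title, since the two sources differ in granularity.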

 

Acknowledgements

This research is part of C21 Editions: Scholarly Editing and Publishing in the Digital Age, a three-year international collaboration jointly funded by the Arts & Humanities Research Council (AH/W001489/1) and Irish Research Council (IRC/W001489/1).

 

Bibliography:

Dillen, Wout. 2019. ‘On Edited Archives and Archived Editions’. International Journal of Digital Humanities 1 (2): 263–77. https://doi.org/10.1007/s42803-019-00018-4.

Franzini, Greta. 2012. ‘Gfranzini/Digeds_Cat: First Release’. Zenodo. https://doi.org/10.5281/ZENODO.1161425.

Sahle, Patrick. n.d. ‘A Catalog of Digital Scholarly Editions - by Title, Complete List, a-z (714 Items)’. https://v3.digitale-edition.de/vlet_a-z.html.

Sula, Chris Alen, and Heather V Hill. 2019. ‘The Early History of Digital Humanities: An Analysis of Computers and the Humanities (1966–2004) and Literary and Linguistic Computing (1986–2004)’. Digital Scholarship in the Humanities, November, fqz072. https://doi.org/10.1093/llc/fqz072.

Big Language Data Comes with Big Opportunities and Big Challenges: A Learner-Corpus Case Study

  • Itamar Shatz

University of Cambridge

Big language datasets allow researchers in various digital-humanities fields, such as corpus linguistics, to analyze language use in novel ways, using uniquely large and diverse samples. However, the scale of these datasets also creates new challenges for developing and using them.

Here, I present a case study on my development and use of such a dataset—the EFCAMDAT Cleaned Subcorpus—which contains over 700,000 texts, written by learners in an international online English school.

Specifically, I outline the issues that I encountered during the development process, which are common among such datasets, and show how I dealt with them using methods from the digital humanities. For example, I show how I identified duplicate content by measuring the overlap between texts using Hamming distance, how I identified non-English content using the cld2 classifier, and how I categorized texts based on their topic using LDA modelling.
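As a rough illustration of the duplicate-detection step, the sketch below computes a normalised Hamming distance between two texts. The abstract does not say how texts were represented before comparison, so encoding each text as a binary word-presence vector over the pair's combined vocabulary, the 0.2 threshold and the function names are all assumptions. Language identification with cld2 and LDA topic modelling are not shown.

```python
import re


def tokens(text):
    """Lower-cased set of word types in a text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def normalised_hamming(text_a, text_b):
    """Share of vocabulary positions at which the two texts differ.

    Each text becomes a binary vector (word present / absent) over the
    combined vocabulary of the pair; the Hamming distance counts the
    positions where the two vectors disagree.
    """
    a, b = tokens(text_a), tokens(text_b)
    vocab = sorted(a | b)
    if not vocab:
        return 0.0
    vec_a = [word in a for word in vocab]
    vec_b = [word in b for word in vocab]
    differing = sum(x != y for x, y in zip(vec_a, vec_b))
    return differing / len(vocab)


def is_near_duplicate(text_a, text_b, threshold=0.2):
    """Flag a pair as duplicate when the distance is at or below the threshold."""
    return normalised_hamming(text_a, text_b) <= threshold


original = "My name is Ana and I live in Madrid."
copy = "my name is Ana and I live in Madrid"
other = "Yesterday I went to the cinema with friends."

print(is_near_duplicate(original, copy))
print(is_near_duplicate(original, other))
```

Comparing every pair of texts this way scales quadratically, so for a corpus of several hundred thousand texts some form of blocking or hashing would be needed before pairwise comparison.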

In addition, I show how the scale of the new dataset enabled me to analyze language use with a sample that is much larger and more diverse than in past studies, in terms of factors such as the number of texts, learners, and tasks involved. Furthermore, I show how these factors enabled me to analyze language use in novel ways, for example by quantifying task effects using mixed-effects statistical models.

This case study provides important lessons that apply broadly across many types of big language datasets. Most notably, it demonstrates the need to thoroughly organize and clean such datasets, using quantitative and computational methods that can be implemented in a scalable manner. These lessons will help researchers in the digital humanities become aware of the specific opportunities that big language datasets offer, while also informing them of the associated challenges, and providing guidance on how to overcome those challenges.

Keywords: corpus linguistics; quantitative and computational methods; data curation