Session 12

Friday 11:30 - 13:00

High Tor 4

Chair: Seth Mehl

CitizenHeritage: Crowdsourcing, Digital Curation and Citizen Science with European Photographic Collections

  • Fred Truyen,
  • Sofie Taes

KU Leuven

Keywords: Photographic Heritage, Deep Learning, Crowdsourcing, Citizen Science

In this paper we will discuss the results of work on photographic heritage collections in a series of EU-funded research projects, “Kaleidoscope: the 1950s in Europe” and “Europeana: Century of Change”, in which digitised collections were contributed to Europeana and both AI and crowdsourcing technologies were used to enrich metadata. In the context of the CitizenHeritage project, we will outline a roadmap for how this work can lead to genuine citizen science.
In pursuit of the many “Visual identities of Europe” in the 20th century, editorials and hybrid virtual/physical exhibitions were curated from aggregated collections drawn from a wide variety of heritage institutions, making it possible to explore new relations between objects, representations and narratives. We will explain how data aggregation, AI-supported data improvement and crowdsourced data validation contribute to innovative heritage research involving citizens and stakeholder communities.
We will present the results of these projects, which offer interesting views on this particular historical period and have led to a range of physical, virtual and online exhibitions, editorials and MOOCs.

Computer Vision and the History of Printing: Search, Segment and Classify

  • Giles Bergel

University of Oxford

Computer vision has made significant progress in recent years, thanks in part to developments in machine learning, and is now poised to become a standard part of the toolkit of all those who work with such ‘legacy’ media as printed books and illustrations. Computers can now reliably match copies of the same printed page or illustration; visualise variant typesettings or images; and, through improvements in OCR models, extract text. More challenging applications, such as segmenting pages into meaningful parts and classifying their content, are within reach. But we still have much to learn about the printed book – in particular, how page layout and such material dimensions as paper, ink and format structure its form and meaning.

This paper will offer an account of recent work in computer vision applied to print, drawing largely on the example of the University of Oxford’s Visual Geometry Group (VGG), whose open-source tools are embedded in projects such as Bodleian Ballads Online, the 15C Illustration project and the Traherne Digital Collator. It will assess how far ML-based approaches are able to classify the content of historical printed illustrations, using the British Library’s Million Images corpus as a case study, and give an account of the state of the art in OCR for pre-modern or other challenging typefaces. Lastly, it will offer some critical reflections on the ‘datafication’ of this iconic form of knowledge and cultural production, arguing that a historically informed perspective on remediation has much to offer digital humanities practice across multiple media.
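
As a purely illustrative sketch of the kind of ML-based classification assessed here (not VGG’s own tooling), the Python snippet below labels a scanned illustration with an ImageNet-pretrained network from torchvision; the model choice and the file name are placeholder assumptions. In practice such a network would be fine-tuned on labelled heritage images, since generic ImageNet categories transfer poorly to woodcuts and engravings.

    # Illustrative only: labelling a scanned illustration with an
    # ImageNet-pretrained CNN. Model and file name are assumptions,
    # not part of any project named above.
    import torch
    from torchvision import models
    from PIL import Image

    weights = models.ResNet50_Weights.IMAGENET1K_V2
    preprocess = weights.transforms()        # standard resize/crop/normalise
    model = models.resnet50(weights=weights)
    model.eval()

    img = Image.open("page_illustration.jpg").convert("RGB")
    batch = preprocess(img).unsqueeze(0)     # shape: (1, 3, 224, 224)

    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]

    # Report the five most probable ImageNet categories.
    categories = weights.meta["categories"]
    for p, idx in zip(*probs.topk(5)):
        print(f"{categories[idx]}: {p:.3f}")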

Keywords: printing, typography, computer vision

Applying Machine Learning and Image Recognition to the Visual Culture of the Protestant Reformation

  • Drew Thomas

University College Dublin

The Protestant Reformation was Europe’s first mass media event. The movement, begun by Martin Luther in 1517, caused a surge in printing across the Holy Roman Empire, as readers sought to stay updated on the movement’s development. Luther’s collaboration with Lucas Cranach, court painter to the Elector of Saxony, provided a new and vibrant visual culture to Europe’s reading and non-reading public. This is most famously seen in the depiction of a papal tiara upon the Whore of Babylon in Luther’s translation of the New Testament.

This paper discusses applying machine learning and image recognition software to the illustrations, ornate letters and title-page borders in scanned editions of early modern books. Libraries in Germany and Austria hold some of the most extensively digitised, freely available collections in Europe. The project unites over 30,000 digitised books, comprising millions of images, from various libraries and links them to the Universal Short Title Catalogue, ensuring standardised bibliographic metadata. This allows for a systematic examination of the role of images before and during the Protestant Reformation.

Such an analysis opens multiple strands of investigation. How did iconographies converge or differ across regions and over time? Many books from this period fail to include publication information; by applying image recognition software, woodcuts used in anonymous editions can be matched to publications that identify the printer. Furthermore, works by Luther were among the most counterfeited books of the period: printers even copied the woodcut title-page borders used in the original Wittenberg editions. This software can assist in distinguishing between otherwise near-identical woodcuts. As increased internet speeds have made data-intensive image research feasible in the digital humanities, this will be the first large-scale analysis of the role images played in the development of Europe’s reading culture during the Reformation.
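
By way of illustration only, and not the project’s actual software: a common way to test whether two impressions derive from the same woodblock is local-feature matching. The sketch below uses OpenCV’s SIFT implementation with Lowe’s ratio test; the file names are placeholders. A copied (re-cut) border typically yields far fewer consistent matches than a genuine re-use of the block.

    # Illustrative sketch: comparing a woodcut from an anonymous edition
    # against a reference image from a known printer's stock.
    import cv2

    def match_score(path_a: str, path_b: str, ratio: float = 0.75) -> int:
        """Return the number of 'good' SIFT matches between two images."""
        img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
        img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

        sift = cv2.SIFT_create()
        _, desc_a = sift.detectAndCompute(img_a, None)
        _, desc_b = sift.detectAndCompute(img_b, None)

        # k-nearest-neighbour matching plus Lowe's ratio test to discard
        # ambiguous correspondences.
        matches = cv2.BFMatcher().knnMatch(desc_a, desc_b, k=2)
        good = [m for m, n in (p for p in matches if len(p) == 2)
                if m.distance < ratio * n.distance]
        return len(good)

    # Placeholder file names; a high score suggests the same woodblock.
    score = match_score("anonymous_titlepage.png", "wittenberg_reference.png")
    print(f"good matches: {score}")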