Monday, December 9, 2024

Guest Post: Artificial Intelligence At The Service Of The Genealogist, by Bob Coret

Archives in the Netherlands are investing heavily in digitizing large parts of their holdings, such as chronicles, notarial archives, and the archives of the States General, the United East India Company (VOC) and the West India Company (WIC).


As a result, more and more scans can be viewed online, including on the Open Archives website. Open Archives uses artificial intelligence (AI) to make these scans easier to search and understand, a great development for genealogists!

Automatic transcriptions

As an example, we take a scan from the archive of the Old West India Company (Nationaal Archief 1.05.01.01), specifically folio 53 from inventory 42: "Commissions, instructions, conditions for colonists. Files [...] from and to salt ships".


https://www.openarchieven.nl/transcripties/toon/NL-HaNA_1.05.01.01_42_0055

For those who cannot read such ancient manuscripts, automatic handwriting recognition – a form of artificial intelligence – offers a solution.

Several archival institutions are collaborating on models to improve automatic handwriting recognition: they provide scans together with human-made transcriptions so that computers can learn from them. One service commonly used by archival institutions is Transkribus, which anyone can also try (free of charge) via https://transkribus.ai/.

The current models recognize a wide variety of handwriting and can convert the displayed scan into the following transcription:

  1. Compareerde voor Bewinthebberen der westjndisen

  2. Comp.e de onderschreven persoonen, Soo voor hem

  3. selven als sijne mede reeders, versouckende als opt andr

  4. met

  5. bladt ende rechter zijde Acte van Concessie om het

  6. Schip genaemt den volphijn schipper Adriaen sends

  7. te mogen varen onder de linuten vande westjndische

  8. en hout

  9. Comp.e omme aldaer sout, te becomen, De welcke hen

  10. midts desen werden versunt, onder de conditien ende

  11. borchtochten aen dander zijde gementioneert

  12. Actum den 24 Novemb 1621 tot Middelburch

  13. Cornelis Ccunelaer

  14. Puthorno vernis

By converting scans into text, these transcriptions can be searched in full text, for example by name or place. In this way, Open Archives makes millions of scans from various archive institutions searchable.
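To illustrate why text makes scans searchable, here is a minimal sketch of a full-text index in Python. It is not how Open Archives implements its search; the function names are made up for this example, and the second "scan" is invented test data. Only the two lines under the real scan identifier come from the transcription above.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(transcriptions):
    """Map every word to the set of scan IDs whose transcription contains it."""
    index = defaultdict(set)
    for scan_id, text in transcriptions.items():
        for token in tokenize(text):
            index[token].add(scan_id)
    return index


def search(index, query):
    """Return the scan IDs that contain every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


transcriptions = {
    # Two lines taken from the automatic transcription shown above.
    "NL-HaNA_1.05.01.01_42_0055": (
        "Schip genaemt den volphijn schipper Adriaen sends "
        "Actum den 24 Novemb 1621 tot Middelburch"
    ),
    # Invented example, only here to give the index a second document.
    "hypothetical-scan-0001": "Actum den 3 Meij 1622 tot Amsterdam",
}

index = build_index(transcriptions)
print(search(index, "Middelburch 1621"))
print(search(index, "Dolphijn"))
```

The first search finds the West India Company scan. The second finds nothing, because the handwriting recognition read the ship's name as "volphijn": a recognition error in a name can hide a document from an exact-match search like this one.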

Automatic summaries

Experienced researchers and historians can often interpret the displayed transcription well. For many others, however, Old Dutch is a challenge, and historical documents remain difficult to understand. But artificial intelligence can help here too! Using a large language model (LLM), also a form of artificial intelligence, a summary in modern English, Dutch, French or German can be created from the transcription. Personal names, place names and dates are also highlighted; this is a form of Named Entity Recognition (NER).

The above Dutch transcription is then summarized in English as follows:

On November 24, 1621, ship owners appeared before the directors of the West Indies Company in Middelburg. They requested permission for their ship called "De Dolphijn", captained by Adriaen Sends, to sail within the boundaries of the West Indies and Wood Company to collect salt. The permission was granted under certain conditions and securities. The document was signed by Cornelis Ccunelaer Puthorno.
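To give a feel for what "highlighting" an entity means, here is a deliberately simple Python sketch that marks dates in an English summary with a regular expression. This is a stand-in technique, not real Named Entity Recognition and not the method Open Archives uses: it only knows one date format and cannot find names or places. The function name and the `[DATE: ...]` marker are assumptions for this example.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Matches dates written like "November 24, 1621".
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")


def highlight_dates(text):
    """Wrap every date of the form 'Month D, YYYY' in a [DATE: ...] marker."""
    return DATE_PATTERN.sub(lambda match: f"[DATE: {match.group(0)}]", text)


summary = (
    "On November 24, 1621, ship owners appeared before the directors "
    "of the West Indies Company in Middelburg."
)
print(highlight_dates(summary))
```

A real NER system is trained on examples and also recognizes personal names and place names, which a fixed pattern like this cannot do.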

Open Archives users can request a summary of any transcription (and its scan) they find; the summary appears within a few seconds. In this way, artificial intelligence helps not only with searching but also with understanding historical (Dutch) documents.


https://www.openarchieven.nl/transcripties/ 

Anyone can view recently created summaries of historical documents via the page https://www.openarchieven.nl/transcripties/recent.php.

Note on AI

The transcriptions are created by computers (at the archival institutions) using automatic handwriting recognition and are not error-free. The error rate in handwriting recognition, expressed as the character error rate (CER), is often around 5% (i.e., roughly 5 out of every 100 characters in the handwritten original are recognized incorrectly).
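For readers curious how such a percentage is calculated: the CER is commonly computed as the edit distance (the number of characters that must be inserted, deleted or replaced) between the automatic transcription and a correct human transcription, divided by the length of the human transcription. The Python sketch below shows this under one assumption: the "reference" line is my guess at the correct reading, based on the ship's name "Dolphijn" in the summary; it is not an official human transcription.

```python
def levenshtein(reference, hypothesis):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


reference = "Schip genaemt den Dolphijn"   # assumed correct reading
hypothesis = "Schip genaemt den volphijn"  # from the automatic transcription

cer = character_error_rate(reference, hypothesis)
print(f"CER: {cer:.1%}")  # one wrong character out of 26
```

Here a single misread letter in 26 characters gives a CER of just under 4%, which shows that even a low error rate can land on the one word a genealogist is looking for.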

The summaries are created by computers (at Open Archives) from these transcriptions, using a language model. Although the few errors in a transcription usually do not cause major problems, this artificial intelligence is not perfect either. However, the results are usually good enough to make the transcription understandable.

The motto when using AI products such as transcriptions and summaries is: keep thinking critically yourself!

About Open Archives

Via https://www.openarchieven.nl/ genealogists can search genealogical data from Dutch, Belgian, French and Surinamese archives and associations, in four languages. Over 353 million historical person records from around 160 organizations are available, often including scans (from the archival institutions). Related documents are linked automatically, along with additional information about people and their context. Open Archives is provided free of charge by Coret Genealogie.

Additional features are available with a paid subscription, including:

  • Monitoring searches (with notification of new search results),

  • Displaying ancestors in a family tree,

  • Automatically finding a couple's children in a civil certificate,

  • Downloading certificates in GEDCOM and PDF format,

  • Exporting search results in CSV or XLS format,

  • Creating summaries from transcriptions.

=============================================

My thanks to Bob Coret for offering this information to Genea-Musings and its readers.

Copyright (c) 2024, Randall J. Seaver

