Artificial Intelligence At The Service Of The Genealogist
by Bob Coret
Archives in the Netherlands are investing a lot of work in digitizing large parts of their holdings, such as chronicles, notarial archives, and the archives of the States General, the United East India Company (VOC) and the West India Company (WIC).
As a result, more and more scans are available and can be viewed online, including on the Open Archives website. Open Archives uses artificial intelligence (AI) to make these scans easier to search and understand - a great development for genealogists!
Automatic transcriptions
As an example, we take a scan from the archive of the Old West India Company (Nationaal Archief 1.05.01.01), specifically folio 53 from inventory 42: "Commissions, instructions, conditions for colonists. Files [...] from and to salt ships".
https://www.openarchieven.nl/transcripties/toon/NL-HaNA_1.05.01.01_42_0055
For those who cannot read such old manuscripts, automatic handwriting recognition – a form of artificial intelligence – offers a solution.
Several archival institutions are collaborating on models to improve automatic handwriting recognition: they provide scans together with human-made transcriptions so that computers can learn from them. One service commonly used by archival institutions is Transkribus, which anyone can also try easily (free of charge) via https://transkribus.ai/.
The current models recognize a wide variety of handwriting and can convert the displayed scan into the following transcription:
Compareerde voor Bewinthebberen der westjndisen
Comp.e de onderschreven persoonen, Soo voor hem
selven als sijne mede reeders, versouckende als opt andr
met
bladt ende rechter zijde Acte van Concessie om het
Schip genaemt den volphijn schipper Adriaen sends
te mogen varen onder de linuten vande westjndische
en hout
Comp.e omme aldaer sout, te becomen, De welcke hen
midts desen werden versunt, onder de conditien ende
borchtochten aen dander zijde gementioneert
Actum den 24 Novemb 1621 tot Middelburch
Cornelis Ccunelaer
Puthorno vernis
Once scans have been converted into text, the transcriptions can be searched in full text, for example by name or place. In this way, Open Archives makes millions of scans from various archival institutions searchable.
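To illustrate why text makes scans searchable, here is a minimal sketch of a full-text index in Python. It is not how Open Archives implements its search, which the article does not describe. The function names and the second scan identifier are hypothetical; the first identifier and its text line come from the example above.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(transcriptions: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of scan identifiers whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for scan_id, text in transcriptions.items():
        for token in tokenize(text):
            index[token].add(scan_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the scans that contain every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Illustrative data: one line from the transcription above, plus a
# made-up second scan (hypothetical identifier and text).
transcriptions = {
    "NL-HaNA_1.05.01.01_42_0055": "Actum den 24 Novemb 1621 tot Middelburch",
    "example-scan-2": "Actum den 3 Meij 1622 tot Amsterdam",
}
index = build_index(transcriptions)
hits = search(index, "Middelburch 1621")
print(hits)
```

A search like this only matches the spelling that the handwriting recognition produced, so a query for "Middelburg" would not find "Middelburch". Trying spelling variants is worthwhile.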
Automatic summaries
Experienced researchers and historians can often interpret the displayed transcription well. For many others, however, Old Dutch is a challenge and historical documents remain difficult to understand. But artificial intelligence can help here too! Using a large language model (LLM) - also a form of artificial intelligence - a summary in modern English, Dutch, French or German can be created from the transcription. Personal names, place names and dates are also highlighted - this is a form of Named Entity Recognition (NER).
The above Dutch transcription is then summarized in English as follows:
On November 24, 1621, ship owners appeared before the directors of the West Indies Company in Middelburg. They requested permission for their ship called "De Dolphijn", captained by Adriaen Sends, to sail within the boundaries of the West Indies and Wood Company to collect salt. The permission was granted under certain conditions and securities. The document was signed by Cornelis Ccunelaer Puthorno.
Open Archives users can request a summary of any transcription (and its scan) they find; the summary appears within a few seconds of the request. In this way, artificial intelligence helps not only with searching but also with understanding historical (Dutch) documents.
https://www.openarchieven.nl/transcripties/
Anyone can view recently created summaries of historical documents via the page https://www.openarchieven.nl/transcripties/recent.php.
Note on AI
The transcriptions are created by computers (at the archival institutions) using automatic handwriting recognition, and they are not error-free. The error rate of handwriting recognition, expressed as the character error rate (CER), is often around 5%: of every 100 characters in the handwritten original, about 5 are recognized incorrectly.
The summaries are created by computers (at Open Archives) from these transcriptions, using a language model. The few errors in a transcription usually do not cause major problems, but this artificial intelligence is not perfect either. Still, the results are usually good enough to make the transcription understandable.
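For readers who want to see how such a rate is calculated, here is a minimal Python sketch. It assumes the common definition of CER: the number of single-character edits (insertions, deletions, substitutions) needed to turn the recognized text into a correct reference transcription, divided by the length of the reference. The function names are hypothetical, and the reference spelling "Dolphijn" is an assumption made only for illustration, not a verified reading of the manuscript.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, recognized: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, recognized) / len(reference)


# "volphijn" is what the model produced in the transcription above;
# "Dolphijn" is an assumed correct reading, used only as an example.
cer = character_error_rate("Dolphijn", "volphijn")
print(f"{cer:.1%}")  # prints 12.5% (1 wrong character out of 8)
```

A single word is far too short to judge a model, of course. In practice the rate is measured over many pages that people have transcribed by hand.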
The motto when using AI products such as transcriptions and summaries is: keep thinking critically yourself!
About Open Archives
Via https://www.openarchieven.nl/ genealogists can search genealogical data from Dutch, Belgian, French and Surinamese archives and associations – in four languages. Over 353 million historical person records from around 160 organizations are available, often including scans (from the archival institutions). Open Archives automatically links related documents and adds further information about people and their context. Open Archives is provided free of charge by Coret Genealogie.
Additional features are available with a paid subscription, including:
Monitoring searches (with notification of new search results),
Displaying ancestors in a family tree,
Automatically finding a couple's children in a civil certificate,
Downloading certificates in GEDCOM and PDF format,
Exporting search results in CSV or XLS format,
Creating summaries from transcriptions.
My thanks to Bob Coret for offering this information to Genea-Musings and its readers.
The URL for this post is: https://www.geneamusings.com/2024/12/guest-post-artificial-intelligence-at.html
Please comment on this post on the website by clicking the URL above and then the "Comments" link at the bottom of each post. Share it on Twitter, Facebook, or Pinterest using the icons below. Or contact me by email at randy.seaver@gmail.com. Please note that all comments are moderated and may not appear immediately.