The Karjala database – challenges and solutions for digitizing heterogeneous, old genealogical documents for internet use

Abstract: The Karjala database contains digitized demographic data from the parish registers of the regions ceded to the Soviet Union in 1944. The objectives of the digitization project have been to promote access to digitized records for scientific research and genealogy and to encourage research on the people of the ceded Karelia region. The main sources for the database have been catechetical lists, lists of children, and registers of vital statistics (registers of births, marriages, migrations and deaths) from the period 1681–1949 that are available in the Digital Archives of the National Archives of Finland. The database holds about 10.3 million entries, but only data older than 100 years is published openly on the Internet: according to decisions by the Finnish data protection authorities, the Personal Data Act applies to personal registers less than 100 years old. The digitization process is still ongoing; an estimated 1.2 million entries remain to be processed. The database is available to users via https://katiha.mamk.fi/. At present, about 6.5 million entries are available on the Internet, each presenting data about one individual, e.g. names, dates of birth and death, cause of death, age, gender, marital status, occupation, residence, migration, and parish. The Karjala database can be used for diverse research purposes; it improves access to church records that are sometimes very difficult to read. Information in the database can be utilized in historical research, medical genetics, the social sciences, and family and onomastic research, for example to clarify family structures, migratory patterns or child mortality. The database also offers excellent opportunities for interdisciplinary research.
Our presentation will describe how the digitization of old, handwritten documents has been managed. The documents consist of non-structured data and contain varied linguistic material: several languages from a historical period in which nations, states and languages were still evolving, different calendars, varying spelling rules, etc. We will also introduce our plans to use text recognition technology so that handwritten documents such as the sources of the Karjala database can be incorporated into the international READ project network (http://read.transkribus.eu/network/). We will also discuss the challenges encountered with this type of heterogeneous data and the possibilities for more defined and structured data management that could enable automated use of the database. Finally, we will describe the different phases in the evolution of the database, emphasizing its linkage with internet technologies, e.g. how they have either hindered or enabled the digitization project.
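The 100-year publication rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the project's implementation: the `Entry` record, its fields, and the choice to measure age from the date of the register entry are all assumptions made for illustration, since the abstract does not describe the database schema or how the age of an entry is determined.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    # Hypothetical, simplified record; the real schema is not given in the abstract.
    name: str
    parish: str
    record_date: date  # assumed: date of the register entry


def is_publishable(entry: Entry, today: date, min_age_years: int = 100) -> bool:
    """Return True if the entry is strictly older than min_age_years on `today`."""
    try:
        cutoff = today.replace(year=today.year - min_age_years)
    except ValueError:
        # `today` is 29 February and the target year is not a leap year.
        cutoff = today.replace(year=today.year - min_age_years, day=28)
    return entry.record_date < cutoff


def publishable(entries: list[Entry], today: date) -> list[Entry]:
    """Keep only the entries that may be shown openly under the assumed rule."""
    return [e for e in entries if is_publishable(e, today)]
```

Passing `today` explicitly rather than reading the system clock keeps the check reproducible, which matters because the set of publishable entries grows every day.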
Document type:
Conference paper
DH. Opportunities and Risks. Connecting Libraries and Research, Aug 2017, Berlin, Germany. 〈https://dh-libraries.sciencesconf.org〉


https://hal.inria.fr/hal-01660143
Contributor: Laurent Romary
Submitted on: Monday, 11 December 2017 - 12:07:37
Last modified on: Thursday, 8 February 2018 - 10:36:24
Document(s) archived on: Monday, 12 March 2018 - 12:23:43

File

331967.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-01660143, version 1


Citation

Jarmo Saarti, Jari Ropponen, Satu Soivanen. The Karjala database – challenges and solutions for digitizing heterogeneous, old genealogical documents for internet use. DH. Opportunities and Risks. Connecting Libraries and Research, Aug 2017, Berlin, Germany. 〈https://dh-libraries.sciencesconf.org〉. 〈hal-01660143〉


Metrics

Record views

111

File downloads

552