The Karjala database – challenges and solutions for digitizing heterogeneous, old genealogical documents for internet use

Abstract : The Karjala database contains digitized demographic data from the parish registers of the regions ceded to the Soviet Union in 1944. The objectives of the digitization project have been to promote access to digitized records for scientific research and genealogy and to encourage research on the people of the ceded Karelia region. The main sources for the database have been catechetical lists, lists of children, and registers of vital statistics (registers of births, marriages, migrations and deaths) from the period 1681 to 1949, which are available in the Digital Archives of the National Archives of Finland. The database holds about 10.3 million entries, but only data older than 100 years is published openly on the Internet: according to decisions by the Finnish data protection authorities, the Personal Data Act applies to personal registers less than 100 years old. The digitization process is still ongoing; an estimated 1.2 million entries remain to be processed. The database is available to users at https://katiha.mamk.fi/. At present, about 6.5 million file entries are available on the Internet, each presenting data about one individual, e.g. names, dates of birth and death, cause of death, age, gender, marital status, occupation, residence, migration, and parish. The Karjala database can be used for diverse research purposes; it improves access to church records that are sometimes very difficult to read. Information in the database can be used in historical research, medical genetics, the social sciences, and family research and onomastics. For example, it can be used to clarify family structures, migratory patterns or child mortality. The database also offers excellent opportunities for interdisciplinary research.
Our presentation will describe how the digitization of old, handwritten documents has been managed. The documents consist of unstructured data containing varied linguistic material: several languages from a historical period when nations, states and languages were still evolving, different calendars, varying spelling rules, etc. We will also introduce our plans to use text recognition technology so that handwritten documents such as those in the Karjala database can be incorporated into the international READ project network (http://read.transkribus.eu/network/). We will discuss the challenges encountered with this type of heterogeneous data and the possibilities for more defined and structured data management that could enable automated use of the database. Finally, we will describe the different phases in the evolution of the database, emphasizing its linkage with internet technologies, e.g. how they have either hindered or enabled the digitization project.
Document type :
Conference papers

https://hal.inria.fr/hal-01660143
Contributor : Laurent Romary
Submitted on : Monday, December 11, 2017 - 12:07:37 PM
Last modification on : Thursday, March 14, 2019 - 11:46:06 AM
Document(s) archived on : Monday, March 12, 2018 - 12:23:43 PM

File

331967.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-01660143, version 1

Citation

Jarmo Saarti, Jari Ropponen, Satu Soivanen. The Karjala database – challenges and solutions for digitizing heterogeneous, old genealogical documents for internet use. DH. Opportunities and Risks. Connecting Libraries and Research, Aug 2017, Berlin, Germany. ⟨hal-01660143⟩
