Conference papers

TEI Encoding of a Classical Mixtec Dictionary Using GROBID-Dictionaries

Abstract : This paper presents the application of GROBID-Dictionaries (Khemakhem et al. 2017, Khemakhem et al. 2018a, Khemakhem et al. 2018b, Khemakhem et al. 2018c), an open-source machine learning system that automatically structures digitized print dictionaries into TEI (Text Encoding Initiative), to a historical lexical resource of Colonial Mixtec, 'Voces del Dzaha Dzahui', published by the Dominican friar Francisco Alvarado in 1593. GROBID-Dictionaries was applied to a reorganized and modernized version of this resource published by Jansen and Pérez Jiménez (2009). The resulting TEI dictionary will be integrated into a language documentation project on Mixtepec-Mixtec (ISO 639-3: mix) (Bowers & Romary, 2017, 2018a, 2018b), an under-resourced indigenous language native to the Juxtlahuaca district of Oaxaca, Mexico.

Cited literature : 17 references
Contributor : Laurent Romary
Submitted on : Tuesday, August 6, 2019 - 10:50:40 AM
Last modification on : Friday, January 21, 2022 - 3:16:39 AM
Long-term archiving on : Thursday, January 9, 2020 - 1:39:10 AM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License


  • HAL Id : hal-02264033, version 1



Jack Bowers, Mohamed Khemakhem, Laurent Romary. TEI Encoding of a Classical Mixtec Dictionary Using GROBID-Dictionaries. ELEX 2019: Smart Lexicography, Oct 2019, Sintra, Portugal. ⟨hal-02264033⟩


