TEI Encoding of a Classical Mixtec Dictionary Using GROBID-Dictionaries

Abstract : This paper presents the application of GROBID-Dictionaries (Khemakhem et al. 2017, 2018a, 2018b, 2018c), an open-source machine learning system for automatically structuring digitized print dictionaries into TEI (Text Encoding Initiative), to a historical lexical resource of Colonial Mixtec, 'Voces del Dzaha Dzahui', published by the Dominican fray Francisco Alvarado in 1593. GROBID-Dictionaries was applied to a reorganized and modernized version of this resource published by Jansen and Pérez Jiménez (2009). The resulting TEI dictionary will be integrated into a language documentation project dealing with Mixtepec-Mixtec (ISO 639-3: mix) (Bowers & Romary, 2017, 2018a, 2018b), an under-resourced indigenous language native to the Juxtlahuaca district of Oaxaca, Mexico.

https://hal.inria.fr/hal-02264033
Contributor : Laurent Romary
Submitted on : Tuesday, August 6, 2019 - 10:50:40 AM
Last modification on : Thursday, August 8, 2019 - 1:10:13 AM

File

eLex_2019_abstract_111.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-02264033, version 1

Citation

Jack Bowers, Mohamed Khemakhem, Laurent Romary. TEI Encoding of a Classical Mixtec Dictionary Using GROBID-Dictionaries. ELEX 2019: Smart Lexicography, Oct 2019, Sintra, Portugal. ⟨hal-02264033⟩
