Abstract: This paper presents the application of GROBID-Dictionaries (Khemakhem et al. 2017, Khemakhem et al. 2018a, Khemakhem et al. 2018b, Khemakhem et al. 2018c), an open source machine learning system for automatically structuring digitized print dictionaries into TEI (Text Encoding Initiative), to a historical lexical resource of Colonial Mixtec, 'Voces del Dzaha Dzahui', published by the Dominican friar Francisco Alvarado in 1593. GROBID-Dictionaries was applied to a reorganized and modernized version of the historical resource published by Jansen and Pérez Jiménez (2009). The resulting TEI dictionary will be integrated into a language documentation project on Mixtepec-Mixtec (ISO 639-3: mix) (Bowers & Romary, 2017, 2018a, 2018b), an under-resourced indigenous language native to the Juxtlahuaca district of Oaxaca, Mexico.
https://hal.inria.fr/hal-02264033
Contributor: Laurent Romary
Submitted on: Tuesday, August 6, 2019 - 10:50:40 AM
Last modification on: Friday, September 18, 2020 - 2:34:45 PM
Long-term archiving on: Thursday, January 9, 2020 - 1:39:10 AM
Jack Bowers, Mohamed Khemakhem, Laurent Romary. TEI Encoding of a Classical Mixtec Dictionary Using GROBID-Dictionaries. ELEX 2019: Smart Lexicography, Oct 2019, Sintra, Portugal. ⟨hal-02264033⟩