TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources

Abstract : Achieving consistent encoding within a given community of practice has been a recurrent issue for the TEI Guidelines. The topic is of particular importance for lexical data if we think of the potential wealth of content we could gain from pooling together the information available in the variety of highly structured, historical and contemporary lexical resources. Still, the encoding possibilities offered by the Dictionaries Chapter in the Guidelines are too numerous and too flexible to guarantee sufficient interoperability and a coherent model for searching, visualising or enriching multiple lexical resources. Following the spirit of TEI Analytics [Zillig, 2009], developed in the context of the MONK project, TEI Lex-0 aims at establishing a target format to facilitate the interoperability of heterogeneously encoded lexical resources. This is important both in the context of building lexical infrastructures as such [Ermolaev and Tasovac, 2012] and in the context of developing generic TEI-aware tools such as dictionary viewers and profilers. The format itself should not necessarily be one which is used for editing or managing individual resources, but one to which they can be univocally transformed to be queried, visualised, or mined in a uniform way. We are also aiming to stay as aligned as possible with the TEI subset developed in conjunction with the revision of the ISO LMF (Lexical Markup Framework) standard so that coherent design guidelines can be provided to the community (cf. [Romary, 2015]). 
The paper will provide an overview of the various domains covered by TEI Lex-0 and the main decisions that were taken over the last 18 months: constraining the general structure of a lexical entry; offering mechanisms to overcome the limits of <entry> when used in retro-digitized dictionaries (by allowing, for instance, <pc> and <lb> as children of <entry>); systematizing the representation of morpho-syntactic information [Bański et al., 2017]; providing a strict <sense>-based encoding of sense-related information; deprecating <hom>; dealing with internal and external references in dictionary entries; providing more advanced encodings of etymology (see submission by Bowers, Herold and Romary); as well as defining technical constraints on the systematic use of @xml:id at different levels of the dictionary microstructure. The activity of the group has already led to changes in the Guidelines in response to specific GitHub tickets.
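As an illustration of the kind of technical constraint mentioned above, the following minimal sketch checks that selected elements of a dictionary entry carry an @xml:id. It is not part of TEI Lex-0 itself: the sample entry, the function name `missing_ids`, and the choice of `entry` and `sense` as the elements to check are assumptions made for the example, and the actual constraints are those defined in the TEI Lex-0 specification.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample entry, invented for this sketch.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <entry xml:id="dog" xml:lang="en">
    <form type="lemma"><orth>dog</orth></form>
    <gramGrp><gram type="pos">noun</gram></gramGrp>
    <sense xml:id="dog.1"><def>a domesticated canine</def></sense>
    <sense><def>a worthless person</def></sense>
  </entry>
</TEI>"""


def missing_ids(xml_text, tags=("entry", "sense")):
    """Return the local names of elements in `tags` that lack an @xml:id.

    The default tag list is an assumption of this sketch, not a statement
    of which elements the format actually requires to be identified.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for tag in tags:
        for element in root.iter(f"{{{TEI_NS}}}{tag}"):
            if XML_ID not in element.attrib:
                missing.append(tag)
    return missing


if __name__ == "__main__":
    # The second <sense> in SAMPLE has no @xml:id, so it is reported.
    print(missing_ids(SAMPLE))
```

A check of this kind could run after a resource has been transformed to the target format, so that tools relying on stable identifiers for entries and senses can assume they are present.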

https://hal.inria.fr/hal-02265312
Contributor : Laurent Romary
Submitted on : Friday, August 9, 2019 - 12:39:54 PM
Last modification on : Saturday, August 10, 2019 - 1:15:31 AM
Long-term archiving on: Thursday, January 9, 2020 - 11:35:20 PM

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-02265312, version 1

Citation

Laurent Romary, Toma Tasovac. TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources. TEI Conference and Members' Meeting, Sep 2018, Tokyo, Japan. ⟨hal-02265312⟩
