Conference paper, Year: 2018

TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources

Abstract

Achieving consistent encoding within a given community of practice has been a recurrent issue for the TEI Guidelines. The topic is of particular importance for lexical data if we think of the potential wealth of content we could gain from pooling together the information available in the variety of highly structured, historical and contemporary lexical resources. Still, the encoding possibilities offered by the Dictionaries Chapter in the Guidelines are too numerous and too flexible to guarantee sufficient interoperability and a coherent model for searching, visualising or enriching multiple lexical resources. Following the spirit of TEI Analytics [Zillig, 2009], developed in the context of the MONK project, TEI Lex-0 aims at establishing a target format to facilitate the interoperability of heterogeneously encoded lexical resources. This is important both in the context of building lexical infrastructures as such [Ermolaev and Tasovac, 2012] and in the context of developing generic TEI-aware tools such as dictionary viewers and profilers. The format itself should not necessarily be one which is used for editing or managing individual resources, but one to which they can be univocally transformed to be queried, visualised, or mined in a uniform way. We are also aiming to stay as aligned as possible with the TEI subset developed in conjunction with the revision of the ISO LMF (Lexical Markup Framework) standard so that coherent design guidelines can be provided to the community (cf. [Romary, 2015]). 
The paper will provide an overview of the various domains covered by TEI Lex-0 and the main decisions that were taken over the last 18 months: constraining the general structure of a lexical entry; offering mechanisms to overcome the limits of <entry> when used in retro-digitized dictionaries (by allowing, for instance, […] and […] as children of <entry>); systematizing the representation of morpho-syntactic information [Bański et al., 2017]; providing a strict <sense>-based encoding of sense-related information; deprecating […]; dealing with internal and external references in dictionary entries; providing more advanced encodings of etymology (see submission by Bowers, Herold and Romary); as well as defining technical constraints on the systematic use of @xml:id at different levels of the dictionary microstructure. The activity of the group has already led to changes in the Guidelines in response to specific GitHub tickets.
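As a rough illustration of what a constraint on the systematic use of @xml:id could look like in practice, the following sketch checks a small TEI fragment for elements that lack an identifier. The sample entry, the function name `missing_ids`, and the choice of `entry` and `sense` as the levels to check are assumptions made for this example only; they are not taken from the TEI Lex-0 specification.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumption: the levels of the microstructure we choose to check here.
CHECKED_ELEMENTS = ("entry", "sense")

# Made-up sample entry; the second <sense> deliberately has no xml:id.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <entry xml:id="cat" xml:lang="en">
    <form type="lemma"><orth>cat</orth></form>
    <sense xml:id="cat.1"><def>a small domesticated feline</def></sense>
    <sense><def>a spiteful person</def></sense>
  </entry>
</body>"""


def missing_ids(xml_text, names=CHECKED_ELEMENTS):
    """Return the local names of checked elements that carry no xml:id."""
    root = ET.fromstring(xml_text)
    wanted = {f"{{{TEI_NS}}}{name}" for name in names}
    problems = []
    for element in root.iter():
        if element.tag in wanted and XML_ID not in element.attrib:
            # Strip the namespace part, keeping only the local name.
            problems.append(element.tag.split("}")[1])
    return problems


if __name__ == "__main__":
    print(missing_ids(SAMPLE))
```

A check of this kind would run on the transformed target format rather than on the original editing format, which fits the role of TEI Lex-0 as a format that resources are converted to for uniform querying.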
Main file
TEI Lex 0 pres - Tasovac_rev.pdf (568.12 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02265312, version 1 (09-08-2019)

License

Attribution

Identifiers

  • HAL Id: hal-02265312, version 1

Cite

Laurent Romary, Toma Tasovac. TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources. TEI Conference and Members' Meeting, Sep 2018, Tokyo, Japan. ⟨hal-02265312⟩
