Wake up, standOff!

Abstract : The paper provides an overview of, and an update on, the ongoing proposal to create a <standOff> component within the TEI architecture. It sets out the conceptual background of embedding stand-off annotations within a TEI document and the consequences in terms of primary source preservation, multiple annotation views, and the possible export of annotation content into autonomous TEI documents. It demonstrates the range of possible use cases, from manual annotation to fully automated information extraction processes, and shows the importance of implementing, from the outset, the possibility of using any kind of internal or external vocabulary to represent annotation bodies (e.g. to deal with structural or conceptual annotations). An important prospect here is that the construct could simplify the development of TEI-aware online services such as named entity recognisers. We relate the proposal to ongoing initiatives and show the necessity of aligning with the Web Annotation Data Model (W3C) as well as with the recently introduced element for speech transcription (part of the work carried out within ISO standard 24624), seen as an elementary annotation crystal in the sense of Romary and Wegstein (2012). In this context we tackle the issue of implicitness in the representation of annotations and open the debate on the trade-off between a terse model and a highly flexible one. We conclude by illustrating how the current proposal is already applied in various projects related to data mining or scientific information, and in particular to the representation of annotated scholarly content.
Further materials
• Minutes of the January 2014 meeting: http://download2.polytechnic.edu.na/pub7/sourceforge/l/li/lingsig/Documents/Standoff%20in%20Berlin,%2001.2014/standoff-minutesBerlin2014.pdf
• The TEI GitHub ticket: https://github.com/TEIC/TEI/issues/374
• The standOff proposal on GitHub: https://github.com/laurentromary/stdfSpec (branch AnnArbor)

References
Bański, Piotr (2010). Why TEI standoff annotation doesn’t quite work: and why you might want to use it nevertheless. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5.
ISO/DIS 24624. Language resource management -- Transcription of spoken language.
Pose, Javier, Patrice Lopez and Laurent Romary (2014). A Generic Formalism for Encoding Stand-off annotations in TEI.
Romary, Laurent (2015). TEI challenges in an accelerating digital world. DiXiT Convention week, Sep 2015, The Hague, Netherlands.
Romary, Laurent and Werner Wegstein (2012). Consistent Modeling of Heterogeneous Lexical Structures. Journal of the Text Encoding Initiative [Online], Issue 3, November 2012. Online since 15 October 2012, accessed 12 May 2016. URL: http://jtei.revues.org/540 ; DOI: 10.4000/jtei.540 (section about crystals: https://jtei.revues.org/540#tocfrom2n1)
Web Annotation Data Model, W3C. https://www.w3.org/TR/annotation-model/
Document type :
Documents associated with scientific events

https://hal.inria.fr/hal-01374102
Contributor : Laurent Romary
Submitted on : Thursday, September 29, 2016 - 5:05:38 PM
Last modification on : Thursday, April 4, 2019 - 1:29:49 AM
Document(s) archived on : Friday, December 30, 2016 - 2:43:29 PM

Files

WakeUpStandOff.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-01374102, version 1

Citation

Piotr Banski, Bertrand Gaiffe, Patrice Lopez, Simon Meoni, Laurent Romary, et al. Wake up, standOff!. TEI Conference 2016, Sep 2016, Vienna, Austria. ⟨http://tei2016.acdh.oeaw.ac.at⟩. ⟨hal-01374102⟩

Metrics

Record views : 624
File downloads : 167