Wake up, standOff!

Abstract : The paper provides an overview of, and an update on, the ongoing proposal to create a standOff component within the TEI architecture. It sets out the conceptual background for embedding stand-off annotations within a TEI document and the consequences in terms of primary source preservation, multiple annotation views, and the possible export of annotation content into autonomous TEI documents. It demonstrates the range of possible use cases, from manual annotation to fully automated information extraction processes, and shows the importance of making it possible, from the outset, to use any kind of internal or external vocabulary for representing annotation bodies (e.g. to deal with structural or conceptual annotations). An important prospect here is that the construct could simplify the development of TEI-aware online services such as Named Entity Recognisers. We relate the proposal to ongoing initiatives and show the necessity of aligning with the Web Annotation Data Model (W3C) as well as with the recent introduction of the element for speech transcription (as part of the work carried out in the ISO standard 24624) as an elementary annotation crystal in the sense of Romary and Wegstein (2012). In this context we tackle the issue of implicitness in the representation of annotations and open the debate on the trade-off between a terse and a highly flexible model. We conclude by illustrating how the current proposal is already applied in various projects related to data mining or scientific information, and in particular to the representation of annotated scholarly content.

Further materials
  • Minutes of the January 2014 meeting:,%2001.2014/standoff-minutesBerlin2014.pdf
  • The TEI GitHub ticket:
  • The standOff proposal on GitHub: (branch AnnArbor)

References
Bański, Piotr (2010). Why TEI standoff annotation doesn’t quite work: and why you might want to use it nevertheless. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5.
ISO/DIS 24624. Language resource management -- Transcription of spoken language.
Pose, Javier, Patrice Lopez and Laurent Romary (2014). A Generic Formalism for Encoding Stand-off annotations in TEI. 2014.
Romary, Laurent (2015). TEI challenges in an accelerating digital world. DiXiT Convention week, Sep 2015, The Hague, Netherlands. 2015.
Romary, Laurent and Werner Wegstein (2012). « Consistent Modeling of Heterogeneous Lexical Structures », Journal of the Text Encoding Initiative [Online], Issue 3 | November 2012, Online since 15 October 2012, connection on 12 May 2016. URL : ; DOI : 10.4000/jtei.540 (section about Crystals :
Web Annotation Data Model, W3C,
Document type :
Documents associated with scientific events
Contributor : Laurent Romary
Submitted on : Thursday, September 29, 2016 - 5:05:38 PM
Last modification on : Thursday, October 28, 2021 - 9:42:09 AM
Long-term archiving on : Friday, December 30, 2016 - 2:43:29 PM



Distributed under a Creative Commons Attribution 4.0 International License


  • HAL Id : hal-01374102, version 1


Piotr Banski, Bertrand Gaiffe, Patrice Lopez, Simon Meoni, Laurent Romary, et al. Wake up, standOff!. TEI Conference 2016, Sep 2016, Vienna, Austria. ⟨hal-01374102⟩


