Scaling up Automatic Structuring of Manuscript Sales Catalogues

Abstract: Manuscript Sales Catalogues (MSC) are highly important for authenticating documents and studying the reception of authors. They have been published regularly throughout Europe since the beginning of the 19th century, which has raised interest in scaling up the means of automatically structuring their contents. Following successful initial encoding tests with GROBID-Dictionaries [1,2] on a single MSC collection [3], this paper presents the results of more advanced tests of the system's capacity to handle a larger corpus of MSC from different dealers, and therefore with multiple layouts.
https://hal.inria.fr/hal-02272962
Contributor: Laurent Romary
Submitted on: Wednesday, August 28, 2019 - 1:32:08 PM
Last modification on: Monday, September 2, 2019 - 3:56:42 PM

Files

Grobid Catalogues TEI 2019.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id: hal-02272962, version 1

Citation

Lucie Rondeau Du Noyer, Simon Gabay, Mohamed Khemakhem, Laurent Romary. Scaling up Automatic Structuring of Manuscript Sales Catalogues. TEI 2019: What is text, really? TEI and beyond, Sep 2019, Graz, Austria. ⟨hal-02272962⟩
