Abstract: Manuscript Sales Catalogues (MSC) are highly important for authenticating documents and studying the reception of authors. Their regular publication throughout Europe since the beginning of the 19th century has consequently raised interest in scaling up the means of automatically structuring their contents. Following successful first encoding tests with GROBID-Dictionaries [1,2] on a single MSC collection [3], this paper presents the results of more advanced tests of the system’s capacity to handle a larger corpus with MSC from different dealers, and therefore multiple layouts.
Lucie Rondeau Du Noyer, Simon Gabay, Mohamed Khemakhem, Laurent Romary. Scaling up Automatic Structuring of Manuscript Sales Catalogues. TEI 2019: What is text, really? TEI and beyond, Sep 2019, Graz, Austria. ⟨hal-02272962⟩