Conference papers

Building A Corporate Corpus For Threads Constitution

Abstract : In this paper we describe the process of building a corporate corpus that will be used as a reference for modelling and computing threads from conversations generated using communication and collaboration tools. The overall goal of reconstructing threads is to provide value to the collaborator in various use cases, such as highlighting the important parts of a running discussion or reviewing upcoming commitments and deadlines. Since, to our knowledge, no corporate corpus is available for the French language that would allow us to address this problem of thread constitution, we present a method for building such corpora, covering the different aspects and steps that led to the creation of a pipeline to pseudo-anonymise data. This pipeline responds to the constraints imposed by the General Data Protection Regulation (GDPR) in Europe and to the need to comply with the secrecy of correspondence.
Contributor : Lionel TADONFOUET TADJOU
Submitted on : Wednesday, September 22, 2021 - 2:14:33 PM
Last modification on : Wednesday, June 8, 2022 - 12:50:06 PM
Long-term archiving on : Thursday, December 23, 2021 - 6:42:52 PM


Files produced by the author(s)


  • HAL Id : hal-03351533, version 1



Lionel Tadonfouet Tadjou, Fabrice Bourge, Tiphaine Marie, Laurent Romary, Eric Villemonte de La Clergerie. Building A Corporate Corpus For Threads Constitution. Student Research Workshop associated with the International Conference on Recent Advances in Natural Language Processing (RANLP’2021), Sep 2021, Online, Bulgaria. ⟨hal-03351533⟩


