Conference paper, Year: 2023

Extracting Definienda in Mathematical Scholarly Articles with Transformers

Shufan Jiang

Abstract

We consider automatically identifying the defined term within a mathematical definition from the text of an academic article. Inspired by the development of transformer-based natural language processing applications, we pose the problem as (a) a token-level classification task using fine-tuned pre-trained transformers; and (b) a question-answering task using a generalist large language model (GPT). We also propose a rule-based approach to build a labeled dataset from the LaTeX source of papers. Experimental results show that it is possible to reach high levels of precision and recall using either recent (and expensive) GPT-4 or simpler pre-trained models fine-tuned on our task.
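
As a rough illustration of how a rule-based labeling step over LaTeX source might look, the sketch below tags tokens for token-level classification. The specific rules are assumptions made for this example, not the rules used in the paper: it assumes definitions sit in a `definition` environment and that the defined term is marked with `\emph{...}` or `\textit{...}`. The names `label_definition` and `build_dataset` are hypothetical.

```python
import re

# Assumption: definitions live in \begin{definition}...\end{definition} and the
# defined term is emphasized with \emph{...} or \textit{...}. These rules are
# illustrative only; the abstract does not spell out the actual rules.
DEF_ENV = re.compile(r"\\begin\{definition\}(.*?)\\end\{definition\}", re.DOTALL)
EMPH = re.compile(r"\\(?:emph|textit)\{([^{}]*)\}")


def label_definition(body):
    """Return (token, tag) pairs with BIO tags for one definition body."""
    pairs = []
    pos = 0
    for m in EMPH.finditer(body):
        # Text before the emphasized span lies outside any definiendum.
        pairs += [(tok, "O") for tok in body[pos:m.start()].split()]
        # First token of the emphasized span gets B, the rest get I.
        term = m.group(1).split()
        pairs += [(tok, "B" if i == 0 else "I") for i, tok in enumerate(term)]
        pos = m.end()
    # Remaining text after the last emphasized span.
    pairs += [(tok, "O") for tok in body[pos:].split()]
    return pairs


def build_dataset(latex_source):
    """Label every definition environment found in a LaTeX source string."""
    return [label_definition(m.group(1).strip())
            for m in DEF_ENV.finditer(latex_source)]


sample = r"""
\begin{definition}
A topological space is called \emph{locally compact} if every point has a compact neighborhood.
\end{definition}
"""
dataset = build_dataset(sample)
```

Here `dataset[0]` holds one list of `(token, tag)` pairs, with `locally` tagged `B` and `compact` tagged `I`. Whitespace tokenization is a simplification; a fine-tuned transformer would need these labels aligned to its own subword tokens.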
Main file
jiang2023extracting.pdf (212.3 KB)
Paper_6_WIESP2023.pdf (204.24 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04282533, version 1 (20-11-2023)

Cite

Shufan Jiang, Pierre Senellart. Extracting Definienda in Mathematical Scholarly Articles with Transformers. The 2nd Workshop on Information Extraction from Scientific Publications at IJCNLP-AACL 2023, Nov 2023, Online, Indonesia. ⟨hal-04282533⟩