Extracting Definienda in Mathematical Scholarly Articles with Transformers

Shufan Jiang; Pierre Senellart

Conference paper, Year: 2023



Shufan Jiang

Role: Author
PersonId: 184063
IdHAL: shufan-jiang
ORCID: 0000-0002-8486-3158

Value from Data

Pierre Senellart

Role: Author
PersonId: 11778
IdHAL: pierre-senellart
ORCID: 0000-0002-7909-5369
IdRef: 124713769

Département d'informatique - ENS Paris

Value from Data

Abstract

We consider the problem of automatically identifying the defined term (the definiendum) within a mathematical definition in the text of an academic article. Inspired by the development of transformer-based natural language processing applications, we pose the problem as (a) a token-level classification task using fine-tuned pre-trained transformers; and (b) a question-answering task using a generalist large language model (GPT). We also propose a rule-based approach to build a labeled dataset from the LaTeX source of papers. Experimental results show that it is possible to reach high levels of precision and recall using either the recent (and expensive) GPT-4 or simpler pre-trained models fine-tuned on our task.
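As a rough illustration of how a rule-based pass over LaTeX source could produce token-level labels, the sketch below applies one hypothetical heuristic: inside a `definition` environment, text wrapped in `\emph{...}` or `\textit{...}` is treated as the definiendum and tagged with BIO labels. The heuristic, the tag names `B-DEF`/`I-DEF`, and the function names are assumptions made for this example. The abstract does not specify the rules actually used, so this should not be read as the authors' method.

```python
import re

# Assumed heuristic (not confirmed by the abstract): inside a definition
# environment, text wrapped in \emph{...} or \textit{...} is the definiendum.
DEF_ENV = re.compile(r"\\begin\{definition\}(.*?)\\end\{definition\}", re.DOTALL)
EMPH = re.compile(r"\\(?:emph|textit)\{([^{}]*)\}")


def label_definition(body):
    """Return (tokens, BIO tags) for the body of one definition."""
    tokens, tags = [], []
    pos = 0
    for match in EMPH.finditer(body):
        # Text before the emphasized span lies outside any definiendum.
        for tok in body[pos:match.start()].split():
            tokens.append(tok)
            tags.append("O")
        # The emphasized span gets B-DEF on its first token, then I-DEF.
        for i, tok in enumerate(match.group(1).split()):
            tokens.append(tok)
            tags.append("B-DEF" if i == 0 else "I-DEF")
        pos = match.end()
    # Remaining text after the last emphasized span.
    for tok in body[pos:].split():
        tokens.append(tok)
        tags.append("O")
    return tokens, tags


def label_document(latex_source):
    """Label every definition environment found in a LaTeX source string."""
    return [label_definition(m.group(1)) for m in DEF_ENV.finditer(latex_source)]


sample = r"""
\begin{definition}
A graph is called a \emph{bipartite graph} if its vertices split into two independent sets.
\end{definition}
"""
labeled = label_document(sample)
```

Token and tag sequences in this shape are the usual input for fine-tuning a token classifier. A real pipeline would also need to handle nested macros, custom theorem environments, and math mode, all of which this sketch ignores.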

Domains

Document and Text Processing; Artificial Intelligence [cs.AI]

Main file

jiang2023extracting.pdf (212.3 KB)

Paper_6_WIESP2023.pdf (204.24 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-04282533

Submitted on: Monday, November 20, 2023, 16:04:10

Last modified on: Friday, April 19, 2024, 16:18:56

Dates and versions

hal-04282533, version 1 (20-11-2023)

Identifiers

HAL Id: hal-04282533, version 1
arXiv: 2311.12448

Cite

Shufan Jiang, Pierre Senellart. Extracting Definienda in Mathematical Scholarly Articles with Transformers. The 2nd Workshop on Information Extraction from Scientific Publications at IJCNLP-AACL 2023, Nov 2023, Online, Indonesia. ⟨hal-04282533⟩


Collections

ENS-PARIS CNRS INRIA INRIA2 PSL ANR PRAIRIE-IA

