Vietnamese Sentence Similarity Based on Concepts

Hien T. Nguyen; Phuc H. Duong; Vinh T. Vo

doi:10.1007/978-3-662-45237-0_24

Conference paper, Year: 2014

Hien T. Nguyen

Role: Author
PersonId: 994842

Ton Duc Thang University [Hô-Chi-Minh-City]

Phuc H. Duong

Role: Author
PersonId: 994892

Ton Duc Thang University [Hô-Chi-Minh-City]

Vinh T. Vo

Role: Author
PersonId: 994893

Ton Duc Thang University [Hô-Chi-Minh-City]

Abstract

We propose a novel method for measuring the semantic similarity of two sentences. Its originality lies in how it explores the similarity of the concepts referred to in the sentences using Wikipedia. The method also exploits Wiktionary to measure word-to-word similarity. The overall semantic similarity is a linear combination of word-to-word similarity, word-order similarity, and concept similarity. We build datasets consisting of 45 Vietnamese sentence pairs and evaluate the method on them. The results show that, in the best cases, concept similarity improves the performance of our method by more than 15 percentage points. The proposed method is language-independent and quite easy to employ, so it can readily be adopted to measure semantic similarity for sentences written in other languages.
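The linear combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weights or the component formulas, so the function name `combined_similarity`, the default weights, and the requirement that the weights sum to 1 are all assumptions made here for illustration. The three component scores are taken as already computed and lying in [0, 1].

```python
def combined_similarity(word_sim, order_sim, concept_sim,
                        weights=(0.5, 0.2, 0.3)):
    """Linearly combine three component similarity scores.

    The default weights are hypothetical placeholders, not values
    from the paper. Requiring them to sum to 1 is an assumption that
    keeps the result in [0, 1] when every component is in [0, 1].
    """
    w_word, w_order, w_concept = weights
    if abs(w_word + w_order + w_concept - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for score in (word_sim, order_sim, concept_sim):
        if not 0.0 <= score <= 1.0:
            raise ValueError("component scores must lie in [0, 1]")
    return w_word * word_sim + w_order * order_sim + w_concept * concept_sim


# Example with made-up component scores for one sentence pair.
score = combined_similarity(0.8, 0.5, 0.6)
print(round(score, 2))
```

Setting the concept weight to 0 and renormalising the other two would give a baseline without concept similarity, which is one plausible way to set up the comparison the abstract reports.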

Keywords

Paraphrase Identification; Text Similarity; Semantic Similarity

Domains

Computer Science [cs]; Information and Communication Sciences

Main file

978-3-662-45237-0_24_Chapter.pdf (498.33 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01405592

Submitted on: Wednesday, 30 November 2016, 11:01:30

Last modified on: Thursday, 1 December 2016, 01:04:16

Long-term archiving on: Monday, 27 March 2017, 08:49:02

Dates and versions

hal-01405592, version 1 (30-11-2016)

License

Attribution

Identifiers

HAL Id: hal-01405592, version 1
DOI: 10.1007/978-3-662-45237-0_24

Cite

Hien T. Nguyen, Phuc H. Duong, Vinh T. Vo. Vietnamese Sentence Similarity Based on Concepts. 13th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Nov 2014, Ho Chi Minh City, Vietnam. pp.243-253, ⟨10.1007/978-3-662-45237-0_24⟩. ⟨hal-01405592⟩

Collections

IFIP-LNCS IFIP IFIP-TC IFIP-TC8 IFIP-LNCS-8838 IFIP-CISIM
