Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper. Year: 2010

A Neural-Linguistic Approach for the Recognition of a Wide Arabic Word Lexicon

Abstract

Recently, we have investigated the use of Arabic linguistic knowledge to improve the recognition of a wide Arabic word lexicon. A neural-linguistic approach was proposed, mainly to deal with a canonical vocabulary of decomposable words derived from healthy (i.e., sound) tri-consonant roots. The basic idea is to factorize words by their roots and schemes. To this end, we designed two neural networks, TNN_R and TNN_S, to recognize roots and schemes, respectively, from the structural primitives of words. The proposed approach achieved promising results. In this paper, we focus on how to reach better results in terms of accuracy and recognition rate. The current improvements mainly concern the training stage. They consist of 1) exploiting the order of the word's letters, 2) considering "sister letters" (letters having the same features), 3) supervising the networks' behaviour, 4) splitting up neurons to record letter occurrences, and 5) resolving observed ambiguities. With these improvements, experiments carried out on a 1,500-word vocabulary show a significant enhancement: the top-4 recognition rate of TNN_R rose from 77% to 85.8%, and that of TNN_S from 65% to 97.9%. Enlarging the vocabulary from 1,000 to 1,700 words in steps of 100 words confirmed these results again, without altering the stability of the networks.
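The root-and-scheme factorization mentioned in the abstract can be illustrated with a small symbolic sketch. This is not the authors' neural method: TNN_R and TNN_S recognize roots and schemes from structural primitives of words, whereas this toy works on already-transcribed strings. Everything in it is an assumption made for illustration: the Latin transliteration, the hypothetical notation in which the digits 1, 2 and 3 mark the slots of the first, second and third root consonant, the tiny `SCHEMES` and `ROOTS` inventories, and the helper names `derive`, `factorize` and `LEXICON`.

```python
from itertools import product

# Hypothetical scheme inventory: digits 1, 2, 3 are slots for the first,
# second and third root consonant; every other character is a fixed letter.
SCHEMES = ("1a2a3a", "1a22a3a", "1aa2i3", "ma12uu3", "1i2aa3")

# Hypothetical tri-consonant roots, in a simple Latin transliteration.
ROOTS = ("ktb", "drs", "ftH")


def derive(root: str, scheme: str) -> str:
    """Interdigitate a tri-consonant root into a scheme."""
    return "".join(root[int(c) - 1] if c.isdigit() else c for c in scheme)


def factorize(word: str, schemes=SCHEMES) -> list[tuple[str, str]]:
    """Return every (root, scheme) pair that regenerates `word`.

    More than one pair means the factorization is ambiguous.
    """
    candidates = []
    for scheme in schemes:
        if len(scheme) != len(word):
            continue
        slots: dict[str, str] = {}
        ok = True
        for s, w in zip(scheme, word):
            if s.isdigit():
                # A repeated slot (gemination, e.g. "1a22a3a") must hold
                # the same consonant every time it appears.
                if slots.setdefault(s, w) != w:
                    ok = False
                    break
            elif s != w:
                # Fixed letters of the scheme must match the word exactly.
                ok = False
                break
        if ok and set(slots) == {"1", "2", "3"}:
            candidates.append((slots["1"] + slots["2"] + slots["3"], scheme))
    return candidates


# Every root combined with every scheme: 3 x 5 = 15 surface words.
LEXICON = {derive(r, s) for r, s in product(ROOTS, SCHEMES)}

if __name__ == "__main__":
    print(derive("ktb", "1aa2i3"))  # kaatib
    print(factorize("maktuub"))     # [('ktb', 'ma12uu3')]
    print(len(LEXICON))             # 15
```

In this toy, 15 surface words are covered by only 3 roots and 5 schemes, so a factorized recognizer would need to separate 8 classes rather than 15. That is one plausible reading of why factorizing by root and scheme helps as the vocabulary grows, not a figure taken from the paper; real lexicons are also sparser, since not every root combines with every scheme. When `factorize` returns several pairs, the word is ambiguous in this toy sense, which may differ from the ambiguities the abstract refers to.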
Main file
Imen-DRR2010.pdf (211.02 KB)
Origin: files produced by the author(s)

Dates and versions

inria-00579680, version 1 (24 March 2011)

Identifiers

  • HAL Id: inria-00579680, version 1

Cite

Imen Ben Cheikh, Afef Kacem, Abdel Belaïd. A Neural-Linguistic Approach for the Recognition of a Wide Arabic Word Lexicon. Document Recognition and Retrieval XVII, Jan 2010, San Jose, United States. ⟨inria-00579680⟩