Transcribing Southern Min Speech Corpora with a Web-Based Language Learning System

Jun Cai 1,*, Jacques Feldmar 1, Yves Laprie 1, Dominique Fohr 1, Jean-Paul Haton 1
* Corresponding author
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: The paper proposes a human-computation-based scheme for transcribing Southern Min speech corpora. The core idea is to implement a Web-based language learning system that collects orthographic and phonetic labels from a large number of language learners and chooses the most commonly input labels as the transcriptions of the corpora. It is essentially a technology of distributed knowledge acquisition. Some computer-aided mechanisms are also used to verify the collected transcriptions. The benefit of the scheme is that it makes the transcribing task neither tedious nor costly: no significant budget is needed to transcribe large corpora. The design of a system for transcribing Min Nan speech corpora is described in detail. The application of a prototype version of the system shows that this transcribing scheme is an effective and economical way.
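The abstract does not spell out how the "most commonly input" label is selected. A minimal sketch, assuming simple per-utterance majority voting over the labels submitted by learners, might look like the following. The function name `choose_transcription` and the thresholds `min_votes` and `min_agreement` are hypothetical illustrations, not details taken from the paper, and the computer-aided verification step mentioned above is not modeled.

```python
from collections import Counter
from typing import Optional


def choose_transcription(
    labels: list[str],
    min_votes: int = 3,
    min_agreement: float = 0.5,
) -> Optional[str]:
    """Return the most commonly input label for one utterance, or None.

    Hypothetical rule (an assumption, not the paper's method): accept the
    top label only if at least min_votes non-empty labels were collected
    and the top label's share is strictly greater than min_agreement.
    """
    # Collapse runs of whitespace so "li2  ho2" and "li2 ho2" are one label;
    # drop empty or blank submissions entirely.
    normalized = [" ".join(label.split()) for label in labels if label.strip()]
    if len(normalized) < min_votes:
        return None  # not enough learner input yet
    label, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) <= min_agreement:
        return None  # no clear majority (ties included); keep collecting
    return label


if __name__ == "__main__":
    # Four learners label the same utterance; three agree after normalization.
    submitted = ["li2 ho2", "li2  ho2", "li2 ho2", "li1 ho2"]
    print(choose_transcription(submitted))
```

With the default thresholds the rule requires a strict majority, so an even split such as two learners against two yields `None` and the utterance would stay open for more input. Raising `min_votes` trades collection time for more reliable consensus.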
Document type: Conference paper
International Conference on Audio, Language and Image Processing - ICALIP 2008, Jul 2008, Shanghai, China. IEEE, 2008

Cited literature: 11 references

https://hal.inria.fr/inria-00336375
Contributor: Yves Laprie
Submitted on: Monday, 3 November 2008, 18:03:27
Last modified on: Thursday, 11 January 2018, 06:19:56
Document(s) archived on: Monday, 7 June 2010, 22:40:28

File

CaiJunPaperTranscribingMinSpee...
Publisher files allowed on an open archive

Identifiers

  • HAL Id: inria-00336375, version 1

Citation

Jun Cai, Jacques Feldmar, Yves Laprie, Dominique Fohr, Jean-Paul Haton. Transcribing Southern Min Speech Corpora with a Web-Based Language Learning System. International Conference on Audio, Language and Image Processing - ICALIP 2008, Jul 2008, Shanghai, China. IEEE, 2008. 〈inria-00336375〉