MasakhaNER: Named entity recognition for African languages

David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel d'Souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen H Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Rabiu Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane Mboup, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F P Dossou, Kelechi Ogueji, Ibrahima Thierno, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei
Abstract : We take a step towards addressing the underrepresentation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.
Document type : Journal articles
https://hal.inria.fr/hal-03350962
Contributor : Emmanuel Vincent
Submitted on : Tuesday, September 21, 2021 - 5:32:17 PM
Last modification on : Friday, August 5, 2022 - 2:33:27 PM
Long-term archiving on : Wednesday, December 22, 2021 - 7:20:09 PM

File

adelani_TACL2021.pdf
Files produced by the author(s)

Citation

David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel d'Souza, Julia Kreutzer, et al. MasakhaNER: Named entity recognition for African languages. Transactions of the Association for Computational Linguistics, The MIT Press, 2021, ⟨10.1162/tacl⟩. ⟨hal-03350962⟩
