Building a Named Entity Recognition model for Ethiopian Languages: a comparative analysis of composite feature embedding
DOI:
https://doi.org/10.20372/mwu.jessd.2025.15671
Keywords:
Named Entity Recognition, Amharic, Conditional random field, Recurrent neural network, Long short-term memory, Convolutional neural network
Abstract
Named Entity Recognition (NER) is a key step in information extraction, machine translation, and question answering systems across many languages. The selection and encoding of input features strongly influence NER quality, because they determine the semantic and grammatical representation vectors the model works with. However, existing NER models are insufficient when handling new and unseen entity types in the expanding body of Amharic digital data. Therefore, extensive research is focused on developing more effective and accurate NER models. In this context, we propose a deep learning NER model that represents word tokens through a combinatorial feature embedding design, and we conduct a comparative analysis with existing models for Ethiopian languages. Word vectors created for all tokens using an unsupervised learning algorithm are merged with a set of language-independent features developed specifically for this purpose. These combined features are then fed into a neural network model to predict word classes. Empirical results on the Ethiopian language datasets show that incorporating character-level word embeddings along with other features in BiLSTM-CRF models yields state-of-the-art performance. To assess the model's ability to generalize across languages, we evaluated it on two datasets and obtained accuracies of 92.88% on AM_NER and 82.35% on Oro_NER.
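The feature-merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding table, the transliterated placeholder tokens, the particular hand-crafted features (sentence position, digit presence, scaled token length), and the function names `handcrafted_features` and `composite_vector` are all assumptions chosen for illustration. The sketch only shows how a pretrained word vector and a small language-independent feature vector might be concatenated into one input vector per token before being passed to a sequence tagger such as a BiLSTM-CRF.

```python
from typing import Dict, List

# Hypothetical toy embedding table. In practice these vectors would come from
# an unsupervised model trained on a large corpus; the tokens are ASCII
# placeholders, not real corpus entries.
EMBEDDINGS: Dict[str, List[float]] = {
    "abebe": [0.2, -0.1, 0.7],
    "addis": [0.5, 0.3, -0.2],
    "2025": [0.0, 0.9, 0.1],
}
EMBED_DIM = 3
UNKNOWN_VECTOR = [0.0] * EMBED_DIM  # fallback for out-of-vocabulary tokens


def handcrafted_features(tokens: List[str], i: int) -> List[float]:
    """Assumed language-independent features for the token at position i."""
    token = tokens[i]
    return [
        1.0 if i == 0 else 0.0,                            # first token in sentence
        1.0 if i == len(tokens) - 1 else 0.0,              # last token in sentence
        1.0 if any(ch.isdigit() for ch in token) else 0.0,  # contains a digit
        min(len(token), 10) / 10.0,                        # length, capped and scaled
    ]


def composite_vector(tokens: List[str], i: int) -> List[float]:
    """Concatenate the word embedding with the hand-crafted features."""
    word_vector = EMBEDDINGS.get(tokens[i], UNKNOWN_VECTOR)
    return list(word_vector) + handcrafted_features(tokens, i)


sentence = ["abebe", "2025", "addis"]
features = [composite_vector(sentence, i) for i in range(len(sentence))]
```

Each entry of `features` has `EMBED_DIM + 4` values, and the resulting sequence of vectors is what a downstream tagger would consume. Character-level embeddings, which the abstract reports as beneficial, would be concatenated in the same way but are omitted here to keep the sketch short.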
License
Copyright (c) 2025 Journal of Equity in Sciences and Sustainable Development

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.