A Word Sense Disambiguation Model for Amharic Words using Semi-Supervised Learning Paradigm

Authors

  • Getahun Wassie, Wollega University
  • Ramesh Babu P, Wollega University
  • Solomon Teferra, Addis Ababa University
  • Million Meshesha, Addis Ababa University

Keywords:

Ambiguity, Bootstrapping, Word Sense Disambiguation

Abstract

The main objective of this research was to design a word sense disambiguation (WSD) prototype model for Amharic words using a semi-supervised learning method. The method extracts training sets with minimal human intervention and can considerably improve learning accuracy. Because no Amharic WordNet is available, only five ambiguous words were selected: atena (አጠና), derese (ደረሰ), tenesa (ተነሳ), bela (በላ), and ale (አለ). Separate data sets for these five ambiguous words were prepared for the development of the Amharic WSD prototype. The final classification task was performed on the fully labelled training set using the AdaBoost, bagging, and ADTree classification algorithms in the WEKA package.
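
The bootstrapping idea named in the keywords can be illustrated with a minimal self-training sketch. It is not the authors' implementation: the overlap-based scorer, the function names `sense_scores` and `bootstrap`, the 0.8 confidence threshold, and the placeholder tokens and sense labels are all assumptions made for illustration. The sketch only shows how a small labelled seed set might grow into a larger labelled training set with little manual annotation. The final classification step (AdaBoost, bagging, and ADTree in WEKA) is not shown.

```python
from collections import Counter, defaultdict


def sense_scores(context, labelled):
    """Return each sense's share of word overlap with the given context.

    `labelled` is a list of (context_words, sense) pairs. An empty dict is
    returned when the context shares no word with any labelled example.
    """
    vocab = defaultdict(Counter)
    for words, sense in labelled:
        vocab[sense].update(words)
    raw = {sense: sum(counts[w] for w in context) for sense, counts in vocab.items()}
    total = sum(raw.values())
    if total == 0:
        return {}
    return {sense: value / total for sense, value in raw.items()}


def bootstrap(seed, unlabelled, threshold=0.8, max_rounds=10):
    """Self-training loop: label confident contexts and add them to the training set."""
    labelled = list(seed)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for context in pool:
            scores = sense_scores(context, labelled)
            best = max(scores, key=scores.get) if scores else None
            if best is not None and scores[best] >= threshold:
                confident.append((context, best))
            else:
                remaining.append(context)
        if not confident:
            break  # nothing new was labelled, so further rounds cannot help
        labelled.extend(confident)
        pool = remaining
    return labelled, pool


# Hypothetical toy data: placeholder tokens stand in for context words
# around one ambiguous target word; the sense labels are also placeholders.
seed = [(["w1", "w2"], "sense_A"), (["w3", "w4"], "sense_B")]
unlabelled = [["w1", "w5"], ["w5", "w6"], ["w3", "w7"], ["w8"]]

labelled, leftover = bootstrap(seed, unlabelled)
```

In this toy run, the context `["w5", "w6"]` cannot be labelled in the first round, but it becomes labelable in the second round once `w5` has been absorbed into the labelled set. The context `["w8"]` never overlaps with any labelled example and stays in `leftover`, which is where manual annotation would still be needed.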

Author Biographies

Getahun Wassie, Wollega University

College of Engineering and Technology, Wollega University, Post Box No: 395, Nekemte, Ethiopia

Ramesh Babu P, Wollega University

College of Engineering and Technology, Wollega University, Post Box No: 395, Nekemte, Ethiopia

Solomon Teferra, Addis Ababa University

School of Information Science, Addis Ababa University, Post Box No: 1176, Addis Ababa, Ethiopia

Million Meshesha, Addis Ababa University

School of Information Science, Addis Ababa University, Post Box No: 1176, Addis Ababa, Ethiopia

Published

29.09.2014

How to Cite

Wassie, G., Babu P, R., Teferra, S., & Meshesha, M. (2014). A Word Sense Disambiguation Model for Amharic Words using Semi-Supervised Learning Paradigm. Journal of Science, Technology and Arts Research, 3(3), 147–155. Retrieved from https://journals.wgu.edu.et/index.php/star/article/view/559

Section

Original Research
