Evaluation of Different Stemming Techniques on Arabic Customer Reviews
DOI:
https://doi.org/10.51173/jt.v6i2.2313
Keywords:
NLP, KNN, NB, LR, Snowball Stemmer, Khoja Stemmer, Tashaphyne Stemmer
Abstract
Customer opinions and reviews play a vital role in marketing expansion. Large companies worldwide devote considerable effort to analyzing customer feedback in order to keep track of customer needs. Natural Language Processing (NLP) is widely used to analyze such review texts. The analysis and classification of Arabic customer reviews has also begun to attract researchers' attention because of the large number of Arabic speakers. Working with Arabic is challenging because of its orthography. In addition, customers often write reviews in their own dialect, which frequently diverges from Standard Arabic. This study presents a method for classifying Arabic customer reviews using four classifiers: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Logistic Regression (LR), and Naïve Bayes (NB). Classification is carried out with three stemming techniques: Snowball, Khoja, and Tashaphyne. The experiments use the Hotel Arabic-Reviews Dataset (HARD). The results show that stemming can improve classification performance despite the complexity of Arabic script and dialects. This work motivates applying and investigating further machine learning (ML) techniques and evaluating their results.
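To make the role of stemming concrete, the following is a minimal sketch of a light (affix-stripping) stemmer and its effect on vocabulary size. It is not the Snowball, Khoja, or Tashaphyne algorithm: the affix lists, the `light_stem` and `vocabulary` helpers, the `min_len` threshold, and the two sample reviews are all assumptions chosen for illustration.

```python
# Illustrative affix lists (assumption): NOT the rule sets used by
# Snowball, Khoja, or Tashaphyne. Longer prefixes are listed first so
# that they are tried before their shorter substrings.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")


def light_stem(word: str, min_len: int = 3) -> str:
    """Strip at most one prefix and one suffix, keeping >= min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def vocabulary(reviews, stem=None):
    """Return the set of distinct whitespace tokens, optionally stemmed."""
    tokens = (tok for review in reviews for tok in review.split())
    return {stem(tok) if stem else tok for tok in tokens}


# Two made-up reviews (hypothetical data, not taken from HARD).
reviews = [
    "الفندق جميل والخدمة ممتازة",
    "فندق جميل وخدمة ممتازة",
]

raw_vocab = vocabulary(reviews)
stemmed_vocab = vocabulary(reviews, stem=light_stem)
print(len(raw_vocab), len(stemmed_vocab))
```

In this toy example, surface variants such as the definite and indefinite forms of the same noun collapse to a single token, so the feature space a classifier sees becomes smaller and less sparse. A real stemmer handles far more cases, and aggressive stripping can also merge unrelated words, which is one reason different stemmers can lead to different classification results.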
License
Copyright (c) 2024 Hawraa Fadhil Khelil, Mohammed Fadhil Ibrahim, Hafsa Ataallah Hussein, Raed Kamil Naser
This work is licensed under a Creative Commons Attribution 4.0 International License.