IQAD: Iraqi Arabic Dialect Dataset for Multi-Regional Dialect Classification Using Conventional and Machine Learning Approaches

Authors

  • Noora Aljubouri, Department of Software, Computer Engineering College, Iran University of Science and Technology (IUST), Tehran, Islamic Republic of Iran
  • Naderi Hassan, Department of Software, Computer Engineering College, Iran University of Science and Technology (IUST), Tehran, Islamic Republic of Iran

DOI:

https://doi.org/10.51173/jt.v7i3.2695

Keywords:

Machine Learning, Dialect, Language, Classification, SVM

Abstract

The main contribution of this work is a dataset for identifying Iraqi Arabic dialects in written text. As the use of Iraqi dialectal Arabic grows across social media platforms, accurate dialect identification has become an important step for tasks such as sentiment analysis, social media monitoring, and linguistic studies. We collected, annotated, and prepared 53,146 unique plain-text samples from social media, divided among three major Iraqi dialects: Middle, Western, and Southern. The corpus contains 78,582 unique tokens. The dataset was preprocessed to clean it and prepare it for classification tasks. To verify the quality of the dataset, we ran classification experiments with two approaches: a dictionary-based method and a TF-IDF-based SVM classifier. The SVM achieved 74% accuracy and F1-score, outperforming the dictionary-based classifier, which peaked at 63.6% accuracy and a 63.4% F1-score. The results show that the dataset effectively supports dialect classification and has potential for use in future Iraqi Arabic NLP applications and research.
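To make the dictionary-based approach concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each dialect has a word list and that a text is assigned to the dialect whose list matches the most tokens. The lexicon entries are hypothetical placeholders, and the names `DIALECT_LEXICONS`, `classify`, and `accuracy` are illustrative; only the three dialect labels come from the abstract.

```python
from collections import Counter

# Hypothetical placeholder lexicons: the actual dialect word lists
# are not given in the abstract, so these tokens are stand-ins.
DIALECT_LEXICONS = {
    "Middle": {"mid_a", "mid_b"},
    "Western": {"west_a", "west_b"},
    "Southern": {"south_a", "south_b"},
}


def classify(text, lexicons=DIALECT_LEXICONS, default=None):
    """Return the dialect whose lexicon matches the most tokens.

    Returns `default` when no token matches any lexicon. Ties go to
    the dialect listed first in `lexicons` (an assumption of this sketch).
    """
    counts = Counter()
    for token in text.split():
        for dialect, words in lexicons.items():
            if token in words:
                counts[dialect] += 1
    if not counts:
        return default
    # Counter returns 0 for dialects with no matches.
    return max(lexicons, key=lambda dialect: counts[dialect])


def accuracy(predictions, gold):
    """Fraction of predictions that equal the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

A classifier of this kind depends entirely on lexicon coverage, which is one plausible reason a learned model over TF-IDF features could score higher on the same data.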





Published

2025-09-30

How to Cite

Noora Aljubouri, & Naderi Hassan. (2025). IQAD: Iraqi Arabic Dialect Dataset for Multi-Regional Dialect Classification Using Conventional and Machine Learning Approaches. Journal of Techniques, 7(3), 53–63. https://doi.org/10.51173/jt.v7i3.2695


Section

Engineering (Miscellaneous): Computer Engineering
