Evaluating the Performance of a Fake News Model on A Domain-Specific and Heterogeneous Dataset to Improve Detection

Georgina Obuandike; Emmy Danny Ajik; Faith Oluwatosin Echobu

doi:10.51173/jt.v7i2.2640

Authors

Georgina N. Obuandike Faculty of Computing, Federal University Dutsin-Ma, Katsina State, Nigeria
Emmy Danny Ajik Faculty of Computing, Federal University Dutsin-Ma, Katsina State, Nigeria
Faith Oluwatosin Echobu Faculty of Computing, Federal University Dutsin-Ma, Katsina State, Nigeria

Keywords:

Domain, Dataset, Detection, Fake-News

Abstract

The rapid evolution of technology in the digital age has increased the spread of fake news, severely undermining the reliability of information. This study aims to improve fake news detection in distinct domains through in-depth dataset analysis, training an optimized Convolutional Neural Network (CNN) on publicly available datasets. The findings show that machine learning models trained on domain-specific datasets can accurately identify the nuances of fake news unique to those domains. Models trained on domain-specific data achieved higher accuracy, precision, recall, and F1-score than models trained on broader datasets, with all four metrics increasing from 68% to 99% when compared with a baseline CNN model. However, while domain-specific models perform exceptionally well in their respective contexts, models trained on a diverse range of datasets generalize better across domains. These findings suggest that dynamic and robust fake news detection systems should integrate both heterogeneous datasets and domain-specific features to enhance effectiveness.
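
For readers unfamiliar with the four metrics reported above, the following is a minimal sketch of how accuracy, precision, recall, and F1-score are commonly computed for a binary fake/real classifier. It is not the authors' evaluation code: the function name `binary_metrics`, the label encoding (1 = fake, 0 = real), and the toy labels are all assumptions made for illustration, and the numbers are not results from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels.

    Assumed encoding (illustrative only): 1 = fake, 0 = real.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard every ratio against a zero denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Comparing a domain-specific model with one trained on heterogeneous data would then amount to calling such a function once per model on the same held-out test set and comparing the resulting dictionaries.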

Author Biographies

Georgina N. Obuandike, Faculty of Computing, Federal University Dutsin-Ma, Katsina State, Nigeria

Department of Computer Science

Emmy Danny Ajik, Faculty of Computing, Federal University Dutsin-Ma, Katsina State, Nigeria

Faith Oluwatosin Echobu, Faculty of Computing, Federal University Dutsin-Ma, Katsina State, Nigeria

Department of Computer Science
