Binary Classification of Customer’s Online Purchasing Behavior Using Machine Learning
DOI:
https://doi.org/10.51173/jt.v5i2.1226
Keywords:
Machine Learning, Customer Behaviour, Classification Algorithms, Banking Industry, Classifiers
Abstract
The UK financial sector increasingly employs machine learning techniques to enhance revenue and understand customer behaviour. In this study, we develop a machine learning workflow that targets high classification accuracy and improved prediction confidence. As a proof of concept, we apply a binary classification approach to a publicly available dataset from a Portuguese financial institution. Our methodology covers data analysis, data transformation, and the training and testing of machine learning classifiers such as Naïve Bayes, Decision Trees, Random Forests, Support Vector Machines, Logistic Regression, Artificial Neural Networks, AdaBoost, and Gradient Descent. We use stratified k-fold cross-validation (k = 5) and assemble the top-performing classifiers into a decision-making committee, resulting in over 92% accuracy with two-thirds voting confidence. The workflow is simple, adaptable, and suitable for UK banks, demonstrating its potential for practical implementation and data privacy. Future work will extend our approach to UK banks, reformulate the problem as multi-class classification, and introduce automated pre-training steps for data analysis and transformation.
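The two evaluation ideas named in the abstract, stratified 5-fold splitting and a voting committee, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The function names `stratified_k_fold` and `committee_vote` are hypothetical, the toy labels are made up, and reading "two-thirds voting confidence" as "at least two-thirds of the committee members agree" is an assumption.

```python
from collections import Counter, defaultdict
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a fair share of it.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            j for f, fold in enumerate(folds) if f != i for j in fold
        )
        yield train_idx, test_idx


def committee_vote(votes, threshold=2 / 3):
    """Return (majority_label, is_confident) for one sample.

    Assumption: a prediction counts as confident when the share of
    committee members voting for the majority label reaches `threshold`.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes) >= threshold


# Toy imbalanced labels (hypothetical data): 10 positives, 40 negatives.
labels = [1] * 10 + [0] * 40
splits = list(stratified_k_fold(labels, k=5))

# Three members, two agree: the two-thirds threshold is met.
decision = committee_vote([1, 1, 0])
# Five members, three agree: a majority, but below two-thirds.
unsure = committee_vote([1, 0, 1, 0, 0])
```

In a real workflow each committee member would be a classifier trained on the training indices of a fold, and `votes` would hold their predictions for one test sample.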
License
Copyright (c) 2023 Ahmad Aldelemy, Raed A. Abd-Alhameed
This work is licensed under a Creative Commons Attribution 4.0 International License.