Comparison of Feature Selection and Feature Extraction Role in Dimensionality Reduction of Big Data

Authors

  • Haidar Khalid Malik Technical College of Management - Baghdad, Middle Technical University, Baghdad, Iraq
  • Nashaat Jasim Al-Anber Technical College of Management - Baghdad, Middle Technical University, Baghdad, Iraq
  • Fuad AbdoEsmail Al- Mekhlafi Sana'a University, Sana'a, Yemen

DOI:

https://doi.org/10.51173/jt.v5i1.1027

Keywords:

Feature Extraction, Feature Selection, Principal Component Analysis (PCA), Dimensionality Reduction

Abstract

In recent years, the technological revolution and advances in data science have led researchers to focus increasingly on datasets with very large numbers of features, commonly referred to as Big Data. Dimensionality reduction provides efficient and effective methods for analyzing such data, which contain many variables, and it is important in several fields, including data processing, pattern recognition, machine learning, and data mining. This paper compares the two essential approaches to dimensionality reduction that machine learning models frequently employ: feature extraction and feature selection. We applied several classifiers (support vector machines, k-nearest neighbors, decision tree, and Naive Bayes) to the anthropometric survey of US Army personnel (ANSUR II) to classify the data and to test the relevance of its features by predicting a specific feature of the personnel records. The results show that k-nearest neighbors achieved high prediction accuracy (83%). We then reduced the dimensions with several techniques (a high-correlation filter, Recursive Feature Elimination, and Principal Component Analysis), and Recursive Feature Elimination gave the best accuracy (66%). These results indicate that the efficiency of dimensionality reduction techniques varies with the nature of the data: some techniques are more efficient on text data, while others are more efficient on images.
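The classifier comparison described in the abstract can be sketched with scikit-learn as below. This is a minimal illustration, not the authors' code: the ANSUR II files (distributed by the OPEN Design Lab at Penn State) are not loaded, and a synthetic dataset from `make_classification` stands in for them. The sample size, feature counts, `n_neighbors=5`, and 5-fold cross-validation are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a many-feature tabular dataset such as ANSUR II.
# Assumption: the real feature columns and target are not reproduced here.
X, y = make_classification(
    n_samples=600, n_features=90, n_informative=15, n_classes=2, random_state=0
)

# Scaling is applied only to the distance- and margin-based models (SVM, k-NN);
# decision trees and Gaussian Naive Bayes are largely insensitive to feature scale.
classifiers = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}


def compare_classifiers(X, y, models, cv=5):
    """Return the mean cross-validated accuracy of each model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, acc in compare_classifiers(X, y, classifiers).items():
        print(f"{name:15s} {acc:.3f}")
```

Cross-validation is used here so each classifier is scored on held-out folds rather than on its training data; the accuracies it prints describe the synthetic data only and say nothing about the paper's reported figures.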
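A high-correlation filter is a simple feature-selection step: when two features are strongly correlated, one of them carries little extra information and can be dropped. The NumPy sketch below assumes a greedy left-to-right rule and a threshold of 0.9 on the absolute Pearson correlation; the function name `high_correlation_filter` is hypothetical, and the paper does not state which member of a correlated pair it keeps.

```python
import numpy as np


def high_correlation_filter(X, threshold=0.9):
    """Return the indices of columns to keep after dropping highly correlated ones.

    Columns are scanned left to right; a column is dropped if its absolute
    Pearson correlation with any already-kept column exceeds `threshold`.
    Constant columns yield NaN correlations, so remove them beforehand.
    """
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature matrix
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep


# Usage: column 1 is almost an exact multiple of column 0, column 2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X_demo = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])
print(high_correlation_filter(X_demo, threshold=0.9))  # [0, 2]
```

Because the filter looks only at pairwise correlations between inputs and never at the target, it is cheap but can discard a feature that is redundant with another yet still useful for prediction.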
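The contrast between feature selection (Recursive Feature Elimination) and feature extraction (Principal Component Analysis) can be sketched as two scikit-learn pipelines evaluated with the same downstream classifier. Again this runs on synthetic data, not ANSUR II, and the target dimensionality `K = 15`, the logistic-regression estimator inside RFE, and k-NN as the final classifier are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the ANSUR II features).
X, y = make_classification(n_samples=600, n_features=90, n_informative=15, random_state=0)
K = 15  # target number of dimensions (assumed, not taken from the paper)

pipelines = {
    # Feature selection: RFE repeatedly fits the estimator and removes the
    # feature with the smallest |coef_| until K original features remain.
    "RFE": make_pipeline(
        StandardScaler(),
        RFE(LogisticRegression(max_iter=1000), n_features_to_select=K, step=1),
        KNeighborsClassifier(),
    ),
    # Feature extraction: PCA projects onto the K directions of largest
    # variance; each new feature is a linear combination of all originals.
    "PCA": make_pipeline(StandardScaler(), PCA(n_components=K), KNeighborsClassifier()),
}


def evaluate(pipelines, X, y, cv=5):
    """Mean cross-validated accuracy for each reduction pipeline."""
    return {name: cross_val_score(p, X, y, cv=cv).mean() for name, p in pipelines.items()}


if __name__ == "__main__":
    for name, acc in evaluate(pipelines, X, y).items():
        print(f"{name}: {acc:.3f}")
```

Placing RFE and PCA inside the pipeline means they are refit on each training fold, so the held-out fold never influences which features are selected or which components are extracted. The selected RFE features remain interpretable original measurements, whereas PCA components do not.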

Author Biography

Fuad AbdoEsmail Al- Mekhlafi, Sana'a University, Sana'a, Yemen

Faculty of Commerce and Economics

[Figure: feature selection process steps]

Published

2023-04-03

How to Cite

Haidar Khalid Malik, Nashaat Jasim Al-Anber, & Fuad AbdoEsmail Al- Mekhlafi. (2023). Comparison of Feature Selection and Feature Extraction Role in Dimensionality Reduction of Big Data. Journal of Techniques, 5(1), 184–192. https://doi.org/10.51173/jt.v5i1.1027

Section

Management
