Fast Ways to Detect Outliers

Authors

  • Emad Obaid Merza, Information Technology Department, Technical College of Management-Baghdad, Middle Technical University, Baghdad, Iraq.
  • Nashaat Jasim Mohammed, Information Technology Department, Technical College of Management-Baghdad, Middle Technical University, Baghdad, Iraq.

DOI:

https://doi.org/10.51173/jt.v3i1.287

Keywords:

outlier, outlier detection, big data, normal distribution, Z-Score, Hampel's test

Abstract

The rapid growth of data has produced very large data volumes, and such data naturally contain outliers for many reasons: values that are much smaller or much larger than the rest of the normal data. Because outliers distort the statistical analysis of the data, their impact should be reduced by suitable means. On the other hand, outliers can also be highly informative, for example as signs of the activity that precedes natural disasters such as earthquakes, forest fires, and floods. Outlier detection is therefore important in many fields. This research aims to develop simple methods for detecting outliers in big data, motivated by the problem that many recently developed outlier-detection methods are either computationally complex or efficient only when the sample size is small. An experimental approach was used, proposing three methods for detecting outliers. The first method is based on the standard deviation and was tested against the normal distribution method and the Z-Score method. The second method depends on the maximum and minimum values of the data, and the third depends on the range between successive data points; the results of these two methods are compared with those of Hampel's test. Accuracy is measured using the confusion matrix. The tests showed that the first method agrees with the normal distribution method and the Z-Score method, and that the third method outperforms Hampel's test. The paper concludes that Hampel's test has a serious weakness when zero values make up more than 50% of the elements in the data.
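To make the two baseline detectors named in the abstract concrete, here is a minimal Python sketch of a Z-Score rule and a Hampel identifier, plus a confusion-matrix accuracy. It is not the authors' implementation: the function names, the cutoff of 3, and the MAD scale constant 1.4826 are assumptions taken from common textbook formulations, and the proposed max/min and successive-range methods are not reproduced because the abstract does not define them. The last example illustrates the reported weakness: when more than half the values are zero, both the median and the MAD are zero, so every nonzero value is flagged.

```python
import statistics


def zscore_outliers(data, threshold=3.0):
    """Return indices whose |x - mean| / sd exceeds `threshold` (sample SD).

    Hypothetical helper; the cutoff of 3 is an assumed convention.
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    if sd == 0:  # all values identical: nothing stands out
        return []
    return [i for i, x in enumerate(data) if abs(x - mean) / sd > threshold]


def hampel_outliers(data, k=3.0):
    """Hampel identifier: flag x when |x - median| > k * 1.4826 * MAD.

    Hypothetical helper; k = 3 is an assumed convention.
    """
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    # 1.4826 * MAD estimates the standard deviation for normally distributed data.
    cutoff = k * 1.4826 * mad
    # If more than half the values equal the median, MAD == 0 and cutoff == 0,
    # so every value that differs from the median is flagged.
    return [i for i, x in enumerate(data) if abs(x - med) > cutoff]


def accuracy(predicted, actual, n):
    """Confusion-matrix accuracy (TP + TN) / n for outlier index sets over n points."""
    pred, act = set(predicted), set(actual)
    tp = len(pred & act)   # true outliers that were flagged
    fp = len(pred - act)   # normal points wrongly flagged
    fn = len(act - pred)   # true outliers that were missed
    tn = n - tp - fp - fn  # normal points correctly left alone
    return (tp + tn) / n


if __name__ == "__main__":
    data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 11, 10, 12, 95]
    print(zscore_outliers(data))  # [14]
    print(hampel_outliers(data))  # [14]

    # More than 50% zeros: median = 0 and MAD = 0.
    skewed = [0] * 6 + [3, 4, 5, 6, 80]
    flagged = hampel_outliers(skewed)
    print(flagged)                                         # [6, 7, 8, 9, 10]
    print(round(accuracy(flagged, [10], len(skewed)), 3))  # 0.636
```

The cutoff is a tunable assumption: with the sample standard deviation, no Z-Score can exceed (n - 1)/sqrt(n), so a cutoff of 3 can never fire when n ≤ 10.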


Author Biography

Nashaat Jasim Mohammed, Information Technology Department, Technical College of Management-Baghdad, Middle Technical University, Baghdad, Iraq.

Assist. Prof. Dr. Nashaat Jasim Mohammed Anber


Published

2021-03-30

How to Cite

Merza, E. O., & Mohammed, N. J. (2021). Fast Ways to Detect Outliers. Journal of Techniques, 3(1), 66–73. https://doi.org/10.51173/jt.v3i1.287

Issue

Vol. 3 No. 1 (2021)

Section

definition
