Anomaly Detection in Dataset for Improved Model Accuracy Using DBSCAN Clustering Algorithm
Date
2015-03
Publisher
IEEE Nigeria Chapter.
Abstract
The purity of the dataset used for model construction plays an important role in the accuracy and reliability of the resulting model.
Outliers are often caused by noisy data resulting from mechanical faults, changes in system behaviour, or human error. It is
therefore essential to pre-process a dataset prior to modelling, in order to differentiate between data that appear normal and data
that appear abnormal within the sample space. One important reason for removing outliers is to prevent them from contaminating the
dataset, which can lead to harmful consequences, and even serious disaster, if they are not removed. This paper proposes an
effective measure that automatically clusters outliers in the dataset using the Density-Based Spatial Clustering of Applications
with Noise (DBSCAN) technique. RapidMiner, an open-source software tool, is used to experiment on some sample datasets; based on the
characteristics of the data objects, clusters are formed that filter out outliers from the dataset being explored. The experimental
results from this study show that the DBSCAN algorithm is a suitable technique for outlier detection and is capable of filtering
abnormal data from a mixture of noisy and normal data.
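The filtering step described in the abstract can be illustrated with a minimal sketch. The paper's experiments were run in RapidMiner, so this is not the authors' workflow: it is a plain Python rendering of the standard DBSCAN procedure, and the sample points, the neighbourhood radius eps and the density threshold min_pts are all assumed values chosen only for illustration.

```python
import math

NOISE = -1  # label given to points that belong to no cluster (the outliers)


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i], itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            # Not a core point; it may still be claimed later as a border point.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point, keep expanding
    return labels


# Hypothetical sample: two dense groups plus two isolated readings.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 1.0), (0.0, 8.0),
]
labels = dbscan(points, eps=0.5, min_pts=3)  # assumed parameter values

# Split the dataset into retained data and detected outliers.
clean = [p for p, lab in zip(points, labels) if lab != NOISE]
outliers = [p for p, lab in zip(points, labels) if lab == NOISE]
```

With these assumed parameters the two isolated readings are labelled as noise and the remaining eight points fall into two clusters. In practice the result depends strongly on eps and min_pts, which would need to be tuned to the dataset at hand.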
Description
Main article
Keywords
Anomaly detection, DBSCAN, clustering, model-building, algorithm, noisy data
Citation
African Journal of Computing & ICT