Anomaly Detection in Dataset for Improved Model Accuracy Using DBSCAN Clustering Algorithm


Date

2015-03

Publisher

IEEE Nigeria Chapter.

Abstract

The purity of the dataset used for model construction plays an important role in the accuracy and reliability of the resulting model. Outliers often arise from noisy data caused by mechanical faults, changes in system behaviour, or human error. It is therefore essential to pre-process a dataset before modelling, in order to distinguish data that appears normal from data that appears abnormal within the sample space. One important reason for removing outliers is to prevent them from contaminating the dataset, which, if left unaddressed, can lead to poor results and serious failures. This paper proposes an approach that automatically clusters outliers in a dataset using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique. RapidMiner, an open-source software tool, is used to experiment on sample datasets; based on the characteristics of the data objects, clusters are formed that filter the outliers out of the dataset being explored. The experimental results of this study show that the DBSCAN algorithm is a suitable technique for outlier detection and is capable of separating abnormal data from a mixture of noisy and normal data.
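The abstract relies on DBSCAN's treatment of points that belong to no dense region as noise, which is what makes it usable as an outlier filter. The paper's experiments use RapidMiner, so the sketch below is not the authors' setup. It is a minimal pure-Python DBSCAN under stated assumptions: Euclidean distance, a point counting as its own neighbour, and the hypothetical helper names `dbscan` and `split_inliers_outliers`. The `eps` and `min_pts` values in the usage example are illustrative only; the abstract does not report the parameters used.

```python
import math

NOISE = -1  # label DBSCAN assigns to points that belong to no cluster


def region_query(points, idx, eps):
    """Indices of all points within Euclidean distance eps of points[idx], itself included."""
    p = points[idx]
    return [j for j, q in enumerate(points) if math.dist(p, q) <= eps]


def dbscan(points, eps, min_pts):
    """Hypothetical minimal DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # provisional: may later become a border point
            continue
        labels[i] = cluster_id  # i is a core point and starts a new cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep growing
        cluster_id += 1
    return labels


def split_inliers_outliers(points, eps, min_pts):
    """Hypothetical helper: treat DBSCAN noise points as outliers and filter them out."""
    labels = dbscan(points, eps, min_pts)
    inliers = [p for p, lab in zip(points, labels) if lab != NOISE]
    outliers = [p for p, lab in zip(points, labels) if lab == NOISE]
    return inliers, outliers


if __name__ == "__main__":
    # Toy data (invented for illustration): two tight groups and one isolated point.
    data = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.1), (1.0, 0.9),
            (5.0, 5.0), (5.1, 5.1), (4.9, 5.0), (5.0, 4.9),
            (10.0, 0.0)]
    inliers, outliers = split_inliers_outliers(data, eps=0.5, min_pts=3)
    print("outliers:", outliers)  # prints "outliers: [(10.0, 0.0)]"
```

On this toy data the two tight groups become clusters 0 and 1, and the isolated point at (10.0, 0.0) is labelled noise and returned as the only outlier. The neighbourhood search here is a brute-force O(n²) scan chosen for readability; practical tools typically use a spatial index instead.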

Description

Main article

Keywords

Anomaly detection, DBSCAN, clustering, model-building, algorithm, noisy data

Citation

African Journal of Computing & ICT

Collections