An Improved Technique for the Removal and Replacement of the Inconsistencies in Numeric Dataset
Date
2015-05
Publisher
IEEE Nigeria Chapter.
Abstract
Removing anomalies from an unclean numeric dataset, so that the data are in a suitable format for exploration, is a major phase
of the data mining process. Before such a dataset can be explored to unveil its useful patterns or structure, thorough
pre-processing is needed to obtain a noise-free dataset. Poor-quality data can be misleading if analysed or used to build models;
hence, discrepancies present in the data need to be removed before the data are explored. In this paper, a cleaning algorithm is
proposed and implemented to remove the inconsistencies in a numeric dataset. The proposed algorithm is implemented in Java, and
the resulting outputs reveal the efficiency of the proposed approach. To evaluate its effectiveness, the proposed algorithm is
compared with one of the existing methods on several metrics. The comparisons show that the proposed technique is efficient
and can be used as an alternative technique for the removal of outliers in numeric data. The approach is also found to be reliable,
as it consistently gives an accurate output that is free of outliers.
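The abstract does not spell out the proposed algorithm, so the following Java sketch is not the authors' method. It only illustrates the general idea of removing and replacing outliers in numeric data, using a standard stand-in rule: values outside Tukey's interquartile-range fences are replaced by the median of the remaining values. The class name `OutlierCleaner`, the method names, the 1.5 fence multiplier, and the choice of the median as the replacement value are all assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative only: IQR-based outlier replacement, NOT the algorithm proposed in the paper.
final class OutlierCleaner {

    // Quantile of an already sorted, non-empty array, using linear interpolation; q lies in [0, 1].
    static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    // Returns a cleaned copy of data (assumed to hold finite values); the input is left untouched.
    // Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are replaced by the median of the values inside.
    static double[] clean(double[] data) {
        if (data.length == 0) {
            return new double[0];
        }
        double[] sorted = data.clone();
        Arrays.sort(sorted);

        double q1 = quantile(sorted, 0.25);
        double q3 = quantile(sorted, 0.75);
        double iqr = q3 - q1;
        double lower = q1 - 1.5 * iqr;
        double upper = q3 + 1.5 * iqr;

        // The filtered array keeps the sorted order, so it can be passed to quantile directly.
        double[] inliers = Arrays.stream(sorted)
                .filter(v -> v >= lower && v <= upper)
                .toArray();
        double replacement = quantile(inliers, 0.5);

        double[] cleaned = data.clone();
        for (int i = 0; i < cleaned.length; i++) {
            if (cleaned[i] < lower || cleaned[i] > upper) {
                cleaned[i] = replacement;
            }
        }
        return cleaned;
    }
}
```

For example, calling `OutlierCleaner.clean` on `{10, 12, 11, 13, 12, 95, 11, 10}` leaves every value unchanged except 95, which falls above the upper fence of 14.5 and is replaced by 11, the median of the other seven values.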
Description
Article
Keywords
Data cleansing, Data mining, Outlier detection, Clustering
Citation
African Journal of Computing & ICT