An Improved Technique for the Removal and Replacement of the Inconsistencies in Numeric Dataset

Date

2015-05

Publisher

IEEE Nigeria Chapter.

Abstract

Removing anomalies from an unclean numeric dataset, so that the data are in a suitable format for exploration, is a major phase of the data mining process. When an unclean numeric dataset is explored to unveil its useful patterns or structure, thorough pre-processing is unavoidable in order to obtain a noise-free dataset. Poor-quality data can be misleading if analysed or used to build models; hence, any discrepancies present in the data need to be removed before the data are explored. In this paper, a cleaning algorithm is proposed and implemented to remove the inconsistencies in a numeric dataset. The algorithm is implemented in Java, and the resulting outputs reveal the efficiency of the proposed approach. To evaluate its effectiveness, the proposed algorithm is compared with one of the existing methods on a set of metrics. The comparisons show that the proposed technique is efficient and can be used as an alternative for the removal of outliers in numeric data. The approach is also found to be reliable, as it consistently gives accurate output that is free of outliers.

Description

Article

Keywords

Data cleansing, Data mining, Outlier detection, Clustering

Citation

African Journal of Computing & ICT
