Performance Evaluation of Manhattan and Euclidean Distance Measures for Clustering Based Automatic Text Summarization

Date

2019

Publisher

FUOYE Journal of Engineering and Technology

Abstract

In the past few years, the amount of text data available from a variety of sources has grown explosively. This volume of text is a valuable source of information and knowledge, but it must be summarized effectively to be useful. This paper presents automatic text summarization with k-means clustering using two distance measures, Euclidean and Manhattan. The dataset, extracted from African prose, was preprocessed using stopword removal and tokenization. The preprocessed document was converted into a vector representation using the tf-idf technique, and k-means clustering was applied with the Euclidean and Manhattan distance measures to generate a summary. Several distance measures for k-means have been used in previous work; however, there is a dearth of work evaluating the performance of these distance measures in text summarization. The experimental analysis was performed in the Waikato Environment for Knowledge Analysis (WEKA). With compression ratio as the performance metric, the Euclidean variant produced an extractive summary of sentences amounting to 72%, drawn from three different clusters, whereas the Manhattan variant produced an extractive summary of sentences making up 94% of the total document, all in one cluster.
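
The pipeline described in the abstract (tokenization and stopword removal, tf-idf vectors, k-means under two distance measures, compression ratio) can be illustrated with a minimal pure-Python sketch. This is not the authors' WEKA setup. The tiny stopword list, the smoothed idf formula, the use of the first k sentences as initial centroids, the choice of one representative sentence per cluster, and the definition of compression ratio as summary sentences divided by document sentences are all assumptions made for illustration, as are the function names.

```python
import math
import re
from collections import Counter
from statistics import median

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "on"}


def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def tfidf_vectors(sentences):
    """Return one tf-idf vector per sentence (idf = log(n/df) + 1, one common variant)."""
    docs = [tokenize(s) for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(len(docs) / df[w]) + 1.0 for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[w] / max(len(d), 1) * idf[w] for w in vocab])
    return vectors


def euclidean(u, v):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """City-block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(u, v))


def kmeans(vectors, k, distance, center=None, iterations=20):
    """Lloyd-style clustering; the first k vectors seed the centroids for determinism."""
    center = center or (lambda xs: sum(xs) / len(xs))  # default: component-wise mean
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors]
        # Update step: recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [center(col) for col in zip(*members)]
    return labels, centroids


def summarize(sentences, k, distance, center=None):
    """Pick the sentence nearest each centroid; return (summary, compression ratio)."""
    vectors = tfidf_vectors(sentences)
    labels, centroids = kmeans(vectors, k, distance, center)
    chosen = set()
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        chosen.add(min(members, key=lambda i: distance(vectors[i], centroids[c])))
    summary = [sentences[i] for i in sorted(chosen)]
    return summary, len(summary) / len(sentences)
```

Calling summarize(sentences, 3, euclidean) and summarize(sentences, 3, manhattan, center=median) on the same list of sentences gives a rough side-by-side comparison of the two measures. The median update is a common pairing with Manhattan distance; the abstract does not say which update rule was used in the experiments.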

Keywords

Text summarization, Euclidean distance, k-means clustering, Manhattan distance
