Performance Evaluation Of Manhattan And Euclidean Distance Measures For Clustering Based Automatic Text Summarization.

dc.contributor.author: Salihu, S.A.
dc.contributor.author: Onyekwere, I.P.
dc.contributor.author: Mabayoje, M.A.
dc.contributor.author: Mojeed, H.A.
dc.date.accessioned: 2020-01-24T12:15:35Z
dc.date.available: 2020-01-24T12:15:35Z
dc.date.issued: 2019
dc.description.abstract: In the past few years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is a valuable source of information and knowledge that must be effectively summarized to be useful. In this paper, automatic text summarization with k-means clustering is presented, employing two different distance measures (Euclidean and Manhattan). The dataset, extracted from African prose, was preprocessed using stop-word removal and tokenization. The preprocessed document was converted into a vector representation using the tf-idf technique, and k-means clustering was applied using the Euclidean and Manhattan distance measures to generate a summary. Different distance measures for k-means have been used in several works. However, there is a dearth of work on the performance evaluation of these distance measures in text summarization. The experimental analysis was performed in the Waikato Environment for Knowledge Analysis (WEKA). Using compression ratio as the performance metric, the results showed that the Euclidean variant produced an extractive summary of sentences amounting to 72%, drawn from three different clusters, while the Manhattan variant produced an extractive summary of sentences that made up 94% of the total document, all in one cluster. [en_US]
dc.identifier.uri: http://hdl.handle.net/123456789/3573
dc.language.iso: en [en_US]
dc.publisher: FUOYE Journal of Engineering and Technology [en_US]
dc.relation.ispartofseries: 4;1
dc.subject: Text summarization [en_US]
dc.subject: Euclidean distance [en_US]
dc.subject: k-means clustering [en_US]
dc.subject: Manhattan distance [en_US]
dc.title: Performance Evaluation Of Manhattan And Euclidean Distance Measures For Clustering Based Automatic Text Summarization. [en_US]
dc.type: Article [en_US]
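
The pipeline outlined in the abstract (stop-word removal and tokenization, tf-idf vectors, k-means under two distance measures, extractive selection, compression ratio) can be sketched in plain Python. This is only an illustrative sketch and not the authors' WEKA configuration. The stop-word list, the toy sentences, all function names, the use of a component-wise median as the centroid under Manhattan distance, the rule of picking the sentence nearest each centroid, and the definition of compression ratio as the fraction of sentences retained are assumptions made for the example.

```python
import math
import re
from statistics import mean, median

# Tiny illustrative stop-word list (assumption; a real list is much longer).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "it", "on"}

def tokenize(sentence):
    """Lowercase, split on non-letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def tfidf_vectors(sentences):
    """Build one tf-idf vector per sentence, treating each sentence as a document."""
    docs = [tokenize(s) for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    idf = {w: math.log(n / sum(1 for d in docs if w in d)) + 1.0 for w in vocab}
    return [[d.count(w) / max(len(d), 1) * idf[w] for w in vocab] for d in docs]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans(points, k, dist, center=mean, iters=50):
    """Lloyd-style clustering; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [center(col) for col in zip(*members)]
    return labels, centroids

def summarize(sentences, k, dist, center=mean):
    """Pick the sentence nearest each non-empty cluster centroid.

    Returns the summary (in document order) and the fraction of sentences kept.
    """
    vectors = tfidf_vectors(sentences)
    labels, centroids = kmeans(vectors, k, dist, center)
    chosen = set()
    for c in set(labels):
        idxs = [i for i, l in enumerate(labels) if l == c]
        chosen.add(min(idxs, key=lambda i: dist(vectors[i], centroids[c])))
    summary = [sentences[i] for i in sorted(chosen)]
    return summary, len(summary) / len(sentences)

# Toy input, made up for illustration only.
sentences = [
    "The river floods the village every rainy season.",
    "Farmers plant yam and cassava near the river.",
    "The village elders meet under the old tree.",
    "Elders settle disputes among farmers in the village.",
    "Rainy season brings floods to the farms.",
    "The old tree shades the village square.",
]

for name, dist, center in [("Euclidean", euclidean, mean), ("Manhattan", manhattan, median)]:
    summary, ratio = summarize(sentences, 2, dist, center)
    print(name, "kept", len(summary), "of", len(sentences), "sentences; ratio", ratio)
```

Swapping the `dist` and `center` arguments is the only change between the two variants, which keeps the comparison confined to the distance measure itself.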

Files

Original bundle
Name: performance evaluation of mahattan.pdf
Size: 1.04 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission
