Badger: An Entropy-Based Web Search Clustering System with Randomization and Voting

Authors

Wang, Lidan
Schulze, Chloe Whyte

Type

Technical Report

Publisher

University of Wisconsin-Madison Department of Computer Sciences

Abstract

We have implemented and improved an entropy-based clustering algorithm. In addition to using entropy as a clustering mechanism, our algorithm, Badger, uses randomization and a voting scheme to improve the quality of the resulting clusters. We tested the algorithm on parsed web search result snippets and compared it against EigenCluster, a clustering meta-search engine developed by a research group at MIT. Badger performs comparably to EigenCluster, with slightly more overhead due to the extra work of the randomization step. We find entropy to be a valid and interesting measure of document similarity, and one that produces cohesive clusters.
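
To illustrate why entropy can act as a signal of cluster cohesion, here is a minimal sketch that computes the Shannon entropy of the pooled term distribution of a group of snippets. It is not the Badger algorithm: the function name `term_entropy`, the whitespace tokenization, and the sample snippets are all assumptions made for illustration, and the randomization and voting steps described above are not modeled.

```python
import math
from collections import Counter


def term_entropy(snippets):
    """Shannon entropy (in bits) of the pooled term distribution.

    Hypothetical helper for illustration only; the report's actual
    entropy formulation may differ.
    """
    # Pool all lowercase, whitespace-separated terms across the snippets.
    counts = Counter(word for s in snippets for word in s.lower().split())
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed terms.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up snippets: two about the same sense of a query, two about different senses.
related = ["jaguar car dealer", "jaguar car price"]
mixed = ["jaguar car dealer", "jaguar cat habitat"]

print(term_entropy(related))
print(term_entropy(mixed))
```

Under these assumptions, a group of snippets that share vocabulary has a more concentrated term distribution, and therefore lower entropy, than a group that mixes unrelated topics.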

Citation

TR1537
