Badger: An Entropy-Based Web Search Clustering System with Randomization and Voting
Authors
Wang, Lidan
Schulze, Chloe Whyte
Type
Technical Report
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Abstract
We have implemented and improved an entropy-based clustering algorithm. In addition to utilizing entropy as a clustering mechanism, our algorithm, Badger, uses randomization and a voting scheme to improve the quality of the resulting clusters. Using parsed web search result snippets, we have tested our algorithm and compared it against EigenCluster, a clustering meta-search engine developed by a research group at MIT. Our algorithm performs comparably to EigenCluster, but with slightly more overhead due to the extra work of the randomization step.
We have found entropy to be a valid and interesting measure of document similarity, and we find that it produces cohesive clusters.
Citation
TR1537