Efficient Computation of K-Nearest Neighbor Graphs for Large High-Dimensional Data Sets on GPU Clusters

dc.contributor.advisorRoshan M. D'Souza
dc.contributor.committeememberAbbas Ourmazd
dc.contributor.committeememberHossein Hosseini
dc.creatorDashti, Ali
dc.date.accessioned2025-01-16T18:41:31Z
dc.date.available2025-01-16T18:41:31Z
dc.date.issued2013-08-01
dc.description.abstractThe k-Nearest Neighbor Graph (k-NNG) and the related k-Nearest Neighbor (k-NN) methods have a wide variety of applications in areas such as bioinformatics, machine learning, data mining, clustering analysis, and pattern recognition. Our application of interest is manifold embedding. Due to the large dimensionality of the input data ( This thesis presents the development and implementation of a distributed exact k-Nearest Neighbor Graph (k-NNG) construction method. The proposed method uses Graphics Processing Units (GPUs) and exploits multiple levels of parallelism for distributed computational systems using GPUs. It is scalable for different cluster sizes, with each compute node in the cluster containing multiple GPUs. The distance computation is formulated as a basic matrix multiplication and reduction operation. The optimized CUBLAS matrix multiplication library is used for this purpose. Various distance metrics such as Euclidian, cosine, and Pearson are supported. For k-NNG construction, two different methods are presented. The first is based on an approach called batch index sorting to build the k-NNG with three sorting operations. This method uses the optimized radix sort implementation in the Thrust library for GPU. The second is an efficient implementation using the latest GPU functionalities of a variant of the quick select algorithm. Overall, the batch index sorting based k-NNG method is approximately 13x faster than a distributed MATLAB implementation. The quick select algorithm itself has a 5x speedup over state-of-the art GPU methods. This has enabled the processing of k-NNG construction on a data set containing 20 million image vectors, each with dimension 15,000, as part of a manifold embedding technique for analyzing the conformations of biomolecules.
dc.identifier.urihttp://digital.library.wisc.edu/1793/87267
dc.relation.replaceshttps://dc.uwm.edu/etd/280
dc.subjectDiffusion Map
dc.subjectGPU
dc.subjectHigh Performance Computing
dc.subjectK Nearest Neighbors
dc.subjectManifold Embedding
dc.titleEfficient Computation of K-Nearest Neighbor Graphs for Large High-Dimensional Data Sets on GPU Clusters
dc.typethesis
thesis.degree.disciplineEngineering
thesis.degree.grantorUniversity of Wisconsin-Milwaukee
thesis.degree.nameMaster of Science

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Dashti_uwm_0263m_10325.pdf
Size:
3.99 MB
Format:
Adobe Portable Document Format
Description:
Main File