Sub-linear algorithms for shortest unique substring and maximal unique matches

Senanayaka, Sanjeewa Bandara

Sub-linear algorithms for shortest unique substring and maximal unique matches

dc.contributor.advisor	Ganguly, Arnab
dc.contributor.advisor	Gunawardena, Athula
dc.contributor.advisor	Nguyen, Hien
dc.contributor.author	Senanayaka, Sanjeewa Bandara
dc.date.accessioned	2020-01-15T20:05:38Z
dc.date.available	2020-01-15T20:05:38Z
dc.date.issued	2019-12
dc.description	This file was last viewed in Adobe Acrobat Pro.	en_US
dc.description.abstract	Living in an era where we produce quintillions of data on a daily basis means that there are many intriguing (possibly unrevealed) facts behind them. Some of the data that exists in large quantities is encountered in Computational Biology; our DNA strands are extremely long ( 3.2 billion characters). Genomic problems such as comparing similarities between two strands of DNA or finding the uniqueness between two similar strands of DNA can hugely benefit in the understanding of evolution and possibly create medical advances. A couple of avenues of addressing a problem of similar flavor (known as gene alignment) are using Shortest Unique Substrings and Maximal Unique Matches. Existing techniques for these problems either necessitate the use of high performance computers, or using techniques that have large memory footprints, or use of (theoretical) techniques that are complicated to construct. Prior to our work, Ganguly et al. [TCS’ 17] showed that these problems can be solved in O(n2 log n ) time and O(n=) space for any parameter = !(1) and = o(n). However, these techniques require the use of complex data structures and techniques, which makes them practically intractable. Presented here are sub-linear algorithms for the Shortest Unique Substring problem and the Maximal Unique Matches problem. Besides using minimal amounts of work space and being (reasonably) fast, our algorithms are a combination of two simple techniques – Rabin-Karp fingerprint and Bloom Filters; this makes our work easily amenable to implementations. However, our algorithms suffer from a bounded error probability, which can be reduced using (slightly) more space.	en_US
dc.identifier.uri	http://digital.library.wisc.edu/1793/79579
dc.language.iso	en_US	en_US
dc.publisher	University of Wisconsin--Whitewater	en_US
dc.subject	Bioinformatics	en_US
dc.subject	Computational biology	en_US
dc.subject	Genomics	en_US
dc.subject	Data structures (Computer science)	en_US
dc.title	Sub-linear algorithms for shortest unique substring and maximal unique matches	en_US
dc.type	Thesis	en_US

Files

Original bundle

Now showing 1 - 1 of 1

Name:: SanjeewaSenanayakaThesis.pdf
Size:: 855.83 KB
Format:: Adobe Portable Document Format
Description:

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 1.92 KB
Format:: Item-specific license agreed upon to submission
Description:

Download

Collections

Master's Theses--UW-Whitewater