Sub-linear algorithms for shortest unique substring and maximal unique matches

dc.contributor.advisorGanguly, Arnab
dc.contributor.advisorGunawardena, Athula
dc.contributor.advisorNguyen, Hien
dc.contributor.authorSenanayaka, Sanjeewa Bandara
dc.date.accessioned2020-01-15T20:05:38Z
dc.date.available2020-01-15T20:05:38Z
dc.date.issued2019-12
dc.descriptionThis file was last viewed in Adobe Acrobat Pro.en_US
dc.description.abstractLiving in an era where we produce quintillions of data on a daily basis means that there are many intriguing (possibly unrevealed) facts behind them. Some of the data that exists in large quantities is encountered in Computational Biology; our DNA strands are extremely long ( 3.2 billion characters). Genomic problems such as comparing similarities between two strands of DNA or finding the uniqueness between two similar strands of DNA can hugely benefit in the understanding of evolution and possibly create medical advances. A couple of avenues of addressing a problem of similar flavor (known as gene alignment) are using Shortest Unique Substrings and Maximal Unique Matches. Existing techniques for these problems either necessitate the use of high performance computers, or using techniques that have large memory footprints, or use of (theoretical) techniques that are complicated to construct. Prior to our work, Ganguly et al. [TCS’ 17] showed that these problems can be solved in O(n2 log n ) time and O(n=) space for any parameter = !(1) and = o(n). However, these techniques require the use of complex data structures and techniques, which makes them practically intractable. Presented here are sub-linear algorithms for the Shortest Unique Substring problem and the Maximal Unique Matches problem. Besides using minimal amounts of work space and being (reasonably) fast, our algorithms are a combination of two simple techniques – Rabin-Karp fingerprint and Bloom Filters; this makes our work easily amenable to implementations. However, our algorithms suffer from a bounded error probability, which can be reduced using (slightly) more space.en_US
dc.identifier.urihttp://digital.library.wisc.edu/1793/79579
dc.language.isoen_USen_US
dc.publisherUniversity of Wisconsin--Whitewateren_US
dc.subjectBioinformaticsen_US
dc.subjectComputational biologyen_US
dc.subjectGenomicsen_US
dc.subjectData structures (Computer science)en_US
dc.titleSub-linear algorithms for shortest unique substring and maximal unique matchesen_US
dc.typeThesisen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
SanjeewaSenanayakaThesis.pdf
Size:
855.83 KB
Format:
Adobe Portable Document Format
Description:

License bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
1.92 KB
Format:
Item-specific license agreed upon to submission
Description: