Statistical Analysis of DNA Sequences Using Overlapping Windows

Hauth, Amy; Clayton, Murray K.

dc.contributor.author	Hauth, Amy	en_US
dc.contributor.author	Clayton, Murray K.	en_US
dc.date.accessioned	2012-03-15T17:17:05Z
dc.date.available	2012-03-15T17:17:05Z
dc.date.created	2000	en_US
dc.date.issued	2000
dc.description.abstract	Motivation: Our analysis of DNA sequences uses a k-length, sliding window and considers all overlapping windows along the sequence. The k consecutive nucleotides in a window are called a word or k-word. Statistical analysis of this collection of words often assumes independence between words. Since words can overlap, strict independence is not a valid assumption. We derive a statistic to incorporate both the independent and dependent components of overlapping, k-length words. Results: The expected number of occurrences for a k-word in an N-length sequence is easily calculated given the probabilities of the nucleotides within the word. However, the variance is not straightforward since overlapping occurrences are not independent. We present a derivation of the variance when sequence analysis uses overlapping, k-length windows. The variance can be determined for a word in the entire sequence or at a single position in the sequence. Our analysis assumes that each nucleotide is independent. It does not assume a specific probability of occurrence for each nucleotide.	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.citation	TR1474	en_US
dc.identifier.uri	http://digital.library.wisc.edu/1793/60346
dc.publisher	University of Wisconsin-Madison Department of Computer Sciences	en_US
dc.title	Statistical Analysis of DNA Sequences Using Overlapping Windows	en_US
dc.type	Technical Report	en_US
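
The quantities named in the abstract can be illustrated with a minimal sketch. It assumes independent nucleotides with arbitrary per-nucleotide probabilities, as the abstract states, and uses the standard indicator-variable argument for counts over overlapping windows; the report's own derivation and notation may differ. The function names and the `probs` dictionary are hypothetical and are not taken from the report.

```python
from math import prod


def word_prob(word, probs):
    """Probability of a word when each nucleotide is drawn independently."""
    return prod(probs[c] for c in word)


def expected_count(word, n_len, probs):
    """Expected occurrences of `word` over all overlapping windows."""
    n_windows = n_len - len(word) + 1
    if n_windows <= 0:
        return 0.0
    return n_windows * word_prob(word, probs)


def count_variance(word, n_len, probs):
    """Variance of the overlapping-window count of `word`.

    Independent part: n_windows * p * (1 - p).
    Dependent part: covariances between windows that start fewer than
    k positions apart, since only those windows share nucleotides.
    """
    k = len(word)
    n_windows = n_len - k + 1
    if n_windows <= 0:
        return 0.0
    p = word_prob(word, probs)
    var = n_windows * p * (1.0 - p)
    for d in range(1, min(k, n_windows)):
        # Two occurrences can start d positions apart only if the word's
        # suffix of length k - d equals its prefix of the same length.
        if word[d:] == word[:k - d]:
            joint = p * word_prob(word[k - d:], probs)
        else:
            joint = 0.0
        # There are n_windows - d pairs of windows at shift d.
        var += 2.0 * (n_windows - d) * (joint - p * p)
    return var


# Assumed example: uniform nucleotide probabilities, sequence length 3.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
mean_aa = expected_count("AA", 3, uniform)
var_aa = count_variance("AA", 3, uniform)  # "AA" overlaps itself
var_ac = count_variance("AC", 3, uniform)  # "AC" cannot overlap itself
```

In this sketch the variance at a single position is simply p(1 - p); the covariance terms appear only when counts are summed over overlapping windows. A self-overlapping word such as "AA" gets a larger variance than "AC", even though both have the same expected count.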

Files

Name: TR1474.pdf
Size: 716.18 KB
Format: Adobe Portable Document Format

Collections

CS Technical Reports