Statistical Analysis of DNA Sequences Using Overlapping Windows

dc.contributor.author: Hauth, Amy [en_US]
dc.contributor.author: Clayton, Murray K. [en_US]
dc.date.accessioned: 2012-03-15T17:17:05Z
dc.date.available: 2012-03-15T17:17:05Z
dc.date.created: 2000 [en_US]
dc.date.issued: 2000
dc.description.abstract: Motivation: Our analysis of DNA sequences uses a k-length, sliding window and considers all overlapping windows along the sequence. The k consecutive nucleotides in a window are called a word or k-word. Statistical analysis of this collection of words often assumes independence between words. Since words can overlap, strict independence is not a valid assumption. We derive a statistic to incorporate both the independent and dependent components of overlapping, k-length words. Results: The expected number of occurrences for a k-word in an N-length sequence is easily calculated given the probabilities of the nucleotides within the word. However, the variance is not straightforward since overlapping occurrences are not independent. We present a derivation of the variance when sequence analysis uses overlapping, k-length windows. The variance can be determined for a word in the entire sequence or at a single position in the sequence. Our analysis assumes that each nucleotide is independent. It does not assume a specific probability of occurrence for each nucleotide. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: TR1474 [en_US]
dc.identifier.uri: http://digital.library.wisc.edu/1793/60346
dc.publisher: University of Wisconsin-Madison Department of Computer Sciences [en_US]
dc.title: Statistical Analysis of DNA Sequences Using Overlapping Windows [en_US]
dc.type: Technical Report [en_US]
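
The abstract's setup (overlapping k-length windows, with an expected count that follows from the nucleotide probabilities) can be illustrated with a minimal Python sketch. It is not taken from the report: the function names `count_kwords` and `expected_count` are hypothetical, and the expected-count formula assumes independent nucleotides whose probabilities are the same at every position. The variance derivation, which is the report's contribution, is not reproduced here because the abstract does not state it.

```python
from collections import Counter
from math import prod


def count_kwords(seq, k):
    """Count every overlapping k-length window (k-word) in seq."""
    # A sequence of length N has N - k + 1 overlapping windows.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(word, n, probs):
    """Expected occurrences of word in an n-length sequence.

    Assumption (illustrative): nucleotides are independent and
    probs[base] is the same at every position. By linearity of
    expectation, the result is the number of windows times the
    product of the probabilities of the word's nucleotides.
    """
    windows = n - len(word) + 1
    return windows * prod(probs[base] for base in word)


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(count_kwords("ACGACG", 3))
    print(expected_count("ACG", 6, uniform))
```

For the sequence ACGACG with k = 3 there are four windows (ACG, CGA, GAC, ACG), so ACG is counted twice. The expectation holds despite the overlap because it relies only on linearity; the dependence between overlapping words affects the variance, not the mean.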

Files

Original bundle

Name: TR1474.pdf
Size: 716.18 KB
Format: Adobe Portable Document Format