Statistical Analysis of DNA Sequences Using Overlapping Windows
| dc.contributor.author | Hauth, Amy | en_US |
| dc.contributor.author | Clayton, Murray K. | en_US |
| dc.date.accessioned | 2012-03-15T17:17:05Z | |
| dc.date.available | 2012-03-15T17:17:05Z | |
| dc.date.created | 2000 | en_US |
| dc.date.issued | 2000 | |
| dc.description.abstract | Motivation: Our analysis of DNA sequences uses a k-length, sliding window and considers all overlapping windows along the sequence. The k consecutive nucleotides in a window are called a word or k-word. Statistical analysis of this collection of words often assumes independence between words. Since words can overlap, strict independence is not a valid assumption. We derive a statistic to incorporate both the independent and dependent components of overlapping, k-length words. Results: The expected number of occurrences for a k-word in an N-length sequence is easily calculated given the probabilities of the nucleotides within the word. However, the variance is not straightforward since overlapping occurrences are not independent. We present a derivation of the variance when sequence analysis uses overlapping, k-length windows. The variance can be determined for a word in the entire sequence or at a single position in the sequence. Our analysis assumes that each nucleotide is independent. It does not assume a specific probability of occurrence for each nucleotide. | en_US |
| dc.format.mimetype | application/pdf | en_US |
| dc.identifier.citation | TR1474 | en_US |
| dc.identifier.uri | http://digital.library.wisc.edu/1793/60346 | |
| dc.publisher | University of Wisconsin-Madison Department of Computer Sciences | en_US |
| dc.title | Statistical Analysis of DNA Sequences Using Overlapping Windows | en_US |
| dc.type | Technical Report | en_US |
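
The abstract's two claims about word counts can be checked numerically on a toy case. The following is a minimal Python sketch, not the report's derivation. It assumes independent nucleotides, as the abstract states. The function names, the uniform probabilities, and the example word `AA` with N = 4 are illustrative choices that do not come from the report. The sketch computes the expected count as (N - k + 1) times the product of the nucleotide probabilities in the word, then obtains the exact variance by brute-force enumeration of all sequences and compares it with the value that would result if window occurrences were treated as independent.

```python
from itertools import product
from math import prod


def count_overlapping(seq, word):
    """Count occurrences of word in seq over all overlapping k-length windows."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_count(word, n, probs):
    """Expected occurrences of word in an n-length sequence of independent nucleotides."""
    windows = max(0, n - len(word) + 1)
    return windows * prod(probs[c] for c in word)


def exact_mean_and_variance(word, n, probs):
    """Enumerate every n-length sequence (feasible only for small n)."""
    mean = 0.0
    second_moment = 0.0
    for letters in product(probs, repeat=n):
        p_seq = prod(probs[c] for c in letters)
        c = count_overlapping("".join(letters), word)
        mean += p_seq * c
        second_moment += p_seq * c * c
    return mean, second_moment - mean ** 2


# Illustrative inputs (assumptions, not values from the report).
probs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
word, n = "AA", 4

mean, var = exact_mean_and_variance(word, n, probs)

# Variance if the (n - k + 1) windows were wrongly treated as independent trials.
p_word = prod(probs[c] for c in word)
naive_var = (n - len(word) + 1) * p_word * (1 - p_word)

print(mean, expected_count(word, n, probs))
print(var, naive_var)
```

For this toy case the expected count is 0.1875, the exact variance is 57/256 (about 0.2227), and the independence-based value is 45/256 (about 0.1758). The gap comes from the covariance between adjacent, overlapping windows. Enumeration grows as 4^N, so it serves only as a check on small cases, not as a substitute for a closed-form variance.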