GEORAC: an RNA-seq Atlas Constructor for the Gene Expression Omnibus

dc.contributor.advisor: Timothy B Patrick
dc.contributor.advisor: Rebecca D Klaper
dc.contributor.committeemember: Michael J Carvan
dc.contributor.committeemember: Susan W McRoy
dc.contributor.committeemember: Elizabeth A Worthey
dc.creator: Mohaimani, Aurash A
dc.date.accessioned: 2025-01-16T18:10:18Z
dc.date.issued: 2018-05-01
dc.description.abstract: The meteoric rise of next-generation sequencing technologies over the past 15 years has resulted in a voluminous amount of data generated by modern biological and clinical studies. RNA sequencing, colloquially referred to as RNA-Seq, is a next-generation approach capable of surveying and quantifying whole-organism transcriptomes. RNA-Seq methods are valued over microarray assays for their ability to avoid cross-hybridization signal noise, to quantify gene or transcript expression without assay-specific upper limits, to natively provide single-nucleotide genomic resolution, and to allow for de novo transcriptome assemblies. Many thousands of RNA-Seq studies have been published over the past seven years, and a significant area of bioinformatics research has focused on the creation of atlases that aggregate RNA-Seq results. These atlases are valuable for surveying trends in gene expression across published studies, for inspecting potentially contentious claims made by novel or prior work, and for synthesizing future research directions. The Expression Atlas currently serves as the canonical example of an RNA-Seq atlas and presents results from over 3,000 studies across numerous model research organisms. An issue with the Expression Atlas is that it applies a uniform secondary re-analysis pipeline to every RNA-Seq study incorporated within its database; this approach presents a conceptual challenge to studies whose results have been generated and published using established, well-tested workflows. Thus, there exists a critical need to support the construction of RNA-Seq atlases that precisely reflect original results presented within the literature, and the primary objective of this dissertation is to provide a workflow that allows for transparent, reproducible construction of RNA-Seq atlases from study metadata and expression data housed within the National Center for Biotechnology Information’s Gene Expression Omnibus (GEO).
The challenge of this goal is exacerbated by the highly flexible design of GEO, which allows researchers to define novel metadata attributes and values at will and to submit expression results in virtually any format. Following an introduction to modern genomics and RNA-Seq, the second chapter of this work presents GEOMP, a metadata parser and relational database constructor for the Gene Expression Omnibus. The third chapter describes GEOMP2, an in-place augmentation of GEOMP that provides further atomization and loading of sample-specific characteristics tags; this chapter also presents results from a pilot study surveying bioinformatics methods reproducibility across the zebrafish, mouse, and human research communities using metadata parsed and output by GEOMP2. Chapter four details GEORGET, a pipeline designed to rehabilitate, translate, and load expression data pulled from GEO into the relational database store constructed by GEOMP2. Chapter five concludes with a discussion of future directions needed to expand and improve upon the current GEORAC workflow and the associated methods reproducibility study.
dc.description.embargo: 2020-05-21
dc.embargo.liftdate: 2020-05-21
dc.identifier.uri: http://digital.library.wisc.edu/1793/86241
dc.relation.replaces: https://dc.uwm.edu/etd/1875
dc.subject: GEO
dc.subject: Metadata
dc.subject: Relational database
dc.subject: RNA-Seq
dc.title: GEORAC: an RNA-seq Atlas Constructor for the Gene Expression Omnibus
dc.type: dissertation
thesis.degree.discipline: Biomedical and Health Informatics
thesis.degree.grantor: University of Wisconsin-Milwaukee
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: Mohaimani_uwm_0263D_12134.pdf
Size: 1.38 MB
Format: Adobe Portable Document Format
Description: Main File

Name: GEOMP2_methods.xlsx
Size: 161.94 KB
Format: Microsoft Excel XML