Comparative Study of Variable Selection Methods for Genetic Data

Type

thesis

Grantor

University of Wisconsin-Milwaukee

Abstract

Association studies of genetic data are essential for understanding the genetic basis of complex traits. However, analyzing such high-dimensional data requires suitable feature selection methods. We therefore compare three methods, Lasso Regression, Bayesian Lasso Regression, and Ridge Regression combined with significance tests, to identify the most effective one for modeling quantitative trait expression in genetic data. All methods are applied to both simulated and real genetic data and evaluated on various measures of model performance, such as the mean absolute error, the mean squared error, the Akaike information criterion, and the Bayesian information criterion. The results show that all three methods outperform the ordinary least squares model in predicting future data. Moreover, Lasso Regression outperforms the other methods in execution time and model simplicity, which leads to better interpretability and makes it the best choice for association studies. Overall, this thesis provides valuable insights into the strengths and limitations of existing feature selection methods for modeling quantitative trait expression and highlights their importance in association studies of genetic data.
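
As a rough illustration of the kind of comparison the abstract describes, the following Python sketch fits ordinary least squares, Lasso, and Ridge models to simulated data and reports the four measures named above. It is not the thesis's implementation. The coordinate-descent solver, the penalty values, the sample sizes, and all function names are assumptions made for this example, and the Bayesian Lasso and the significance tests for Ridge are omitted. The AIC and BIC use a Gaussian-likelihood form up to an additive constant, with the number of nonzero coefficients as a simplified stand-in for the degrees of freedom.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def coordinate_descent(X, y, l1=0.0, l2=0.0, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + l1*||b||_1 + (l2/2)*||b||_2^2.

    l1 = l2 = 0 gives least squares, l2 = 0 the lasso, l1 = 0 ridge.
    No intercept is fitted (assumes centred data).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, l1) / (col_sq[j] + l2)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def evaluate(X, y, beta):
    """Return MAE, MSE, and simplified AIC/BIC on the given data."""
    n = len(y)
    errors = [y[i] - sum(x * b for x, b in zip(X[i], beta)) for i in range(n)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    k = sum(1 for b in beta if b != 0.0)  # simplified degrees of freedom
    return {
        "mae": mae,
        "mse": mse,
        "aic": n * math.log(mse) + 2 * k,
        "bic": n * math.log(mse) + k * math.log(n),
    }


def simulate(n, true_beta, noise_sd, rng):
    """Draw Gaussian features and a linear response with Gaussian noise."""
    X = [[rng.gauss(0.0, 1.0) for _ in true_beta] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, noise_sd)
         for row in X]
    return X, y


rng = random.Random(0)
p = 20
true_beta = [2.0, -1.5, 1.0] + [0.0] * (p - 3)  # only three features matter
X_train, y_train = simulate(80, true_beta, 0.5, rng)
X_test, y_test = simulate(40, true_beta, 0.5, rng)

beta_ols = coordinate_descent(X_train, y_train)
beta_lasso = coordinate_descent(X_train, y_train, l1=0.2)
beta_ridge = coordinate_descent(X_train, y_train, l2=0.1)

fits = {"OLS": beta_ols, "Lasso": beta_lasso, "Ridge": beta_ridge}
train_scores = {name: evaluate(X_train, y_train, b) for name, b in fits.items()}
test_scores = {name: evaluate(X_test, y_test, b) for name, b in fits.items()}

for name in fits:
    print(name,
          "nonzero:", sum(b != 0.0 for b in fits[name]),
          "test MAE: %.3f" % test_scores[name]["mae"],
          "test MSE: %.3f" % test_scores[name]["mse"],
          "train AIC: %.1f" % train_scores[name]["aic"],
          "train BIC: %.1f" % train_scores[name]["bic"])
```

In this sketch the penalties are fixed by hand. In practice they would usually be chosen by cross-validation, and for Ridge the effective degrees of freedom (the trace of the hat matrix) would be a more appropriate choice than the count of nonzero coefficients.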
