Predicting Occurrence of the Term Sarcopenia with Semi-Supervised Machine Learning


Type

thesis

Grantor

University of Wisconsin-Milwaukee

Abstract

Sarcopenia is a medical condition that involves loss of muscle mass. It has been difficult to define and was only recently assigned an official medical code, so many medical records lack a coded diagnosis even though the clinical note text may discuss the condition or its symptoms. This thesis investigates the application of machine learning and natural language processing to clinical note text, to see how well occurrence of the term 'sarcopenia' can be predicted in notes from records concerning the condition. A variety of machine learning models, combined with different features and text processing steps, are tested using training data that mentions the term and test data that is coded for the condition, both drawn from small datasets from the Medical College of Wisconsin. Based on the F1 score, no tested configuration of models or combination of features performed exceptionally well. Still, some models showed promise, especially those classifying with a support vector machine, along with other classifiers such as decision trees, gradient boosting, and logistic regression. Although some of the ideas and approaches in this initial research did not perform well on the data studied, they provide insight and paths forward for extending them and applying them to larger and more precise datasets.
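
For readers unfamiliar with the evaluation metric named in the abstract, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch in Python, assuming binary labels where 1 means a note is positive for the term. The function name and the example labels are illustrative assumptions and are not taken from the thesis or its data.

```python
def f1_score(y_true, y_pred):
    """Return the F1 score for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight notes (not real data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```

Because F1 ignores true negatives, it is a common choice when the positive class is rare, which may be why it suits small clinical datasets like the ones described here.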
