Text Mining of Patient Demographics and Diagnoses from Psychiatric Assessments

Klosterman, Eric James

dc.contributor.advisor	Rashmi Prasad
dc.contributor.committeemember	Timothy Patrick
dc.contributor.committeemember	Rohit Kate
dc.creator	Klosterman, Eric James
dc.date.accessioned	2025-01-16T19:44:54Z
dc.date.available	2025-01-16T19:44:54Z
dc.date.issued	2014-12-01
dc.description.abstract	Automatic extraction of patient demographics and psychiatric diagnoses from clinical notes allows patient data to be collected on a large scale. These data could be used for a variety of research purposes, including outcomes studies and the development of clinical trials. However, current research has not yet discussed the automatic extraction of demographics and psychiatric diagnoses in detail. The aim of this study is to apply text mining to extract patient demographics (age, gender, marital status, and education level) and admission diagnoses from the psychiatric assessments at a mental health hospital, and to assign a code to each category. Gender is coded as Male or Female; marital status as Single, Married, Divorced, or Widowed; and education level on a scale from Some High School through Graduate Degree (PhD, JD, MD, etc.). Classifications for diagnoses are based on the DSM-IV. For each category, a rule-based approach was developed using keyword-based regular expressions as well as constituency trees and typed dependencies. We employ a two-step approach that first maximizes recall through keyword-based patterns and then, if necessary, maximizes precision by using NLP-based rules to resolve ambiguity. To develop and evaluate our method, we annotated a corpus of 200 assessments, using a portion of the corpus for developing the method and the rest as a test set. F-score was satisfactory for each category (Age: 0.997; Gender: 0.989; Primary Diagnosis: 0.983; Marital Status: 0.875; Education Level: 0.851), as was coding accuracy (Age: 1.0; Gender: 0.989; Primary Diagnosis: 0.922; Marital Status: 0.889; Education Level: 0.778). These results indicate that a rule-based approach could be considered for extracting these types of information in the psychiatric field.
At the same time, the results showed a drop in performance from the development set to the test set, partly because the rules developed need to be more general.
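The two-step idea in the abstract (broad keyword patterns for recall, then rules that discard ambiguous hits for precision) can be illustrated with a minimal Python sketch for marital status. Everything here is an assumption made for illustration: the patterns, the function names, and the filtering rules are hypothetical and are not taken from the thesis. In particular, the second step uses a simple text-prefix heuristic in place of the constituency trees and typed dependencies the thesis describes.

```python
import re

# Hypothetical keyword patterns (not the thesis's actual patterns).
MARITAL_PATTERNS = {
    "Single": re.compile(r"\b(single|never married)\b", re.IGNORECASE),
    "Married": re.compile(r"\bmarried\b", re.IGNORECASE),
    "Divorced": re.compile(r"\bdivorced\b", re.IGNORECASE),
    "Widowed": re.compile(r"\bwidow(ed|er)?\b", re.IGNORECASE),
}

# Hypothetical disambiguation cues, standing in for parse-based rules.
RELATIVE = re.compile(
    r"\b(mother|father|parents?|sister|brother|son|daughter)\b", re.IGNORECASE
)
NEGATION = re.compile(r"\b(never|not)\s+$", re.IGNORECASE)


def find_candidates(sentence):
    """Step 1 (recall): collect every (code, match) pair any pattern hits."""
    candidates = []
    for code, pattern in MARITAL_PATTERNS.items():
        for match in pattern.finditer(sentence):
            candidates.append((code, match))
    return candidates


def filter_candidates(sentence, candidates):
    """Step 2 (precision): drop hits that are negated or follow a relative term."""
    kept = []
    for code, match in candidates:
        prefix = sentence[:match.start()]
        if NEGATION.search(prefix):
            continue  # e.g. "never married" must not count as Married
        if RELATIVE.search(prefix):
            continue  # e.g. "His mother is divorced" is not about the patient
        kept.append(code)
    return kept


def code_marital_status(text):
    """Return the distinct marital-status codes found in text, in order of appearance."""
    codes = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for code in filter_candidates(sentence, find_candidates(sentence)):
            if code not in codes:
                codes.append(code)
    return codes


print(code_marital_status("The patient has never married. His mother is divorced."))
```

In this sketch the first step deliberately over-generates: "never married" triggers both the Single and the Married pattern, and the second step removes the Married hit because it is directly preceded by a negation. A prefix heuristic like this is far cruder than a dependency parse, so it should be read only as an illustration of the recall-then-precision structure.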
dc.identifier.uri	http://digital.library.wisc.edu/1793/88482
dc.relation.replaces	https://dc.uwm.edu/etd/613
dc.subject	Information Extraction
dc.subject	Patient Demographics
dc.subject	Patient Psychiatric Diagnoses
dc.subject	Psychology
dc.subject	Text Mining
dc.title	Text Mining of Patient Demographics and Diagnoses from Psychiatric Assessments
dc.type	thesis
thesis.degree.discipline	Health Care Informatics
thesis.degree.grantor	University of Wisconsin-Milwaukee
thesis.degree.name	Master of Science

Files

Original bundle

Name: Klosterman_uwm_0263m_10906.pdf
Size: 1.09 MB
Format: Adobe Portable Document Format
Description: Main File

Collections

UW Milwaukee Electronic Theses and Dissertations