Learning for clinical named entity recognition without manual annotations

Type

article

Abstract

Background: Named entity recognition (NER) systems are commonly built with supervised machine learning methods trained on corpora manually annotated with named entities. However, manually annotating corpora is expensive and laborious.

Materials and methods: This paper presents a novel method for training clinical NER systems that does not require any manual annotations. It requires only a raw text corpus and a resource such as the Unified Medical Language System (UMLS) that can provide a list of named entities along with their semantic types. Using these two resources, annotations are obtained automatically and used to train machine learning methods. The method was evaluated on the NER shared-task datasets of i2b2 2010 and SemEval 2014.

Results: On the SemEval 2014 dataset for recognizing diseases and disorders, the method obtained an F-measure of 0.693 for exact matching and 0.773 when allowing overlaps. This is comparable to many earlier supervised systems that were trained on manual annotations. On the i2b2 2010 dataset for recognizing problems, tests and treatments, the method obtained F-measures of 0.451, 0.338 and 0.204 respectively for exact matching, and 0.721, 0.587 and 0.475 respectively when allowing overlaps. These results are better than those of an existing unsupervised method.

Conclusions: Experiments on standard datasets showed that the new method performed well. The method is general and could be applied to recognize entities of other types in other genres of text without needing manual annotations.
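
The abstract does not spell out how annotations are derived from the term list, so the following is only a minimal sketch of one plausible approach: greedy longest-match dictionary lookup that emits BIO tags, which could then serve as training labels for a sequence tagger. It is not the authors' implementation. `TERM_LIST` is a tiny hypothetical stand-in for a UMLS-derived list of terms with semantic types, and `tokenize`, `auto_annotate` and `max_len` are names introduced purely for illustration.

```python
import re

# Hypothetical stand-in for a UMLS-derived term list: term -> semantic type.
# A real list would be far larger and extracted from the UMLS resource itself.
TERM_LIST = {
    "chest pain": "problem",
    "pneumonia": "problem",
    "chest x-ray": "test",
    "aspirin": "treatment",
}


def tokenize(text):
    """Lowercase and split into word tokens (keeping hyphenated words) and punctuation."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text.lower())


def auto_annotate(text, term_list, max_len=5):
    """Tag tokens with BIO labels by greedy longest-match lookup in term_list.

    Assumption: the longest matching term starting at a token wins, and
    matched spans never overlap. Unmatched tokens are tagged "O".
    """
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    # Index each term by its token sequence so lookups use the same tokenization.
    index = {tuple(tokenize(term)): label for term, label in term_list.items()}

    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = index.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))


if __name__ == "__main__":
    sentence = "Patient reports chest pain; chest x-ray ordered."
    for token, tag in auto_annotate(sentence, TERM_LIST):
        print(f"{token}\t{tag}")
```

Labels produced this way are noisy: terms missing from the list go untagged, and ambiguous strings can be tagged incorrectly. That noise is the trade-off for avoiding manual annotation.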

Citation

Ghiasvand, O., & Kate, R. J. (2018). Learning for clinical named entity recognition without manual annotations. Informatics in Medicine Unlocked, 13, 122-127. https://doi.org/10.1016/j.imu.2018.10.011
