Dictionary-based Data Generation for Fine-Tuning Bert for Adverbial Paraphrasing Tasks

Carthon, Mark Anthony

Files

Primary Carthon_uwm_0263M_12802.pdf (869.95 KB)

Date

2020-08-01

Authors

Carthon, Mark Anthony

Advisors

Istvan Lauko

Type

thesis

Grantor

University of Wisconsin-Milwaukee

Abstract

Recent advances in natural language processing have led to the emergence of large, deep pre-trained neural networks. These networks are used mainly for transfer learning: they are retrained, or fine-tuned, to achieve state-of-the-art performance on a variety of challenging natural language processing and understanding (NLP/NLU) tasks. In this thesis, we focus on identifying paraphrases at the sentence level using Bidirectional Encoder Representations from Transformers (BERT). It is well understood that in deep learning the volume and quality of training data are determining factors of performance. The objective of this thesis is to develop a methodology for the algorithmic generation of high-quality training data for the paraphrasing task, an important NLU task, and to evaluate the resulting training data by fine-tuning BERT to identify paraphrases. We focus on elementary adverbial paraphrases, but the methodology extends to the general case. Training data for adverbial paraphrasing was generated using an Oxford synonym dictionary, and we used the generated data to retrain BERT for the paraphrasing task with strong results, achieving a validation accuracy of 96.875%.
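
The idea of dictionary-based generation of adverbial paraphrase pairs can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy dictionary ADVERB_SYNONYMS, the function generate_pairs, and the labeling scheme (1 for a synonym swap, 0 for a swap with an unrelated adverb) are all assumptions made for illustration.

```python
# Toy stand-in for a synonym dictionary (hypothetical entries,
# not the Oxford dictionary data used in the thesis).
ADVERB_SYNONYMS = {
    "quickly": ["rapidly", "swiftly"],
    "quietly": ["silently"],
}


def generate_pairs(sentence, synonyms=ADVERB_SYNONYMS):
    """Return (sentence_a, sentence_b, label) triples.

    label 1: the adverb was replaced by one of its listed synonyms.
    label 0: the adverb was replaced by a synonym of a different adverb.
    """
    tokens = sentence.split()
    pairs = []
    for i, token in enumerate(tokens):
        if token not in synonyms:
            continue  # only adverbs that appear as dictionary headwords are swapped
        for headword, candidates in synonyms.items():
            for candidate in candidates:
                swapped = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
                label = 1 if headword == token else 0
                pairs.append((sentence, swapped, label))
    return pairs


pairs = generate_pairs("she spoke quietly")
```

Under these assumptions, each labeled triple could serve as one sentence-pair example for fine-tuning a classifier; a realistic pipeline would also need part-of-speech tagging and sense disambiguation, which this sketch omits.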

Keywords

BERT, Carthon, Machine Learning, Math, NLP, NLU

URI

http://digital.library.wisc.edu/1793/86907

Collections

UW Milwaukee Electronic Theses and Dissertations
