Deep Learning for Entity Matching: A Design Space Exploration

dc.contributor.advisor:
dc.contributor.author: Mudgal Sunil Kumar, Sidharth
dc.date.accessioned: 2018-05-15T19:44:34Z
dc.date.available: 2018-05-15T19:44:34Z
dc.date.issued: 2018-05-15T19:44:34Z
dc.description.abstract: Entity matching (EM) finds data instances that refer to the same real-world entity. In this thesis we examine applying deep learning (DL) to EM, to understand DL's benefits and limitations. We review many DL solutions that have been developed for related matching tasks in text processing (e.g., entity linking, textual entailment, etc.). We categorize these solutions and define a space of DL solutions for EM, as embodied by four solutions with varying representational power: SIF, RNN, Attention, and Hybrid. Next, we investigate the types of EM problems for which DL can be helpful. We consider three such problem types, which match structured data instances, textual instances, and dirty instances, respectively. We empirically compare the above four DL solutions with Magellan, a state-of-the-art learning-based EM solution. The results show that DL does not outperform current solutions on structured EM, but it can significantly outperform them on textual and dirty EM. For practitioners, this suggests that they should seriously consider using DL for textual and dirty EM problems. We then analyze DL's performance and discuss future research directions. Finally, we present Deepmatcher, a Python package for performing entity matching using deep learning. [en]
dc.identifier.citation: TR1851 [eng]
dc.identifier.uri: http://digital.library.wisc.edu/1793/78379
dc.language.iso: en_US [en]
dc.relation.ispartofseries: tech report;TR1851
dc.subject: Deep learning [en]
dc.subject: Entity Resolution [en]
dc.subject: Entity Matching [en]
dc.title: Deep Learning for Entity Matching: A Design Space Exploration [en]
dc.type: Technical Report [en]

Files

Original bundle

Name: TR1851.pdf
Size: 1.84 MB
Format: Adobe Portable Document Format
Description: tech report

License bundle

Name: license.txt
Size: 2.04 KB
Format: Item-specific license agreed upon to submission
Description: