ML_aided_RecordLinkage
In this project we investigate some possible Machine Learning applications to Record Linkage (and Data deduplication), in order to figure out their viability.
Project maintained by frenkowski
Machine Learning aided Record Linkage
Group Members
- Francesco Porto f.porto2@campus.unimib.it (816042)
- Francesco Stranieri f.stranieri1@campus.unimib.it (816551)
- Mattia Vincenzi m.vincenzi14@campus.unimib.it (860579)
Abstract
Record Linkage is the process of finding records, in one or more datasets, that refer to the same entity across different data sources. Traditionally, this is done by applying comparison rules to pairs of corresponding attributes from each dataset. In this project we investigate possible Machine Learning applications to Record Linkage (and data deduplication) in order to assess their viability.
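The traditional, rule-based approach described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the toy records, the field names, the `difflib` similarity measure and the 0.85 threshold are all assumptions chosen for the example. Each cross-source pair is turned into a comparison vector (one similarity score per attribute pair), and a hand-written rule decides whether the pair is a match.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy records from two sources; field names are illustrative only.
source_a = [
    {"id": "a1", "given_name": "mario", "surname": "rossi", "city": "milano"},
    {"id": "a2", "given_name": "anna", "surname": "bianchi", "city": "roma"},
]
source_b = [
    {"id": "b1", "given_name": "mario", "surname": "rosi", "city": "milano"},
    {"id": "b2", "given_name": "giulia", "surname": "verdi", "city": "torino"},
]

FIELDS = ("given_name", "surname", "city")


def similarity(x: str, y: str) -> float:
    """String similarity in [0, 1] using difflib's SequenceMatcher.ratio()."""
    return SequenceMatcher(None, x, y).ratio()


def comparison_vector(rec_a: dict, rec_b: dict) -> list[float]:
    """Compare each pair of corresponding attributes, one score per field."""
    return [similarity(rec_a[f], rec_b[f]) for f in FIELDS]


def rule_based_match(vector: list[float], threshold: float = 0.85) -> bool:
    """Hand-written rule (assumed): every attribute must reach `threshold`.

    An ML approach would replace this rule with a classifier trained
    on labelled comparison vectors.
    """
    return all(score >= threshold for score in vector)


def link(records_a: list[dict], records_b: list[dict]) -> list[tuple[str, str]]:
    """Compare every cross-source pair and return the ids judged to match."""
    matches = []
    for rec_a, rec_b in product(records_a, records_b):
        if rule_based_match(comparison_vector(rec_a, rec_b)):
            matches.append((rec_a["id"], rec_b["id"]))
    return matches


print(link(source_a, source_b))  # prints [('a1', 'b1')]
```

Here "rossi" and "rosi" score about 0.89, so the typo does not prevent the match. Note that comparing every pair grows quadratically with the dataset sizes, and the rule's threshold must be tuned by hand; learning the decision from labelled comparison vectors is the kind of Machine Learning application this project investigates.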
Project structure
We provide:
- A Jupyter Notebook containing our project (code plus step-by-step comments and explanations);
- A PDF report generated from the notebook (we recommend reading the notebook itself, since it may be easier to follow);
- The slides for the project presentation;
- The datasets are not included, since they ship with the library we use; the notebook gives an in-depth description of each one.