##################
DeepMatcher
##################
.. image:: https://travis-ci.org/sidharthms/deepmatcher.svg?branch=master
   :target: https://travis-ci.org/sidharthms/deepmatcher

.. image:: https://img.shields.io/badge/License-BSD%203--Clause-blue.svg
   :target: https://opensource.org/licenses/BSD-3-Clause

DeepMatcher is a Python package for performing entity / text matching using deep learning.
It provides built-in neural networks and utilities that enable you to train and apply
state-of-the-art deep learning models for entity matching in fewer than 10 lines of code.
The models are also easily customizable - the modular design allows any subcomponent to be
altered or swapped out for a custom implementation.
As an example, given labeled tuple pairs such as the following:
.. image:: docs/source/_static/match_input_ex.png
DeepMatcher uses labeled tuple pairs to train a neural network to perform matching, i.e., to
predict match / non-match labels. The trained network can then be used to obtain labels for
unlabeled tuple pairs or text sequences.
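To make the input format concrete, the sketch below writes a tiny labeled CSV and reads it
back. The column layout (an ``id`` column, a ``label`` column, and attribute columns
prefixed with ``left_`` and ``right_``) follows what we understand to be DeepMatcher's
default conventions and should be treated as an assumption. The attribute names and rows
are invented for illustration.

```python
import csv
import io

# Hypothetical labeled tuple pairs: label 1 = match, 0 = non-match.
# Column names assume the default "left_" / "right_" attribute prefixes.
FIELDS = ["id", "label", "left_name", "left_city", "right_name", "right_city"]
ROWS = [
    {"id": 0, "label": 1, "left_name": "Joe's Pizza", "left_city": "New York",
     "right_name": "Joes Pizza", "right_city": "NYC"},
    {"id": 1, "label": 0, "left_name": "Joe's Pizza", "left_city": "New York",
     "right_name": "Blue Hill", "right_city": "New York"},
]


def write_pairs(rows, fields=FIELDS):
    """Serialize labeled tuple pairs to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_pairs(text):
    """Parse CSV text back into (left tuple, right tuple, label) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        left = {k[len("left_"):]: v for k, v in row.items() if k.startswith("left_")}
        right = {k[len("right_"):]: v for k, v in row.items() if k.startswith("right_")}
        triples.append((left, right, int(row["label"])))
    return triples


csv_text = write_pairs(ROWS)
pairs = read_pairs(csv_text)
```

Each triple pairs the left tuple with the right tuple and carries the match label, which
is the shape of supervision the network is trained on.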
For details on the architecture of the models used, take a look at our paper `Deep
Learning for Entity Matching`_ (SIGMOD '18). All the publicly available datasets used in
the paper can be found at `Prof. AnHai Doan's data repository`_.
**************************************
Quick Start: DeepMatcher in 30 seconds
**************************************
There are four main steps in using DeepMatcher:
1. Data processing: Load and process labeled training, validation and test CSV data.

.. code-block:: python

   import deepmatcher as dm

   # Load and preprocess the labeled CSV files found under `path`.
   train, validation, test = dm.data.process(
       path='data_directory',
       train='train.csv', validation='validation.csv', test='test.csv')

2. Model definition: Specify neural network architecture. Uses a built-in architecture by
default. Can be customized to your heart's desire.

.. code-block:: python

   model = dm.MatchingModel()

3. Model training: Train neural network.

.. code-block:: python

   model.run_train(train, validation, best_save_path='hybrid_model.pth')

4. Application: Evaluate model on test set and apply to unlabeled data.
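
.. code-block:: python

   # Report performance on the held-out test set.
   model.run_eval(test)

   # Preprocess unlabeled pairs using the trained model's settings, then predict.
   unlabeled = dm.data.process_unlabeled(
       path='data_directory/unlabeled.csv', trained_model=model)
   model.run_prediction(unlabeled)
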
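Once predictions are available, a common follow-up is turning match scores into match /
non-match decisions. The sketch below assumes each prediction carries an ``id`` and a
``match_score`` between 0 and 1 (to our understanding, ``run_prediction`` reports scores
under that column name, but treat this as an assumption). It uses plain dictionaries with
made-up values rather than real model output, and the 0.5 threshold is an arbitrary choice.

```python
def label_predictions(predictions, threshold=0.5):
    """Map each pair id to 1 (match) or 0 (non-match) by thresholding its score."""
    return {p["id"]: int(p["match_score"] >= threshold) for p in predictions}


# Made-up scores standing in for model output.
predictions = [
    {"id": 0, "match_score": 0.93},
    {"id": 1, "match_score": 0.08},
    {"id": 2, "match_score": 0.51},
]

labels = label_predictions(predictions)
strict_labels = label_predictions(predictions, threshold=0.9)
```

Raising the threshold generally trades recall for precision, so it is worth tuning on the
validation set rather than fixing it in advance.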
************
Installation
************
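Installing directly from GitHub is recommended:

.. code-block:: none

   pip install --process-dependency-links git+https://github.com/sidharthms/deepmatcher

Note that pip 19.0 removed the ``--process-dependency-links`` flag, so the command above
requires an older pip release.
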
**********
Tutorials
**********
**Using DeepMatcher:**
1. `Getting Started`_: A more in-depth guide to help you get familiar with the basics of
using DeepMatcher.
2. `Data Processing`_: Advanced guide on what data processing involves and how to
customize it.
3. `Matching Models`_: Advanced guide on neural network architecture for entity matching
and how to customize it.
**Entity Matching Workflow:**
`End to End Entity Matching`_: A guide to develop a complete entity
matching workflow. The tutorial discusses how to use DeepMatcher with `Magellan`_ to
perform blocking, sampling, labeling and matching to obtain matching tuple pairs from two
tables.
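As a rough illustration of the blocking step mentioned above, the sketch below keeps only
cross-table pairs that share at least one name token. It is a plain-Python toy with
hypothetical tables and function names; it is not Magellan's API, which offers far more
capable blockers.

```python
from itertools import product

# Hypothetical input tables, keyed by record id.
table_a = {"a1": "Joe's Pizza", "a2": "Blue Hill Restaurant"}
table_b = {"b1": "Joes Pizza and Pasta", "b2": "Blue Hill", "b3": "Katz Deli"}


def tokens(text):
    """Lowercase a string and split it into a set of whitespace tokens."""
    return set(text.lower().split())


def overlap_block(left, right):
    """Return id pairs whose names share at least one token."""
    candidates = []
    for (lid, lname), (rid, rname) in product(left.items(), right.items()):
        if tokens(lname) & tokens(rname):
            candidates.append((lid, rid))
    return candidates


candidate_pairs = overlap_block(table_a, table_b)
```

Blocking shrinks the set of pairs that must be labeled and scored by the matcher: here,
two candidate pairs survive out of six possible ones.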
**DeepMatcher for other matching tasks:**
`Question Answering with DeepMatcher`_: A tutorial on how to use DeepMatcher for question
answering. Specifically, we will look at `WikiQA`_, a benchmark dataset for the task of
Answer Selection.
*************
API Reference
*************
API docs `are here`_.
**********
Support
**********
This package is under active development. If you run into any issues or have questions,
please `file GitHub issues`_.
**********
The Team
**********
DeepMatcher was developed by University of Wisconsin-Madison grad students Sidharth Mudgal
and Han Li, under the supervision of Prof. AnHai Doan and Prof. Theodoros Rekatsinas.
.. _`Deep Learning for Entity Matching`: http://pages.cs.wisc.edu/~anhai/papers1/deepmatcher-sigmod18.pdf
.. _`Prof. AnHai Doan's data repository`: https://sites.google.com/site/anhaidgroup/useful-stuff/data
.. _`Magellan`: https://sites.google.com/site/anhaidgroup/projects/magellan
.. _`Getting Started`: https://nbviewer.jupyter.org/github/sidharthms/deepmatcher/blob/master/examples/getting_started.ipynb
.. _`Data Processing`: https://nbviewer.jupyter.org/github/sidharthms/deepmatcher/blob/master/examples/data_processing.ipynb
.. _`Matching Models`: https://nbviewer.jupyter.org/github/sidharthms/deepmatcher/blob/master/examples/matching_models.ipynb
.. _`End to End Entity Matching`: https://nbviewer.jupyter.org/github/sidharthms/deepmatcher/blob/master/examples/end_to_end_em.ipynb
.. _`are here`: https://deepmatcher.github.io/docs/
.. _`Question Answering with DeepMatcher`: https://nbviewer.jupyter.org/github/sidharthms/deepmatcher/blob/master/examples/question_answering.ipynb
.. _`WikiQA`: https://aclweb.org/anthology/D15-1237
.. _`file GitHub issues`: https://github.com/sidharthms/deepmatcher/issues