Skip to main content

A deep learning package for entity matching

Project description

https://travis-ci.org/anhaidgroup/deepmatcher.svg?branch=master https://img.shields.io/badge/License-BSD%203--Clause-blue.svg

DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and utilities that enable you to train and apply state-of-the-art deep learning models for entity matching in less than 10 lines of code. The models are also easily customizable - the modular design allows any subcomponent to be altered or swapped out for a custom implementation.

As an example, given labeled tuple pairs such as the following:

https://raw.githubusercontent.com/anhaidgroup/deepmatcher/master/docs/source/_static/match_input_ex.png

DeepMatcher uses labeled tuple pairs and trains a neural network to perform matching, i.e., to predict match / non-match labels. The trained network can then be used to obtain labels for unlabeled tuple pairs.

Paper and Data

For details on the architecture of the models used, take a look at our paper Deep Learning for Entity Matching (SIGMOD ‘18). All public datasets used in the paper can be downloaded from the datasets page.

Quick Start: DeepMatcher in 30 seconds

There are four main steps in using DeepMatcher:

  1. Data processing: Load and process labeled training, validation and test CSV data.

import deepmatcher as dm
train, validation, test = dm.data.process(path='data_directory',
    train='train.csv', validation='validation.csv', test='test.csv')
  1. Model definition: Specify neural network architecture. Uses the built-in hybrid model (as discussed in section 4.4 of our paper) by default. Can be customized to your heart’s desire.

model = dm.MatchingModel()
  1. Model training: Train neural network.

model.run_train(train, validation, best_save_path='best_model.pth')
  1. Application: Evaluate model on test set and apply to unlabeled data.

model.run_eval(test)

unlabeled = dm.data.process_unlabeled(path='data_directory/unlabeled.csv', trained_model=model)
model.run_prediction(unlabeled)

Installation

We currently support only Python versions 3.5+. Installing using pip is recommended:

pip install deepmatcher

Tutorials

Using DeepMatcher:

  1. Getting Started: A more in-depth guide to help you get familiar with the basics of using DeepMatcher.

  2. Data Processing: Advanced guide on what data processing involves and how to customize it.

  3. Matching Models: Advanced guide on neural network architecture for entity matching and how to customize it.

Entity Matching Workflow:

End to End Entity Matching: A guide to develop a complete entity matching workflow. The tutorial discusses how to use DeepMatcher with Magellan to perform blocking, sampling, labeling and matching to obtain matching tuple pairs from two tables.

DeepMatcher for other matching tasks:

Question Answering with DeepMatcher: A tutorial on how to use DeepMatcher for question answering. Specifically, we will look at WikiQA, a benchmark dataset for the task of Answer Selection.

API Reference

API docs are here.

Support

Take a look at the FAQ for common issues. If you run into any issues or have questions not answered in the FAQ, please file GitHub issues and we will address them asap.

The Team

DeepMatcher was developed by University of Wisconsin-Madison grad students Sidharth Mudgal and Han Li, under the supervision of Prof. AnHai Doan and Prof. Theodoros Rekatsinas.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

deepmatcher-0.1.2.post2.tar.gz (51.0 kB view details)

Uploaded Source

File details

Details for the file deepmatcher-0.1.2.post2.tar.gz.

File metadata

  • Download URL: deepmatcher-0.1.2.post2.tar.gz
  • Upload date:
  • Size: 51.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for deepmatcher-0.1.2.post2.tar.gz
Algorithm Hash digest
SHA256 64f867254ffa7a4a646dbe71ec903fb320d88a0c54b2cf646541031fb5bdc0ef
MD5 209c187d19164ab6c1e8559433b2e71a
BLAKE2b-256 bf6f7916a9880f6562695087de92c50df570cbd678bcfc8792c46cbe24eb9fab

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page