Facilitating the design, comparison and sharing of deep text matching models.

Development Status
- 3 - Alpha
Environment
- Console
License
- OSI Approved :: Apache Software License
Operating System
- POSIX :: Linux
Programming Language
- Python :: 3.6
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

MatchZoo-py

PyTorch version of MatchZoo.

Facilitating the design, comparison and sharing of deep text matching models.
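MatchZoo is a general-purpose text matching toolkit that aims to make it easy to quickly implement, compare, and share the latest deep text matching models.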

The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With a unified data processing pipeline, simplified model configuration, and automatic hyper-parameter tuning, MatchZoo is flexible and easy to use.

Tasks	Text 1	Text 2	Objective
Paraphrase Identification	string 1	string 2	classification
Textual Entailment	text	hypothesis	classification
Question Answer	question	answer	classification/ranking
Conversation	dialog	response	classification/ranking
Information Retrieval	query	document	ranking

Get Started in 60 Seconds

To train a Deep Structured Semantic Model (DSSM), first define a task using MatchZoo's customized loss functions and evaluation metrics:

import torch
import matchzoo as mz

ranking_task = mz.tasks.Ranking(losses=mz.losses.RankCrossEntropyLoss(num_neg=4))
ranking_task.metrics = [
    mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
    mz.metrics.MeanAveragePrecision()
]
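To make the two metrics above concrete, here is a minimal pure-Python sketch of NDCG@k and average precision for a single query. It follows the common textbook definitions (exponential gain, log2 discount, labels greater than zero counted as relevant); these choices are assumptions, and MatchZoo's own implementations may differ in details. The function names are hypothetical and are not part of the MatchZoo API.

```python
import math
from typing import Sequence


def ndcg_at_k(labels_in_ranked_order: Sequence[float], k: int) -> float:
    """NDCG@k with exponential gain (2**rel - 1) and a log2 rank discount."""

    def dcg(labels: Sequence[float]) -> float:
        # rank is 0-based, so the discount for the top item is log2(2) = 1.
        return sum(
            (2 ** rel - 1) / math.log2(rank + 2)
            for rank, rel in enumerate(labels[:k])
        )

    # The ideal ranking places the most relevant candidates first.
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    return dcg(labels_in_ranked_order) / ideal if ideal > 0 else 0.0


def average_precision(labels_in_ranked_order: Sequence[float]) -> float:
    """Average precision for one query; labels > 0 count as relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(labels_in_ranked_order, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Relevance labels of one query's candidates, sorted by model score (best first).
ranked_labels = [0, 1, 0, 1]
print(ndcg_at_k(ranked_labels, k=3))
print(average_precision(ranked_labels))
```

Mean average precision is then the mean of `average_precision` over all queries in the evaluation set.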

Prepare input data:

train_pack = mz.datasets.wiki_qa.load_data('train', task=ranking_task)
valid_pack = mz.datasets.wiki_qa.load_data('dev', task=ranking_task)

Preprocess your input data in three lines of code, keeping track of the parameters to be passed into the model:

preprocessor = mz.models.DSSM.get_default_preprocessor()
train_processed = preprocessor.fit_transform(train_pack)
valid_processed = preprocessor.transform(valid_pack)
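The fit/transform split above matters because statistics such as the vocabulary are learned from the training data only and then reused on the validation data. The sketch below illustrates that idea with a hypothetical `TinyPreprocessor` class; it is not MatchZoo code, it uses plain whitespace tokenization as a simplifying assumption, and the real preprocessors do considerably more.

```python
from typing import Dict, List


class TinyPreprocessor:
    """Hypothetical word-level preprocessor that records what it learns in `context`."""

    PAD, OOV = 0, 1  # reserved ids for padding and out-of-vocabulary tokens

    def __init__(self) -> None:
        self.context: Dict[str, object] = {}

    def fit(self, texts: List[str]) -> "TinyPreprocessor":
        vocab = {"<pad>": self.PAD, "<oov>": self.OOV}
        for text in texts:
            for token in text.lower().split():
                # Assign the next free id to each unseen token.
                vocab.setdefault(token, len(vocab))
        self.context["vocab"] = vocab
        self.context["vocab_size"] = len(vocab)
        return self

    def transform(self, texts: List[str]) -> List[List[int]]:
        vocab = self.context["vocab"]
        # Tokens never seen during fit() map to the OOV id.
        return [
            [vocab.get(token, self.OOV) for token in text.lower().split()]
            for text in texts
        ]

    def fit_transform(self, texts: List[str]) -> List[List[int]]:
        return self.fit(texts).transform(texts)


prep = TinyPreprocessor()
train_ids = prep.fit_transform(["what is pytorch", "what is matchzoo"])
valid_ids = prep.transform(["what is tensorflow"])
print(train_ids, valid_ids, prep.context["vocab_size"])
```

A value like `context["vocab_size"]` is the kind of parameter that later has to be handed to the model so that its input layer has the right size.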

Generate pair-wise training data on-the-fly:

trainset = mz.dataloader.Dataset(
    data_pack=train_processed,
    mode='pair',
    num_dup=1,
    num_neg=4
)
validset = mz.dataloader.Dataset(
    data_pack=valid_processed,
    mode='point'
)
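As a rough illustration of what `mode='pair'` with `num_dup` and `num_neg` means, the sketch below builds groups of one positive candidate plus `num_neg` sampled negatives, and scores one group with a softmax cross-entropy loss. Both functions are hypothetical helpers written for this example; sampling negatives with replacement and treating the positive as the softmax target are assumptions about the general technique, not a description of MatchZoo's internals.

```python
import math
import random
from typing import List, Sequence, Tuple


def make_pairwise_groups(
    positives: Sequence[str],
    negatives: Sequence[str],
    num_dup: int = 1,
    num_neg: int = 4,
    seed: int = 0,
) -> List[Tuple[str, List[str]]]:
    """For each positive, emit num_dup groups of (positive, num_neg sampled negatives)."""
    rng = random.Random(seed)
    groups = []
    for pos in positives:
        for _ in range(num_dup):
            # Sample with replacement so a query with few negatives still works.
            sampled = [rng.choice(negatives) for _ in range(num_neg)]
            groups.append((pos, sampled))
    return groups


def rank_cross_entropy(pos_score: float, neg_scores: Sequence[float]) -> float:
    """Softmax cross-entropy over one group, with the positive as the target."""
    scores = [pos_score, *neg_scores]
    max_score = max(scores)  # subtract the max for numerical stability
    log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum_exp - pos_score


groups = make_pairwise_groups(["relevant answer"], ["n1", "n2", "n3", "n4", "n5"])
print(len(groups), len(groups[0][1]))
print(rank_cross_entropy(2.0, [0.5, 0.1, -0.3, 0.0]))
```

Under this reading, the `num_neg=4` passed to the dataset and the `num_neg=4` passed to the loss in the task definition need to agree, since the loss expects each group to contain one positive followed by that many negatives.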

Define padding callback and generate data loader:

padding_callback = mz.models.DSSM.get_default_padding_callback()

trainloader = mz.dataloader.DataLoader(
    dataset=trainset,
    batch_size=32,
    stage='train',
    callback=padding_callback
)
validloader = mz.dataloader.DataLoader(
    dataset=validset,
    batch_size=32,
    stage='dev',
    callback=padding_callback
)
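A padding callback is needed because the examples in a batch usually have different lengths, while a tensor requires a rectangular shape. The sketch below shows the general idea on plain Python lists. The `pad_batch` helper is hypothetical, the field names `text_left` and `text_right` are assumed for illustration, and the callback returned by `get_default_padding_callback()` may pad differently for each model.

```python
from typing import Dict, List


def pad_batch(
    batch: List[Dict[str, List[int]]], pad_value: int = 0
) -> Dict[str, List[List[int]]]:
    """Right-pad 'text_left' and 'text_right' to the longest sequence in the batch."""
    padded = {}
    for field in ("text_left", "text_right"):
        max_len = max(len(example[field]) for example in batch)
        padded[field] = [
            example[field] + [pad_value] * (max_len - len(example[field]))
            for example in batch
        ]
    return padded


batch = [
    {"text_left": [1, 2], "text_right": [3]},
    {"text_left": [4], "text_right": [5, 6, 7]},
]
print(pad_batch(batch))
```

Padding to the longest sequence in each batch, rather than to a global maximum, keeps batches small when most texts are short.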

Initialize the model and set its hyper-parameters:

model = mz.models.DSSM()
model.params['task'] = ranking_task
model.params['vocab_size'] = preprocessor.context['vocab_size']
model.guess_and_fill_missing_params()
model.build()

Trainer is used to control the training flow:

optimizer = torch.optim.Adam(model.parameters())

trainer = mz.trainers.Trainer(
    model=model,
    optimizer=optimizer,
    trainloader=trainloader,
    validloader=validloader,
    epochs=10
)

trainer.run()

References

Tutorials

English Documentation

If you're interested in cutting-edge research progress, please take a look at Awesome Neural Models for Semantic Match.

Install

MatchZoo-py depends on PyTorch. There are two ways to install it:

Install MatchZoo-py from PyPI:

pip install matchzoo-py

Install MatchZoo-py from the GitHub source:

git clone https://github.com/NTMC-Community/MatchZoo-py.git
cd MatchZoo-py
python setup.py install
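After either installation route, a quick way to confirm that the package is importable is to look it up without actually importing it. This is a small sketch using only the standard library; the `is_installed` helper is hypothetical, and the module name `matchzoo` is taken from the `import matchzoo as mz` line in the quick start.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


print("matchzoo importable:", is_installed("matchzoo"))
print("torch importable:", is_installed("torch"))
```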

Models

DRMM: this model is an implementation of A Deep Relevance Matching Model for Ad-hoc Retrieval.
DRMMTKS: this model is an implementation of A Deep Top-K Relevance Matching Model for Ad-hoc Retrieval.
ARC-I: this model is an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences
ARC-II: this model is an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences
DSSM: this model is an implementation of Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
CDSSM: this model is an implementation of Learning Semantic Representations Using Convolutional Neural Networks for Web Search
MatchLSTM: this model is an implementation of Machine Comprehension Using Match-LSTM and Answer Pointer
DUET: this model is an implementation of Learning to Match Using Local and Distributed Representations of Text for Web Search
KNRM: this model is an implementation of End-to-End Neural Ad-hoc Ranking with Kernel Pooling
ConvKNRM: this model is an implementation of Convolutional neural networks for soft-matching n-grams in ad-hoc search
ESIM: this model is an implementation of Enhanced LSTM for Natural Language Inference
BiMPM: this model is an implementation of Bilateral Multi-Perspective Matching for Natural Language Sentences
Models under development include MatchPyramid, Match-SRNN, DeepRank, and aNMM.

Citation

If you use MatchZoo in your research, please cite it with the following BibTeX entry.

@inproceedings{Guo:2019:MLP:3331184.3331403,
 author = {Guo, Jiafeng and Fan, Yixing and Ji, Xiang and Cheng, Xueqi},
 title = {MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching},
 booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series = {SIGIR'19},
 year = {2019},
 isbn = {978-1-4503-6172-9},
 location = {Paris, France},
 pages = {1297--1300},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3331184.3331403},
 doi = {10.1145/3331184.3331403},
 acmid = {3331403},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {matchzoo, neural network, text matching},
}

Development Team

Yixing Fan

Core Dev
ASST PROF, ICT

Jiangui Chen

Core Dev
PhD. ICT

Yinqiong Cai

Core Dev
M.S. ICT

Liang Pang

Core Dev
ASST PROF, ICT

Lixin Su

Dev
PhD. ICT

Junfeng Tian

Dev
M.S. ECNU

Qinghua Wang

Documentation
B.S. Shandong Univ.

Contribution

Please make sure to read the Contributing Guide before creating a pull request. If you have a MatchZoo-related paper, project, component, or tool, send a pull request to this awesome list!

Thank you to all the people who already contributed to MatchZoo!

Bo Wang, Zeyi Wang, Liu Yang, Zizhen Wang, Zhou Yang, Jianpeng Hou, Lijuan Chen, Yukun Zheng, Niuguo Cheng, Dai Zhuyun, Aneesh Joshi, Zeno Gantner, Kai Huang, stanpcf, ChangQF, Mike Kellogg

Project Organizers

Jiafeng Guo
- Institute of Computing Technology, Chinese Academy of Sciences
- Homepage
Yanyan Lan
- Institute of Computing Technology, Chinese Academy of Sciences
- Homepage
Xueqi Cheng
- Institute of Computing Technology, Chinese Academy of Sciences
- Homepage

License

Apache-2.0

Release history

This version

1.0

Aug 21, 2019

Download files

Source Distribution

MatchZoo-test-1.0.tar.gz (84.6 kB)

Uploaded Aug 21, 2019 Source

File details

Details for the file MatchZoo-test-1.0.tar.gz.

File metadata

Download URL: MatchZoo-test-1.0.tar.gz
Upload date: Aug 21, 2019
Size: 84.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.8

File hashes

Hashes for MatchZoo-test-1.0.tar.gz
Algorithm	Hash digest
SHA256	`66159089aee78a3f939663398dec5e8839c833899357c88f091aa100780d3192`
MD5	`c7fd8365a3d5f7a36a2d096b0b74a99e`
BLAKE2b-256	`5506fde0d67785f2dfd66f32254b746a9e9b67b290e5d6454aa6474237d5ec48`
