
An industrial-grade implementation of DSSM


DSSM

An industrial-grade implementation of the paper "Learning Deep Structured Semantic Models for Web Search using Clickthrough Data".

Latent semantic models, such as LSA, aim to map a query to its relevant documents at the semantic level, where keyword-based matching often fails. DSSM projects queries and documents into a common low-dimensional space in which the relevance of a document to a query is readily computed as the distance between them.
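For reference, the original paper measures that relevance as the cosine similarity of the two semantic vectors and trains by maximizing a softmax posterior of the clicked document. Here $y_Q$ and $y_D$ are the output vectors for query $Q$ and document $D$, $\gamma$ is a smoothing factor, and $\mathbf{D}$ is the candidate set (the clicked document $D^{+}$ plus sampled unclicked ones). This README does not say whether the package uses exactly this objective, so treat it as background rather than a description of the code.

```latex
R(Q, D) = \cos(y_Q, y_D) = \frac{y_Q^{\top} y_D}{\lVert y_Q \rVert \, \lVert y_D \rVert}

P(D \mid Q) = \frac{\exp\big(\gamma R(Q, D)\big)}{\sum_{D' \in \mathbf{D}} \exp\big(\gamma R(Q, D')\big)}

L(\Lambda) = -\log \prod_{(Q, D^{+})} P(D^{+} \mid Q)
```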

This model can serve as a search engine that helps people find the document they want even when the query:

  1. is an abbreviation of words in the document;
  2. changes the order of the words in the document;
  3. uses shortened forms of words in the document;
  4. contains typos;
  5. has spacing issues.
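In the original paper, much of this robustness comes from "word hashing": each word is wrapped in `#` boundary markers and broken into letter trigrams, so a misspelled or shortened word still shares most of its trigrams with the correct one, and a bag of trigrams ignores word order. This README does not state whether the package uses the same scheme; the sketch below uses hypothetical helpers (`letter_trigrams`, `word_hash`, `trigram_overlap`) and only the standard library to illustrate the idea.

```python
from collections import Counter


def letter_trigrams(word):
    """Wrap a word in '#' boundary markers and return its letter trigrams."""
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def word_hash(text):
    """Bag of letter trigrams for a space-separated text (word order is discarded)."""
    counts = Counter()
    for token in text.lower().split():
        counts.update(letter_trigrams(token))
    return counts


def trigram_overlap(a, b):
    """Number of trigrams two texts share, counting multiplicity."""
    return sum((word_hash(a) & word_hash(b)).values())


print(letter_trigrams("good"))  # ['#go', 'goo', 'ood', 'od#']
print(trigram_overlap("machine", "machne"))  # 4 shared trigrams despite the typo
```

Because `word_hash("machine learning")` and `word_hash("learning machine")` produce the same bag, reordered queries map to the same input representation.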

Install

DSSM depends on PyTorch. There are two ways to install it:

Install DSSM from PyPI:

pip install dssm

Install DSSM from the GitHub source:

git clone https://github.com/Chiang97912/dssm.git
cd dssm
python setup.py install

Usage

Train

from dssm.model import DSSM

# List of queries. Text must be word-segmented in advance, with tokens joined by spaces.
queries = ['...']
# List of documents, segmented and space-joined in the same way as the queries.
documents = ['...']
# device='cuda:0' trains on the first GPU; the Test example below uses device='cpu'.
model = DSSM('dssm-model', device='cuda:0', lang='en')
model.fit(queries, documents)
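The training comments require input that is already segmented, with tokens joined by spaces. For English text, a regular-expression tokenizer is one way to produce that. The helper `segment`, and its choices to lowercase and to split punctuation into separate tokens, are assumptions for illustration, not part of this package. Languages written without spaces between words (for example Chinese) need a proper word segmenter instead.

```python
import re

# A token is either a run of word characters or a single punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def segment(text):
    """Hypothetical helper: lowercase English text and join its tokens with single spaces."""
    return " ".join(TOKEN_RE.findall(text.lower()))


queries = [segment(q) for q in ["How do I reset my password?"]]
print(queries)  # ['how do i reset my password ?']
```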

Test

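from dssm.model import DSSM
from sklearn.metrics.pairwise import cosine_similarity

text_left = '...'   # segmented the same way as the training data
text_right = '...'
model = DSSM('dssm-model', device='cpu')  # same model name as used for training
vectors = model.encode([text_left, text_right])  # one vector per input text
# cosine_similarity returns a 1x1 matrix for these single-row inputs; take its only element.
score = cosine_similarity([vectors[0]], [vectors[1]])[0][0]
print(score)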
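To use the model as a search engine, one could encode a query together with a set of candidate documents and sort the documents by cosine similarity to the query. This sketch assumes only that `model.encode` returns one vector per input text, as the Test example suggests; `cosine` and `rank_documents` are hypothetical helpers written in plain Python, and the toy vectors stand in for real encoder output.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (document index, score) pairs for the top_k most similar documents."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy 2-d vectors standing in for model.encode output: document 2 ranks first, then document 1.
print(rank_documents([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]], top_k=2))
```

With a trained model, this would look like `vectors = model.encode([query] + documents)` followed by `rank_documents(vectors[0], vectors[1:])`. For a large collection, encoding the documents once and scoring them with a single NumPy matrix product would be much faster than this Python loop.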

Dependencies

  • Python version 3.6
  • NumPy version 1.19.5
  • PyTorch version 1.9.0
  • scikit-learn (used only by the Test example)



Download files

Source Distribution

dssm-0.1.3.tar.gz (7.5 kB)

Built Distribution

dssm-0.1.3-py2.py3-none-any.whl (8.6 kB)
