An industrial-grade implementation of DSSM

DSSM

An industrial-grade implementation of the paper: Learning Deep Structured Semantic Models for Web Search using Clickthrough Data

Latent semantic models such as LSA aim to map a query to its relevant documents at the semantic level, where keyword-based matching often fails. DSSM projects queries and documents into a common low-dimensional space, where the relevance of a document to a query is readily computed as the distance between them.
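As a sketch of how that relevance is scored in the cited paper (the symbols below follow the paper and are not taken from this package's code): with y_Q and y_D the projected vectors of a query Q and a document D, relevance is the cosine similarity of the two vectors, and the posterior of a document given a query is a softmax over those scores with a smoothing factor gamma. Whether this package trains with exactly this objective is not stated here.

```latex
R(Q, D) = \cos(y_Q, y_D) = \frac{y_Q^{\top} y_D}{\lVert y_Q \rVert \, \lVert y_D \rVert}
\qquad
P(D \mid Q) = \frac{\exp\bigl(\gamma \, R(Q, D)\bigr)}{\sum_{D' \in \mathbf{D}} \exp\bigl(\gamma \, R(Q, D')\bigr)}
```

Here \mathbf{D} is the set of candidate documents for the query, which in the paper consists of the clicked document plus sampled unclicked ones.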

This model can be used as a search engine that helps people find the document they want even when the query:

  1. abbreviates words in the document;
  2. reorders words in the document;
  3. shortens words in the document;
  4. contains typos;
  5. has spacing issues.
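One reason a model of this kind can tolerate typos and shortened words is the word hashing step described in the cited paper, which represents each word by its letter trigrams. The sketch below illustrates that idea only. The function names are hypothetical, and it is an assumption, not something this README states, that the package tokenizes input the same way.

```python
def letter_ngrams(word, n=3, boundary="#"):
    """Split a word into letter n-grams after adding boundary marks."""
    padded = f"{boundary}{word.lower()}{boundary}"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def trigram_overlap(a, b):
    """Jaccard overlap between the letter-trigram sets of two words."""
    grams_a, grams_b = set(letter_ngrams(a)), set(letter_ngrams(b))
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# "good" is padded to "#good#" and split into four trigrams.
trigrams = letter_ngrams("good")

# A misspelled word still shares some trigrams with the correct spelling,
# so the two are not treated as unrelated tokens.
typo_overlap = trigram_overlap("search", "serach")
```

Because a typo changes only a few trigrams, the misspelled and correct forms keep part of their representation in common, which exact keyword matching cannot offer.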

Install

DSSM depends on PyTorch. There are two ways to install it:

Install DSSM from PyPI:

pip install dssm

Install DSSM from the GitHub source:

git clone https://github.com/Chiang97912/dssm.git
cd dssm
python setup.py install

Usage

Train

from dssm.model import DSSM

queries = ['...']  # list of queries; segment words in advance and join the tokens with spaces
documents = ['...']  # list of documents; segment words in advance and join the tokens with spaces
model = DSSM('dssm-model', device='cuda:0', lang='en')
model.fit(queries, documents)
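The training inputs are expected to be pre-segmented, space-joined strings. A minimal sketch of that preparation step for whitespace-delimited languages follows. The helper `to_space_joined` is hypothetical and not part of the dssm package, and lowercasing is an assumption. Text in languages without word spacing would need a dedicated word segmenter instead of the regular expression used here.

```python
import re


def to_space_joined(text):
    """Lowercase the text, extract word tokens, and join them with single spaces."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(tokens)


raw_queries = ["PyTorch  install", "Deep-structured semantic model"]
queries = [to_space_joined(q) for q in raw_queries]
# queries is now ['pytorch install', 'deep structured semantic model']
```

The same helper can be applied to the document list before both lists are passed to `model.fit`.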

Test

from dssm.model import DSSM
from sklearn.metrics.pairwise import cosine_similarity

text_left = '...'
text_right = '...'
model = DSSM('dssm-model', device='cpu')
vectors = model.encode([text_left, text_right])
score = cosine_similarity([vectors[0]], [vectors[1]])
print(score)
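For search, the same comparison can be repeated across many documents and the results sorted. The sketch below ranks toy vectors by cosine similarity using only the standard library. It assumes, as in the snippet above, that `model.encode` returns one vector per input text. `rank_documents` is a hypothetical helper, and the vectors are made-up values rather than real model output.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return up to top_k (document index, score) pairs, best match first."""
    scored = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, s) for s, i in scored[:top_k]]


# Toy 2-d vectors standing in for encoded texts.
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
ranking = rank_documents(query_vec, doc_vecs)
```

In practice the document vectors would be encoded once and stored, so that only the query needs encoding at search time.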

Dependencies

  • Python version 3.6
  • Numpy version 1.19.5
  • PyTorch version 1.9.0
