An industrial-grade implementation of DSSM
DSSM
An industrial-grade implementation of the paper: Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
Latent semantic models, such as LSA, aim to map a query to its relevant documents at the semantic level, where keyword-based matching often fails. DSSM projects queries and documents into a common low-dimensional space, where the relevance of a document to a query is readily computed as the distance between them.
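To make the relevance computation concrete: in the DSSM paper, relevance is the cosine similarity between the query vector and the document vector, and a softmax over those scores (scaled by a smoothing factor) gives a posterior over candidate documents. The sketch below shows this with plain Python lists. The toy vectors, the function names, and the value `gamma=10.0` are made up for illustration and are not taken from this package.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def posterior(query_vec, doc_vecs, gamma=10.0):
    """Softmax over gamma-scaled cosine scores: P(D | Q) for each candidate."""
    scores = [gamma * cosine(query_vec, d) for d in doc_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-dimensional embeddings (illustrative values only).
query_vec = [0.9, 0.1, 0.0]
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
probs = posterior(query_vec, doc_vecs)
```

The document whose vector points in nearly the same direction as the query receives most of the probability mass.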
This model can be used as a search engine that helps people find the document they want even when the query:
- abbreviates words in the document;
- changes the order of words in the document;
- shortens words in the document;
- contains typos;
- has spacing issues.
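The paper this project implements represents each word by its letter trigrams ("word hashing"), which is one reason such a model can tolerate typos and small spelling variations: a misspelled word still shares many of its trigrams with the correct one. Whether this package tokenizes input the same way is not stated here, so treat the following as an illustration of the idea from the paper, not of this library's internals. `letter_trigrams` and `overlap` are hypothetical helper names.

```python
def letter_trigrams(word):
    """Return the set of letter trigrams of a word, with '#' as boundary marks."""
    marked = f"#{word.lower()}#"
    return {marked[i:i + 3] for i in range(len(marked) - 2)}

def overlap(a, b):
    """Jaccard overlap between the trigram sets of two non-empty words."""
    ta, tb = letter_trigrams(a), letter_trigrams(b)
    return len(ta & tb) / len(ta | tb)

print(sorted(letter_trigrams("good")))   # trigrams of "#good#"
print(overlap("document", "documnet"))   # typo: still shares several trigrams
print(overlap("document", "query"))      # unrelated word: no shared trigrams
```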
Install
DSSM depends on PyTorch. There are two ways to install it:
Install DSSM from PyPI:
pip install dssm
Install DSSM from the GitHub source:
git clone https://github.com/Chiang97912/dssm.git
cd dssm
python setup.py install
Usage
Train
from dssm.model import DSSM
queries = ['...'] # list of queries; segment words in advance and join the tokens with spaces
documents = ['...'] # list of documents; segment words in advance and join the tokens with spaces
model = DSSM('dssm-model', device='cuda:0', lang='en')
model.fit(queries, documents)
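The comments above say inputs must be pre-segmented, with tokens joined by spaces. For English text, a minimal way to get there might look like the sketch below. `to_space_joined` is a hypothetical helper, not part of the `dssm` package, and the regex-based tokenization with lowercasing is just one simple choice. Languages without whitespace word boundaries need a proper word segmenter instead.

```python
import re

def to_space_joined(text):
    """Lowercase, split into word tokens, and join them with single spaces."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(tokens)

raw_queries = ["  Deep   structured semantic-models ", "Web Search!"]
queries = [to_space_joined(q) for q in raw_queries]
```

The resulting `queries` list, and a `documents` list prepared the same way, can then be passed to `model.fit`.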
Test
from dssm.model import DSSM
from sklearn.metrics.pairwise import cosine_similarity
text_left = '...'
text_right = '...'
model = DSSM('dssm-model', device='cpu')
vectors = model.encode([text_left, text_right])
score = cosine_similarity([vectors[0]], [vectors[1]])
print(score)
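The same pattern extends from scoring one pair to ranking several candidate documents for a query. The sketch below takes the encoder as a parameter, so `model.encode` can be passed in. It assumes, as the snippet above suggests, that `encode` accepts a list of strings and returns one vector per string. `rank_documents` and the toy `fake_encode` stand-in are hypothetical names used only so the example runs without a trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_documents(query, documents, encode, top_k=3):
    """Return up to top_k (document, score) pairs, best match first."""
    vectors = encode([query] + list(documents))
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in zip(documents, doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def fake_encode(texts):
    """Stand-in encoder: per-letter counts (not a real model)."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return [[text.lower().count(ch) for ch in alphabet] for text in texts]

results = rank_documents(
    "semantic search",
    ["cooking recipes", "semantic web search"],
    fake_encode,
)
```

With a trained model, the call would instead be `rank_documents(query, documents, model.encode)`, provided the assumption about `encode` holds.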
Dependencies
- Python version 3.6
- NumPy version 1.19.5
- PyTorch version 1.9.0