Fulltext-like search using NLP concept
A library for fulltext-like search based on NLP concepts. It uses deeppavlov for paraphrase identification and a Vantage-Point tree (based on jvptree) for fast nearest-neighbor search.
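To illustrate why a Vantage-Point tree makes the search fast, here is a minimal pure-Python sketch of the data structure. It is not the code used by this library or by jvptree; the `VPTree` class, its tuple-based node layout, and the numeric sample data are assumptions made for illustration. The idea is that each node stores a vantage point and a median radius, so whole subtrees can be skipped when they cannot contain a closer neighbor.

```python
import random


class VPTree:
    """Illustrative vantage-point tree over items compared by a metric `dist`."""

    def __init__(self, points, dist):
        self.dist = dist
        self.root = self._build(list(points))

    def _build(self, points):
        if not points:
            return None
        # Pick a random vantage point and split the rest by median distance.
        vantage = points.pop(random.randrange(len(points)))
        if not points:
            return (vantage, 0.0, None, None)
        distances = [self.dist(vantage, p) for p in points]
        radius = sorted(distances)[len(distances) // 2]
        inside = [p for p, d in zip(points, distances) if d < radius]
        outside = [p for p, d in zip(points, distances) if d >= radius]
        return (vantage, radius, self._build(inside), self._build(outside))

    def search(self, query, k):
        """Return the k nearest (point, distance) pairs, closest first."""
        if k <= 0:
            return []
        best = []  # (distance, point) pairs, sorted, at most k long

        def tau():
            # Distance of the current k-th best candidate, or infinity.
            return best[-1][0] if len(best) == k else float("inf")

        def visit(node):
            if node is None:
                return
            vantage, radius, inside, outside = node
            d = self.dist(query, vantage)
            best.append((d, vantage))
            best.sort(key=lambda pair: pair[0])
            del best[k:]
            if d < radius:
                visit(inside)
                # The outer shell can only help if the search ball crosses the radius.
                if d + tau() >= radius:
                    visit(outside)
            else:
                visit(outside)
                if d - tau() <= radius:
                    visit(inside)

        visit(self.root)
        return [(point, d) for d, point in best]


# Numbers with absolute difference stand in for texts with a learned distance.
tree = VPTree([1, 5, 9, 12, 20], lambda a, b: abs(a - b))
print(tree.search(10, 2))  # [(9, 1), (12, 2)]
```

The pruning is only valid when `dist` behaves like a metric, in particular when it satisfies the triangle inequality.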
Installation
Install and update using pip:
pip install -U nlp-text-search
Usage
First, initialize the data, create the deeppavlov settings, and build a Doc2Vec model for embedding:
import deeppavlov
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
from nlp_text_search import create_settings, LinearizedDist, DefaultSearchEngine
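# Labeled pairs: ((text_a, text_b), label), where 1 marks a paraphrase and 0 a non-paraphrase.
# Russian sample texts: ручка = pen, машина = car; красная / синяя / зеленая = red / blue / green.
paraphrases = [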
(('красная ручка', 'синяя ручка'), 1),
(('красная ручка', 'зеленая ручка'), 1),
(('красная машина', 'синяя машина'), 1),
(('красная машина', 'зеленая машина'), 1),
(('синяя ручка', 'красная ручка'), 1),
(('синяя ручка', 'зеленая ручка'), 1),
(('синяя машина', 'красная машина'), 1),
(('синяя машина', 'зеленая машина'), 1),
(('красная ручка', 'красная машина'), 0),
(('красная ручка', 'синяя машина'), 0),
(('красная ручка', 'зеленая машина'), 0),
(('синяя ручка', 'красная машина'), 0),
(('синяя ручка', 'синяя машина'), 0),
(('синяя ручка', 'зеленая машина'), 0)
]
all_texts = list(set([t[0][0] for t in paraphrases] + [t[0][1] for t in paraphrases]))
settings = create_settings(paraphrases, 'test')
deeppavlov.train_model(settings)
doc2vec = Doc2Vec([TaggedDocument(simple_preprocess(t), [i]) for i, t in enumerate(all_texts)],
                  min_count=1, workers=1, negative=0, dm=0, hs=1)
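The labeled pairs in the setup code follow a simple pattern: texts about the same object are paraphrases (label 1), and texts about different objects are not (label 0). For a larger dataset, such a list could be generated from groups of equivalent texts. The sketch below assumes a hypothetical helper `make_pairs`, which is not part of nlp-text-search:

```python
from itertools import permutations, product


def make_pairs(groups):
    """Build ((text_a, text_b), label) pairs from groups of equivalent texts."""
    pairs = []
    for group in groups:
        # Every ordered pair inside one group is a positive example.
        pairs += [((a, b), 1) for a, b in permutations(group, 2)]
    for i, first in enumerate(groups):
        for second in groups[i + 1:]:
            # Texts taken from two different groups are negative examples.
            pairs += [((a, b), 0) for a, b in product(first, second)]
    return pairs


groups = [
    ['красная ручка', 'синяя ручка', 'зеленая ручка'],
    ['красная машина', 'синяя машина', 'зеленая машина'],
]
pairs = make_pairs(groups)
print(len(pairs))  # 21: 12 positive and 9 negative pairs
```

The result has the same shape as the hand-written `paraphrases` list and contains all 14 of its pairs.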
Then create the search engine and search for the nearest neighbors:
se = DefaultSearchEngine(settings, doc2vec, LinearizedDist, points=all_texts)
print(se.search('красная ручка', 4))
This prints:
[('красная ручка', 0), ('зеленая ручка', 0.05778998136520386), ('синяя ручка', 0.06721997261047363), ('синяя машина', 0.48162001371383667)]
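Each result is a `(text, distance)` tuple, ordered from nearest to farthest, and the query itself comes back with distance 0. In the output above, the two other pens are far closer than the car, so a distance cutoff can separate likely paraphrases from unrelated texts. A minimal sketch, assuming a hypothetical helper `close_matches` and an illustrative cutoff of 0.1 that would need tuning on real data:

```python
# Search results copied from the example output above.
results = [
    ('красная ручка', 0),
    ('зеленая ручка', 0.05778998136520386),
    ('синяя ручка', 0.06721997261047363),
    ('синяя машина', 0.48162001371383667),
]


def close_matches(results, max_distance):
    """Keep only the texts whose distance does not exceed max_distance."""
    return [text for text, distance in results if distance <= max_distance]


print(close_matches(results, 0.1))
# ['красная ручка', 'зеленая ручка', 'синяя ручка']
```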
You can also save and load the search engine:
se.save('se')
se = DefaultSearchEngine.load('se')
Download files
Source Distribution
nlp-text-search-0.6.16.tar.gz (21.6 kB)
File details
Details for the file nlp-text-search-0.6.16.tar.gz.
File metadata
- Download URL: nlp-text-search-0.6.16.tar.gz
- Upload date:
- Size: 21.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.7.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 024fe75a705645ada57e83f3ce53226ea299ae9dc31eba448c0b8e0bfb20f6e6
MD5 | 79edb3eb806072b0be547a148d51727e
BLAKE2b-256 | 78474f4de1aaf99895cc71219ee3d1da8b6754b6d304f8861898d7ddfdc37ced
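The published digests can be used to check a downloaded archive. The sketch below uses only the standard library's `hashlib`; the helper names `sha256_of_file` and `verify` are illustrative, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
import os

# SHA256 digest published for nlp-text-search-0.6.16.tar.gz.
EXPECTED_SHA256 = '024fe75a705645ada57e83f3ce53226ea299ae9dc31eba448c0b8e0bfb20f6e6'


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of_file(path) == expected.lower()


# Assumed download location; adjust to wherever the archive was saved.
archive = 'nlp-text-search-0.6.16.tar.gz'
if os.path.exists(archive):
    print('digest matches' if verify(archive) else 'digest mismatch')
```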