Fulltext-like search using NLP concept

Python 3.6, 3.7

A library for full-text search based on NLP. It uses DeepPavlov for paraphrase identification and a vantage-point tree (based on jvptree) for fast nearest-neighbor search.
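To illustrate how a vantage-point tree speeds up nearest-neighbor search, here is a minimal pure-Python sketch. It is not the library's implementation (which is based on jvptree); the `VPTree` class, its `search` method and the toy absolute-difference distance are hypothetical, chosen only for illustration. Each node picks a vantage point and splits the remaining points at the median distance to it; because the distance must be a metric, the triangle inequality lets the search skip whole subtrees that cannot contain a closer point.

```python
import math
import random
from typing import Any, Callable, List, Optional, Tuple


class _Node:
    """One vantage point plus the median radius that splits its subtree."""
    __slots__ = ("point", "radius", "inside", "outside")

    def __init__(self, point: Any, radius: float,
                 inside: Optional["_Node"], outside: Optional["_Node"]) -> None:
        self.point = point
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree with distance <= radius
        self.outside = outside  # subtree with distance > radius


class VPTree:
    """Hypothetical minimal vantage-point tree; `dist` must be a metric."""

    def __init__(self, points: List[Any], dist: Callable[[Any, Any], float],
                 seed: int = 0) -> None:
        self.dist = dist
        self._rng = random.Random(seed)
        self.root = self._build(list(points))

    def _build(self, points: List[Any]) -> Optional[_Node]:
        if not points:
            return None
        # Pick a random vantage point and split the rest at the median distance.
        vp = points.pop(self._rng.randrange(len(points)))
        if not points:
            return _Node(vp, 0.0, None, None)
        dists = [self.dist(vp, p) for p in points]
        radius = sorted(dists)[len(dists) // 2]
        inside = [p for p, d in zip(points, dists) if d <= radius]
        outside = [p for p, d in zip(points, dists) if d > radius]
        return _Node(vp, radius, self._build(inside), self._build(outside))

    def search(self, query: Any, k: int) -> List[Tuple[Any, float]]:
        """Return up to k (point, distance) pairs, nearest first."""
        if k <= 0:
            return []
        best: List[Tuple[float, Any]] = []  # sorted by distance, at most k items

        def tau() -> float:
            # Distance of the current k-th best match (infinity until we have k).
            return best[-1][0] if len(best) == k else math.inf

        def visit(node: Optional[_Node]) -> None:
            if node is None:
                return
            d = self.dist(query, node.point)
            if d < tau():
                best.append((d, node.point))
                best.sort(key=lambda item: item[0])
                del best[k:]
            # Visit the side the query falls in first; by the triangle inequality the
            # other side can only hold a closer point if the tau-ball crosses the radius.
            if d <= node.radius:
                visit(node.inside)
                if d + tau() > node.radius:
                    visit(node.outside)
            else:
                visit(node.outside)
                if d - tau() < node.radius:
                    visit(node.inside)

        visit(self.root)
        return [(point, d) for d, point in best]


tree = VPTree([1, 4, 9, 16, 25, 36], lambda a, b: abs(a - b))
print(tree.search(11, 2))  # [(9, 2), (16, 5)]
```

The result has the same `(item, distance)`, nearest-first shape as the search engine output shown under Usage; in the real library the points are texts and the distance comes from the trained models.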

Installation

Install and update using pip:

pip install -U nlp-text-search

Usage

First, prepare the data, create the DeepPavlov settings and train a Doc2Vec model for embedding:

import deeppavlov
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
from nlp_text_search import create_settings, LinearizedDist, DefaultSearchEngine

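# Labeled training pairs: 1 = paraphrases, 0 = not paraphrases.
# красная/синяя/зеленая = red/blue/green, ручка = pen, машина = car
paraphrases = [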
    (('красная ручка', 'синяя ручка'), 1),
    (('красная ручка', 'зеленая ручка'), 1),
    (('красная машина', 'синяя машина'), 1),
    (('красная машина', 'зеленая машина'), 1),
    (('синяя ручка', 'красная ручка'), 1),
    (('синяя ручка', 'зеленая ручка'), 1),
    (('синяя машина', 'красная машина'), 1),
    (('синяя машина', 'зеленая машина'), 1),
    (('красная ручка', 'красная машина'), 0),
    (('красная ручка', 'синяя машина'), 0),
    (('красная ручка', 'зеленая машина'), 0),
    (('синяя ручка', 'красная машина'), 0),
    (('синяя ручка', 'синяя машина'), 0),
    (('синяя ручка', 'зеленая машина'), 0)
]
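# Unique texts to index (set() removes duplicates; the order is arbitrary)
all_texts = list(set([t[0][0] for t in paraphrases] + [t[0][1] for t in paraphrases]))

# Create DeepPavlov settings from the labeled pairs and train the paraphrase model
settings = create_settings(paraphrases, 'test')
deeppavlov.train_model(settings)
# Train a Doc2Vec model (PV-DBOW, hierarchical softmax) on the same texts for embeddings
doc2vec = Doc2Vec([TaggedDocument(simple_preprocess(t), [i]) for i, t in enumerate(all_texts)],
                  min_count=1, workers=1, negative=0, dm=0, hs=1)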

Then create a search engine and search for the nearest neighbors:

se = DefaultSearchEngine(settings, doc2vec, LinearizedDist, points=all_texts)
print(se.search('красная ручка', 4))

This prints:

[('красная ручка', 0), ('зеленая ручка', 0.05778998136520386), ('синяя ручка', 0.06721997261047363), ('синяя машина', 0.48162001371383667)]
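Each result is a `(text, distance)` pair, ordered from nearest to farthest: the query itself comes back with distance 0, the other pens (green and blue) are close, and the blue car is much farther away. If you only want close matches, one option is to cut the list at a distance threshold. A minimal sketch; `filter_by_distance` and the 0.1 threshold are hypothetical and not part of nlp-text-search.

```python
from typing import List, Tuple


def filter_by_distance(results: List[Tuple[str, float]],
                       max_distance: float) -> List[str]:
    """Keep texts whose distance to the query is at most max_distance."""
    return [text for text, dist in results if dist <= max_distance]


# Output copied from the search example above: (text, distance), nearest first.
results = [('красная ручка', 0), ('зеленая ручка', 0.05778998136520386),
           ('синяя ручка', 0.06721997261047363), ('синяя машина', 0.48162001371383667)]

# 0.1 is a hypothetical cut-off; pick one that fits your own data.
print(filter_by_distance(results, 0.1))
# ['красная ручка', 'зеленая ручка', 'синяя ручка']
```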

You can also save and load the search engine:

se.save('se')
se = DefaultSearchEngine.load('se')

Source Distribution

nlp-text-search-0.6.8.tar.gz (15.2 kB)
