Fulltext-like search using NLP concept

Python 3.6, 3.7

A library for full-text search based on NLP concepts. It uses deeppavlov for paraphrase identification and a vantage-point tree (based on jvptree) for fast search.
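To illustrate why a vantage-point tree makes nearest-neighbor search fast, here is a minimal pure-Python sketch of the general idea. It is not the library's implementation (which is based on jvptree); the names `VPNode`, `build` and `search` are hypothetical and exist only for this example.

```python
import heapq
import random


class VPNode:
    """One node of a vantage-point tree."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree: points with distance <= radius
        self.outside = outside  # subtree: points with distance > radius


def build(points, dist, rng=None):
    """Recursively build a VP tree; dist must be a metric."""
    if rng is None:
        rng = random.Random(0)
    points = list(points)
    if not points:
        return None
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    distances = sorted(dist(vantage, p) for p in points)
    radius = distances[len(distances) // 2]
    inside = [p for p in points if dist(vantage, p) <= radius]
    outside = [p for p in points if dist(vantage, p) > radius]
    return VPNode(vantage, radius,
                  build(inside, dist, rng), build(outside, dist, rng))


def search(root, query, k, dist):
    """Return the k nearest points as (point, distance), closest first."""
    if k <= 0:
        return []
    best = []    # max-heap emulated with negated distances
    counter = 0  # tie-breaker so points themselves are never compared

    def visit(node):
        nonlocal counter
        if node is None:
            return
        d = dist(query, node.point)
        if len(best) < k:
            heapq.heappush(best, (-d, counter, node.point))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, counter, node.point))
        counter += 1
        if d <= node.radius:
            near, far = node.inside, node.outside
        else:
            near, far = node.outside, node.inside
        visit(near)
        # tau is the current k-th best distance (infinite until k are found).
        tau = -best[0][0] if len(best) == k else float("inf")
        # By the triangle inequality, the far side can only hold a better
        # point if the query is within tau of the boundary.
        if abs(d - node.radius) <= tau:
            visit(far)

    visit(root)
    return [(p, -nd) for nd, _, p in sorted(best, key=lambda item: -item[0])]


def absolute_difference(a, b):
    return abs(a - b)


tree = build([0, 3, 7, 8, 15, 30], absolute_difference)
print(search(tree, 9, 2, absolute_difference))  # [(8, 1), (7, 2)]
```

The pruning step is only valid when the distance function satisfies the triangle inequality, so a sketch like this should be used with a true metric.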

Installation

Install and update using pip:

pip install -U nlp-text-search

Usage

First, initialize the data, create the deeppavlov settings, and train a Doc2Vec model for embedding:

import deeppavlov
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
from nlp_text_search import create_settings, LinearizedDist, DefaultSearchEngine

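# Labeled pairs of Russian phrases: 1 = paraphrases, 0 = not paraphrases.
# Glossary: красная/синяя/зеленая = red/blue/green, ручка = pen, машина = car.
paraphrases = [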
    (('красная ручка', 'синяя ручка'), 1),
    (('красная ручка', 'зеленая ручка'), 1),
    (('красная машина', 'синяя машина'), 1),
    (('красная машина', 'зеленая машина'), 1),
    (('синяя ручка', 'красная ручка'), 1),
    (('синяя ручка', 'зеленая ручка'), 1),
    (('синяя машина', 'красная машина'), 1),
    (('синяя машина', 'зеленая машина'), 1),
    (('красная ручка', 'красная машина'), 0),
    (('красная ручка', 'синяя машина'), 0),
    (('красная ручка', 'зеленая машина'), 0),
    (('синяя ручка', 'красная машина'), 0),
    (('синяя ручка', 'синяя машина'), 0),
    (('синяя ручка', 'зеленая машина'), 0)
]
all_texts = list(set([t[0][0] for t in paraphrases] + [t[0][1] for t in paraphrases]))

settings = create_settings(paraphrases, 'test')
deeppavlov.train_model(settings)
doc2vec = Doc2Vec([TaggedDocument(simple_preprocess(t), [i]) for i, t in enumerate(all_texts)],
                  min_count=1, workers=1, negative=0, dm=0, hs=1)

Then create the search engine and search for the nearest neighbors:

se = DefaultSearchEngine(settings, doc2vec, LinearizedDist, points=all_texts)
print(se.search('красная ручка', 4))

This prints:

[('красная ручка', 0), ('зеленая ручка', 0.05778998136520386), ('синяя ручка', 0.06721997261047363), ('синяя машина', 0.48162001371383667)]
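Each result pairs a text with a score, and the query itself scores 0, so the score appears to be a distance where smaller means closer. Under that assumption, a caller might keep only the close matches. The sketch below uses the output shown above; the helper `within` and the threshold `0.1` are hypothetical choices for illustration, not part of the library.

```python
# Result list copied from the example output above.
results = [
    ('красная ручка', 0),
    ('зеленая ручка', 0.05778998136520386),
    ('синяя ручка', 0.06721997261047363),
    ('синяя машина', 0.48162001371383667),
]


def within(results, max_distance):
    """Keep only (text, distance) pairs with distance <= max_distance."""
    return [(text, d) for text, d in results if d <= max_distance]


# 0.1 is an arbitrary cut-off chosen for this example.
close = within(results, 0.1)
print([text for text, _ in close])
# ['красная ручка', 'зеленая ручка', 'синяя ручка']
```

With this cut-off the three pen phrases are kept and the car phrase is dropped. A suitable threshold depends on the trained model and data.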

You can also save and load the search engine:

se.save('se')
se = DefaultSearchEngine.load('se')
