
Semantic search to query COVID-19-related papers

Project description

Semantic search with FAISS

The goal of this project is to build a semantic search engine that searches across research papers related to COVID-19 and returns the most relevant results for a query. This can help people who want to keep up with ongoing COVID-19 research.

I used a retrieve-and-rerank approach: a FAISS index over bi-encoder embeddings quickly retrieves candidate documents for a given query, and a cross-encoder then re-ranks those candidates.
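The two-stage idea can be sketched in plain Python. This is a minimal, hypothetical illustration, not this package's implementation: `toy_embed` stands in for the bi-encoder, `toy_cross_score` stands in for the cross-encoder, and the brute-force inner-product scan stands in for the FAISS index (a flat FAISS index performs the same exact search, only much faster). All function names here are made up for the example.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for the bi-encoder: an L2-normalised bag-of-words vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def dot(u, v):
    """Inner product of two sparse vectors stored as {word: weight} dicts."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


def bigrams(text):
    """Set of adjacent word pairs in the text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def toy_cross_score(query, doc):
    """Hypothetical stand-in for the cross-encoder: it looks at query and document
    together (here, the fraction of query word pairs that also occur in the document),
    so it can see word order that the bag-of-words embedding throws away."""
    q_pairs = bigrams(query)
    return len(q_pairs & bigrams(doc)) / max(len(q_pairs), 1)


def retrieve_and_rerank(query, documents, doc_vectors, top_k=3, final_k=2):
    # Stage 1 (retrieval): exact inner-product search over all document vectors,
    # keeping the top_k candidates. A FAISS index replaces this linear scan.
    q_vec = toy_embed(query)
    ranked = sorted(range(len(documents)),
                    key=lambda i: dot(q_vec, doc_vectors[i]), reverse=True)
    candidates = ranked[:top_k]
    # Stage 2 (re-ranking): rescore only the candidates with the slower pairwise scorer.
    reranked = sorted(candidates,
                      key=lambda i: toy_cross_score(query, documents[i]), reverse=True)
    return [documents[i] for i in reranked[:final_k]]


docs = [
    "rates of covid testing and death",
    "death rates of covid patients in intensive care",
    "vaccine trial results",
    "hospital capacity during the pandemic",
]
doc_vectors = [toy_embed(d) for d in docs]  # computed once, like building the index
results = retrieve_and_rerank("death rates of covid", docs, doc_vectors)
print(results)
```

Stage 1 ranks "rates of covid testing and death" first because it contains all four query words, while the re-ranking stage moves the passage that actually contains the phrase "death rates of covid" to the top. Only the few retrieved candidates reach the expensive pairwise scorer, which is why the split keeps search fast.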

Installation

pip install semantic-search-faiss

Inference example

from semanticsearch import search, utils, config
from semanticsearch.pretrained import get_model
from sentence_transformers import CrossEncoder

# Load the fine-tuned bi-encoder, the prebuilt FAISS index and the document collection
bi_encoder, index, documents = get_model(config.BI_ENCODER, config.INDEX, config.DATA)
# Load the cross-encoder used to re-rank the retrieved candidates
cross_encoder = CrossEncoder(config.CROSS_ENCODER)

query = 'death rates of covid'
results = search.search(query, index, bi_encoder, cross_encoder, documents)

Training pipeline


1. Synthetic query generation using T5
2. Fine-tuning the bi-encoder on the synthetic queries
3. Indexing the data with FAISS using the fine-tuned bi-encoder
4. Retrieval with the bi-encoder and FAISS, followed by cross-encoder re-ranking
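Step 2 pairs each synthetic query with the passage it was generated from. The README does not say which training objective was used; a common choice for this kind of data, and purely an assumption here, is an in-batch-negatives softmax loss (the idea behind sentence-transformers' `MultipleNegativesRankingLoss`). The pure-Python sketch below computes such a loss on hand-made embeddings: each query should score its own passage higher than every other passage in the batch. `in_batch_negatives_loss` and its `scale=20.0` default are illustrative, not taken from this package.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def in_batch_negatives_loss(query_embs, passage_embs, scale=20.0):
    """Hypothetical objective: mean cross-entropy in which query i must pick passage i
    out of the whole batch; every other passage in the batch serves as a negative."""
    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [scale * cosine(q, p) for p in passage_embs]
        m = max(logits)  # subtract the max before exponentiating, for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # -log softmax probability of the correct passage
    return total / len(query_embs)


# Hand-made 2-d embeddings: query i was generated from passage i.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each query is closest to its own passage
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each query is closest to the wrong passage
aligned_loss = in_batch_negatives_loss(queries, aligned)
swapped_loss = in_batch_negatives_loss(queries, swapped)
print(aligned_loss < swapped_loss)  # prints True
```

Under this assumption, minimising the loss pulls each synthetic query toward its source passage and away from the other passages in the batch, so that the embeddings indexed in step 3 place relevant passages near the queries that should find them.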

Try out the code on Google Colab.

Open In Colab

Kaggle

A detailed walkthrough of the solution can be found in the Kaggle notebook below.

Kaggle

Acknowledgements

I would like to thank the Kaggle community as a whole for providing an avenue to learn about and discuss the latest data science and machine learning advancements.

  1. Vladimir Iglovikov for his wonderful article "I trained a model. What is next?"

  2. Xhululu for the dataset.



Download files


Source Distribution

semantic_search_faiss-0.1.0.tar.gz (6.0 kB view details)

Uploaded Source

Built Distribution

semantic_search_faiss-0.1.0-py3-none-any.whl (6.7 kB view details)

Uploaded Python 3

File details

Details for the file semantic_search_faiss-0.1.0.tar.gz.

File metadata

  • Download URL: semantic_search_faiss-0.1.0.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.6.3

File hashes

Hashes for semantic_search_faiss-0.1.0.tar.gz
Algorithm Hash digest
SHA256 0771cd3bd12f013a019561e3a10de6ef3fa61f9f95fb03cd751e23cec2ca2140
MD5 57de1684022a2d0aa23a61fb0e74f053
BLAKE2b-256 f084f2b47c148a7cacf43a2029b45da9025e3ceeca8d9575b0df385b33241c74


File details

Details for the file semantic_search_faiss-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: semantic_search_faiss-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.6.3

File hashes

Hashes for semantic_search_faiss-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8f81ef686f20edccbed78139499a1d50b58035c77d8d8a784d27a9f87a0d5fb4
MD5 3a9a17330b39920b3336ec6aa6cf31d7
BLAKE2b-256 59b38c8acb27aa2e23107c39aca9df32aa55df8031ea5ac9f6b6cb906b066de4

