
Semantic search to query COVID-related papers

Project description

Semantic search with FAISS

The goal of this project is to build a semantic search engine that searches across COVID-19 research papers and returns the documents most relevant to a query. It can help people who want to keep up with ongoing COVID-19 research.

I have used a retrieve-and-rerank approach: a FAISS index quickly retrieves candidate documents for the given query, and those candidates are then re-ranked.
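The retrieve-and-rerank idea can be sketched in plain Python. This is a toy illustration under stated assumptions, not the package's implementation: `embed()` is a hypothetical stand-in for the fine-tuned bi-encoder (hashed bag-of-words), `FlatIPIndex` mimics the exact inner-product search that a flat FAISS index performs, and `rerank_score()` is a hypothetical stand-in for the cross-encoder (token overlap). What matters is the shape of the pipeline: a cheap vector search narrows the corpus to a few candidates, and only those candidates are scored by the more expensive pairwise model.

```python
import math
import zlib
from collections import Counter

DIM = 64  # size of the toy embedding space (assumed value)


def tokens(text):
    """Lowercase whitespace tokenisation."""
    return text.lower().split()


def embed(text):
    """Hypothetical bi-encoder stand-in: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for tok in tokens(text):
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FlatIPIndex:
    """Exact (brute-force) inner-product search over stored vectors."""

    def __init__(self):
        self.vectors = []

    def add(self, vecs):
        self.vectors.extend(vecs)

    def search(self, query_vec, k):
        scores = [sum(a * b for a, b in zip(query_vec, v)) for v in self.vectors]
        # ids of the k highest-scoring vectors, best first
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def rerank_score(query, doc):
    """Hypothetical cross-encoder stand-in: scores the (query, doc) pair jointly."""
    shared = Counter(tokens(query)) & Counter(tokens(doc))
    return sum(shared.values()) / max(len(tokens(doc)), 1)


def retrieve_and_rerank(query, index, documents, k_retrieve=3, k_final=2):
    # Stage 1: cheap retrieval of candidate ids from the vector index
    candidate_ids = index.search(embed(query), k_retrieve)
    # Stage 2: re-rank only those candidates with the pairwise scorer
    ranked = sorted(candidate_ids, key=lambda i: rerank_score(query, documents[i]), reverse=True)
    return [documents[i] for i in ranked[:k_final]]


documents = [
    "covid death rates vary by age group",
    "vaccine trials for covid show strong efficacy",
    "hospital capacity during the covid pandemic",
    "influenza seasonal patterns in europe",
]
index = FlatIPIndex()
index.add([embed(d) for d in documents])  # documents are embedded once, at indexing time
print(retrieve_and_rerank("death rates of covid", index, documents))
```

In a real system the bi-encoder embeds each document once at indexing time, so a query costs one embedding plus an index lookup; a cross-encoder has to read each (query, document) pair together, which is why it is applied only to the retrieved candidates.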


pip install semantic-search-faiss

Inference example

from semanticsearch import search,utils,config
from semanticsearch.pretrained import get_model
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder(config.CROSS_ENCODER)

query = 'death rates of covid'
# arguments passed to the search call: query, index, bi_encoder, cross_encoder, documents

Training pipeline

1. Synthetic query generation using T5
2. Fine-tuning the bi-encoder on the synthetic queries
3. Indexing the documents with FAISS using the fine-tuned bi-encoder
4. Searching: FAISS retrieval with the bi-encoder, then re-ranking with the cross-encoder
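Steps 1 and 2 work without labelled data because each T5-generated query comes with a known positive: the passage it was generated from. One common way to fine-tune a bi-encoder on such pairs, assumed here since the description does not name the loss, is an in-batch-negatives ranking loss (the idea behind `MultipleNegativesRankingLoss` in sentence-transformers): for query i, passage i is the positive and every other passage in the batch serves as a negative. The sketch below computes that loss on precomputed embeddings in plain Python; the function names and the `scale` value of 20.0 are illustrative assumptions.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def in_batch_negatives_loss(query_vecs, passage_vecs, scale=20.0):
    """Mean cross-entropy over the batch: query i should score passage i highest.

    Every other passage in the batch acts as a negative for query i, so no
    explicitly labelled negatives are required.
    """
    assert len(query_vecs) == len(passage_vecs)
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [scale * cosine(q, p) for p in passage_vecs]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log(softmax(logits)[i])
    return total / len(query_vecs)


# Toy batch of two (synthetic query, source passage) pairs with 2-d embeddings.
passages = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]     # each query points at its own passage
mismatched = [[0.1, 0.9], [0.9, 0.1]]  # each query points at the other passage
print(in_batch_negatives_loss(matched, passages))
print(in_batch_negatives_loss(mismatched, passages))
```

A lower loss means each synthetic query sits closer to its source passage than to the other passages in the batch, which is the property the FAISS search in steps 3 and 4 relies on.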

Try out the code on Google Colab.



A detailed walkthrough of the solution can be found in the Kaggle notebook below.



I would like to thank the Kaggle community as a whole for providing a place to learn about and discuss the latest advancements in data science and machine learning.

  1. Vladimir Iglovikov for his wonderful article "I trained a model. What is next?"

  2. Xhululu for the dataset.


Download files


Source Distribution

semantic_search_faiss-0.1.0.tar.gz (6.0 kB)
