Semantic search to query COVID-related papers

Semantic search with FAISS

This project builds a semantic search engine that searches across research papers related to COVID-19 and returns the most relevant results for a query. It is meant to help people who want to keep up with ongoing COVID-19 research.

We use a retrieval-and-ranking method: a bi-encoder with a FAISS index retrieves candidate documents for the query, and a cross-encoder re-ranks them.

Installation

pip install semantic-search-faiss

Inference example

from semanticsearch import search, utils, config
from semanticsearch.pretrained import get_model
from sentence_transformers import CrossEncoder

# Load the pretrained bi-encoder, the FAISS index, and the indexed documents
bi_encoder, index, documents = get_model(config.BI_ENCODER, config.INDEX, config.DATA)
# Load the cross-encoder that is passed to the search call below
cross_encoder = CrossEncoder(config.CROSS_ENCODER)

query = 'death rates of covid'
results = search.search(query, index, bi_encoder, cross_encoder, documents)

Training pipeline

  1. Synthetic query generation using T5
  2. Fine-tuning the bi-encoder on the synthetic queries
  3. Indexing the data with the fine-tuned bi-encoder
  4. Bi-encoder + cross-encoder search with FAISS
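
The last step, retrieval followed by re-ranking, can be sketched with the standard library alone. This is only an illustration of the two-stage control flow, not this package's implementation: `embed`, `cosine`, `score_pair`, and `retrieve_and_rerank` are hypothetical names, the bag-of-words vectors stand in for bi-encoder embeddings searched through a FAISS index, and the term-overlap scorer stands in for a cross-encoder.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score_pair(query, doc):
    """Toy stand-in for a cross-encoder: share of query terms found in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def retrieve_and_rerank(query, documents, doc_vectors, top_k=3, top_n=2):
    # Stage 1 (retrieval): cheap vector similarity against every indexed document.
    q_vec = embed(query)
    by_similarity = sorted(
        range(len(documents)),
        key=lambda i: cosine(q_vec, doc_vectors[i]),
        reverse=True,
    )
    candidates = by_similarity[:top_k]
    # Stage 2 (ranking): costlier pairwise scoring, applied to the candidates only.
    reranked = sorted(
        candidates, key=lambda i: score_pair(query, documents[i]), reverse=True
    )
    return [documents[i] for i in reranked[:top_n]]


documents = [
    "covid death rates by age group",
    "vaccine trial results for covid",
    "influenza death rates in winter",
    "protein folding simulation methods",
]
doc_vectors = [embed(d) for d in documents]  # the "indexing" step, done once
results = retrieve_and_rerank("death rates of covid", documents, doc_vectors)
print(results)
```

On this toy corpus the document about covid death rates ranks first. The point of the split is cost: the first stage compares one query vector against precomputed document vectors, so the slower pairwise scorer only ever sees `top_k` candidates.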

Try out the code on either Google Colab or Kaggle.

A detailed walkthrough of the solution is available in the accompanying Kaggle notebook.

Acknowledgements

We would like to thank the Kaggle community as a whole for providing an avenue to learn about and discuss the latest data science and machine learning advancements, with a special hat tip to those whose code we used or who inspired us:

  1. Vladimir Iglovikov for his wonderful article "I trained a model. What is next?"

  2. Xhululu for the dataset.

Download files

Source Distribution

semantic_search_faiss-0.0.9.tar.gz (6.0 kB)

Uploaded Source

Built Distribution

semantic_search_faiss-0.0.9-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file semantic_search_faiss-0.0.9.tar.gz.

File metadata

  • Download URL: semantic_search_faiss-0.0.9.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.6.3

File hashes

Hashes for semantic_search_faiss-0.0.9.tar.gz

  • SHA256: 95991b9ae09e4dd55c3331cb9aefad1954247a2e4f6e557f90af25b3ecf6c00a
  • MD5: d4d92e1d1f07f6fe3e120dcb43bedcf6
  • BLAKE2b-256: 7b426d738ff0ad2f1be3a76025f52cee98bf6a7b81749a9cbcae5b04c9db47fa
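
One way to use these digests is to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard library. The helper names `sha256_of` and `matches_digest` are illustrative and not part of this package, and the commented usage assumes the archive was saved to the current directory.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare the file's SHA256 digest with a published one, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Usage (assumes the source archive is in the current directory):
# matches_digest(
#     "semantic_search_faiss-0.0.9.tar.gz",
#     "95991b9ae09e4dd55c3331cb9aefad1954247a2e4f6e557f90af25b3ecf6c00a",
# )
```

Reading in chunks keeps memory use flat regardless of file size. pip can also enforce digests itself through its hash-checking mode.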

File details

Details for the file semantic_search_faiss-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: semantic_search_faiss-0.0.9-py3-none-any.whl
  • Upload date:
  • Size: 6.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.6.3

File hashes

Hashes for semantic_search_faiss-0.0.9-py3-none-any.whl

  • SHA256: 87819aeca4b67d164163fb426101818facbc447a286a37f1226e12775ce7cb46
  • MD5: 6e287fa23c7dc9fa59e93549ce8aea6e
  • BLAKE2b-256: 1da06a43868e62f6b63a57d1c36e328aed4706ea3c631c6bf655ad09ef6ae2c8
