Semantic search to query covid related papers
Project description
Semantic search with FAISS
The idea of this project is to build a semantic search engine that searches across research papers related to COVID-19 and returns the most relevant results for a query. It is meant to help people who want to follow ongoing COVID-19 research.
I have used a retrieval-and-ranking approach with a FAISS index for faster retrieval of data for a given query.
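The two stages of a retrieval-and-ranking search can be sketched as follows. This is only an illustration in pure Python, not the package's implementation: `embed` and `rerank_score` are hypothetical toy stand-ins for the bi-encoder and the cross-encoder, and the exhaustive similarity loop stands in for the FAISS index lookup.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query, doc):
    """Toy stand-in for a cross-encoder: scores the (query, document) pair jointly.

    Here the score is the fraction of query tokens found in the document.
    """
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


def two_stage_search(query, documents, doc_vectors, top_k=3, top_n=2):
    # Stage 1 (retrieval): cheap vector similarity over the whole corpus.
    # A FAISS index would replace this exhaustive loop.
    q_vec = embed(query)
    candidates = sorted(
        range(len(documents)),
        key=lambda i: cosine(q_vec, doc_vectors[i]),
        reverse=True,
    )[:top_k]
    # Stage 2 (ranking): costlier pairwise scoring, only on the candidates.
    reranked = sorted(
        candidates,
        key=lambda i: rerank_score(query, documents[i]),
        reverse=True,
    )
    return [documents[i] for i in reranked[:top_n]]


documents = [
    "death rates of covid patients by age group",
    "vaccine efficacy against covid variants",
    "influenza transmission in schools",
    "covid death rates in intensive care units",
]
doc_vectors = [embed(d) for d in documents]  # computed once, at indexing time
results = two_stage_search("death rates of covid", documents, doc_vectors)
```

The point of the split is cost: the first stage compares the query against precomputed document vectors, so it scales to the whole corpus, while the second stage scores each query-document pair from scratch and is therefore run only on the few retrieved candidates.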
Installation
pip install semantic-search-faiss
Inference example
from semanticsearch import search, utils, config
from semanticsearch.pretrained import get_model
from sentence_transformers import CrossEncoder

# Load the pretrained bi-encoder, the FAISS index, and the documents
bi_encoder, index, documents = get_model(config.BI_ENCODER, config.INDEX, config.DATA)
# Load the cross-encoder used to re-rank the retrieved candidates
cross_encoder = CrossEncoder(config.CROSS_ENCODER)

query = 'death rates of covid'
results = search.search(query, index, bi_encoder, cross_encoder, documents)
Training pipeline
1. Synthetic query generation using T5
2. Fine-tuning the bi-encoder using the synthetic queries
3. Indexing the data with FAISS using the fine-tuned bi-encoder
4. Bi-encoder + cross-encoder with FAISS search
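As a rough illustration of how steps 1 and 2 connect, the sketch below pairs each passage with the queries generated for it, giving the (query, passage) positives a bi-encoder can be fine-tuned on. `generate_queries` is a hypothetical toy stand-in for the T5 generator, and the pair format is an assumption rather than the package's actual training format.

```python
def generate_queries(passage, n=2):
    """Hypothetical stand-in for a T5 query generator.

    A real generator would sample n queries from a seq2seq model; this toy
    version just builds queries from the first words of the passage.
    """
    words = passage.split()
    return [" ".join(words[: k + 2]) for k in range(n)]


def build_training_pairs(passages, n_queries=2):
    """Pair every passage with its synthetic queries as (query, passage) positives."""
    pairs = []
    for passage in passages:
        for query in generate_queries(passage, n_queries):
            pairs.append((query, passage))
    return pairs


passages = [
    "Mortality among hospitalised covid patients rises with age.",
    "Masks reduce transmission of respiratory viruses indoors.",
]
pairs = build_training_pairs(passages)
```

Synthetic queries make it possible to fine-tune on a corpus that has no human-written queries, since every generated query comes with a known relevant passage.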
Try out the code on Google Colab.
Kaggle
A detailed walkthrough of the solution can be found in the Kaggle notebook.
Acknowledgements
I would like to thank the Kaggle community as a whole for providing an avenue to learn and discuss the latest data science and machine learning advancements.
- Vladimir Iglovikov for his wonderful article "I trained a model. What is next?"
- Xhululu for the dataset.
Download files
Source Distribution
Built Distribution
File details
Details for the file semantic_search_faiss-0.1.0.tar.gz.
File metadata
- Download URL: semantic_search_faiss-0.1.0.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.6.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0771cd3bd12f013a019561e3a10de6ef3fa61f9f95fb03cd751e23cec2ca2140
MD5 | 57de1684022a2d0aa23a61fb0e74f053
BLAKE2b-256 | f084f2b47c148a7cacf43a2029b45da9025e3ceeca8d9575b0df385b33241c74
File details
Details for the file semantic_search_faiss-0.1.0-py3-none-any.whl.
File metadata
- Download URL: semantic_search_faiss-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.6.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8f81ef686f20edccbed78139499a1d50b58035c77d8d8a784d27a9f87a0d5fb4
MD5 | 3a9a17330b39920b3336ec6aa6cf31d7
BLAKE2b-256 | 59b38c8acb27aa2e23107c39aca9df32aa55df8031ea5ac9f6b6cb906b066de4
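A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_published_hash` are illustrative, and the local file path is assumed to be wherever the archive was saved.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 with a published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, assuming the source archive sits in the current directory:
# matches_published_hash(
#     "semantic_search_faiss-0.1.0.tar.gz",
#     "0771cd3bd12f013a019561e3a10de6ef3fa61f9f95fb03cd751e23cec2ca2140",
# )
```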