Project description

Why use CordSearch

In response to the COVID-19 pandemic, research efforts and the resulting publications accelerated dramatically. The breakneck pace has made it nearly impossible for the medical community to verify and reference the results of the many papers published daily. CordSearch makes this process more efficient by providing easy-to-use semantic search functions. For any paper in the CORD-19 dataset, CordSearch can quickly find similar papers or sentences. If a paper reports a promising conclusion, CordSearch can help researchers check whether the result is supported by the broader literature.

Package setup

Create and activate a virtual environment, then run pip install -r requirements.txt.

Download the data

The CORD-19 dataset can be downloaded through the Hugging Face Datasets Hub:

from datasets import load_dataset

ds = load_dataset('cord19', 'fulltext')

The dataset is approximately 9 GB.
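Because the download is large, it can help to sanity-check a handful of records before running anything expensive. The sketch below is only an illustration: it works on any iterable of dictionaries and assumes each record has `title` and `abstract` fields, which is an assumption about the dataset layout rather than something stated here. The helper names `records_with_abstract` and `preview_titles` are hypothetical and are not part of CordSearch or the datasets library.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def records_with_abstract(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only records whose 'abstract' field (assumed name) is non-empty."""
    for record in records:
        abstract = record.get("abstract") or ""
        if abstract.strip():
            yield record


def preview_titles(records: Iterable[Dict[str, str]], n: int = 3) -> List[str]:
    """Return the titles of the first n records that have an abstract."""
    return [r.get("title", "") for r in islice(records_with_abstract(records), n)]


# Toy records standing in for rows of the real dataset.
sample = [
    {"title": "Paper A", "abstract": "Masks reduce transmission."},
    {"title": "Paper B", "abstract": ""},
    {"title": "Paper C", "abstract": "   "},
    {"title": "Paper D", "abstract": "Vaccines induce antibodies."},
]

titles = preview_titles(sample, n=3)
print(titles)
```

The same helper could be pointed at a split of the loaded dataset instead of `sample`, provided the field names match.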

Download the punkt tokenizer for NLTK

import nltk

nltk.download('punkt')

Using CordSearch

from cordsearch import CordDataset

ds = CordDataset()

# Get abstracts by ID:
abstract = ds.abstracts[10]

# Get individual sentences by specifying the abstract and sentence IDs
sentence = ds.sentence(abstract_id=10, sentence_id=5)

# Find similar papers by specifying the abstract of interest and the number of most similar papers to be displayed
ds.find_similar_abstracts(abstract_id=10, top_k=2)

# Find similar papers by specifying the fulltext of interest
ds.find_similar_papers(bodytext_id=10, top_k=2)

# Find similar sentences from CORD-19 abstracts
ds.quick_find_similar_sentences(abstract_id=10, sentence_id=5, top_k=2)

# Find similar sentences from CORD-19 fulltexts
ds.find_similar_sentences(bodytext_id=10, sentence_id=5, top_k=2)
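To give a feel for what a `top_k` similarity query does, here is a minimal sketch of one common approach to semantic search: rank items by the cosine similarity of their embedding vectors. This is not CordSearch's implementation, which is not described here. The function names `cosine_similarity` and `top_k_similar` and the toy vectors are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query_id: int, embeddings: Sequence[Sequence[float]], top_k: int = 2
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k items most similar to query_id."""
    query = embeddings[query_id]
    scored = [
        (i, cosine_similarity(query, vec))
        for i, vec in enumerate(embeddings)
        if i != query_id  # do not return the query item itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; real sentence embeddings have far more dimensions.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]

nearest = top_k_similar(query_id=0, embeddings=toy_embeddings, top_k=2)
print(nearest)
```

In this toy example, items 1 and 3 point in roughly the same direction as item 0, so they rank above item 2, which is orthogonal to it.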

Release history

0.2.1 (this version), Jul 29, 2022

0.2 (yanked), Jul 29, 2022. Reason this release was yanked: limited search range

Download files

Source Distribution

cordsearch-0.2.1.tar.gz (6.6 kB)

Uploaded Jul 29, 2022 Source

Hashes for cordsearch-0.2.1.tar.gz

Algorithm	Hash digest
SHA256	`e4c4e43dccbe0163109f25dd464a476c2125cf017c2e2da4e87bc21b6d45d53b`
MD5	`83f5af7b97f17da55acd251e025add14`
BLAKE2b-256	`3a1b6f232a55adfa839faf3896b13ba8e7f6190844106376ee5cad008472adce`