paperai: AI-powered literature discovery and review engine for medical/scientific papers

paperai builds an AI-powered index over sets of medical and scientific papers.

Installation

The easiest way to install is via pip and PyPI:

pip install paperai

You can also install paperai directly from GitHub. Using a Python virtual environment is recommended.

pip install git+https://github.com/neuml/paperai

Python 3.6+ is supported.

If running on Windows, check out this link for possible install issues.

Building a model

paperai indexes article databases previously built with paperetl. paperai currently supports querying SQLite databases.
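Before indexing, it can help to confirm what a paperetl-built database actually contains. The following is a minimal sketch using only Python's standard sqlite3 module. It lists table names from sqlite_master and assumes nothing about the schema paperetl produces; the helper name list_tables and the file path in the closing comment are hypothetical.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in a SQLite database, sorted."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Demonstration on an in-memory database with a made-up table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, text TEXT)")
tables = list_tables(demo)
demo.close()
print(tables)

# For a real database, connect to the file instead, for example:
# sqlite3.connect("/path/to/articles.sqlite")  # hypothetical path
```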

To build an index for a SQLite articles database:

# Can optionally use pre-trained vectors
# https://www.kaggle.com/davidmezzetti/cord19-fasttext-vectors#cord19-300d.magnitude
# Default location: ~/.cord19/vectors/cord19-300d.magnitude
python -m paperai.vectors

# Build embeddings index
python -m paperai.index

The model will be stored in ~/.cord19
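The two steps above can also be driven from a Python script. This is a sketch, assuming paperai is installed in the environment of the running interpreter. The helper names vectors_path, build_commands and build_model are hypothetical; the only details taken from the text above are the two module names and the default vectors location.

```python
import subprocess
import sys
from pathlib import Path


def vectors_path(home=None):
    """Default location of the pre-trained vectors, as given in the comments above."""
    base = Path(home) if home is not None else Path.home()
    return base / ".cord19" / "vectors" / "cord19-300d.magnitude"


def build_commands(python=sys.executable):
    """Commands for the two build steps, in the order listed above."""
    return [
        [python, "-m", "paperai.vectors"],  # word vectors step
        [python, "-m", "paperai.index"],    # embeddings index step
    ]


def build_model():
    """Run both steps in order, stopping at the first failure."""
    for command in build_commands():
        subprocess.run(command, check=True)
```

Calling build_model() runs both steps, and vectors_path().exists() reports whether a vectors file is already present at the default location.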

Building a report file

A report file is a Markdown file created from a list of queries. An example report call:

python -m paperai.report tasks/risk-factors.yml

Once the report completes, a file named tasks/risk-factors.md is created.
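Going by the example above, the report appears to be written next to the task file with the extension changed to .md. A small sketch of that naming rule follows. It is inferred from this single example rather than from documented behavior, and report_output_path is a hypothetical helper.

```python
from pathlib import Path


def report_output_path(task_file):
    """Markdown path expected for a task file, assuming only the suffix changes."""
    return Path(task_file).with_suffix(".md")


output = report_output_path("tasks/risk-factors.yml")
print(output.as_posix())  # prints tasks/risk-factors.md
```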

Running queries

The fastest way to run queries is to start a paperai shell:

paperai

A prompt will come up. Queries can be typed directly into the console.

Tech Overview

The tech stack is built on Python and creates a sentence embeddings index with FastText + BM25. Background on this method can be found in this Medium article and in codequestion, an existing repository that uses the same method.

The model is a combination of the sentence embeddings index and a SQLite database with the articles. Each article is parsed into sentences and stored in SQLite along with the article metadata. FastText vectors are built over the full corpus. The sentence embeddings index only uses tagged articles, which helps produce the most relevant results.
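To make the "FastText + BM25" idea concrete, here is a generic sketch of BM25-weighted averaging of word vectors into a sentence embedding. It is not paperai's implementation. The two-dimensional vectors stand in for FastText vectors, the corpus is made up, the function names are hypothetical, and k1 = 1.2 and b = 0.75 are common BM25 defaults rather than values taken from this project.

```python
import math
from collections import Counter


def bm25_weights(tokens, doc_freq, total_docs, avg_len, k1=1.2, b=0.75):
    """BM25 score of each distinct token within one tokenized sentence."""
    counts = Counter(tokens)
    weights = {}
    for token, tf in counts.items():
        df = doc_freq.get(token, 0)
        # Rare tokens get a high idf, common tokens a low one.
        idf = math.log(1 + (total_docs - df + 0.5) / (df + 0.5))
        norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
        weights[token] = idf * tf * (k1 + 1) / norm
    return weights


def embed(weights, vectors):
    """Weighted average of the word vectors for tokens that have a vector."""
    known = [t for t in weights if t in vectors]
    total = sum(weights[t] for t in known)
    if not known or total == 0:
        return None
    size = len(vectors[known[0]])
    return [sum(weights[t] * vectors[t][i] for t in known) / total for i in range(size)]


# Toy corpus: "the" occurs in every sentence, the other words in one each.
corpus = [
    ["the", "virus", "spreads"],
    ["the", "vaccine", "works"],
    ["the", "study", "ended"],
]
# Made-up 2-d vectors standing in for FastText vectors.
vectors = {"the": [0.0, 1.0], "virus": [1.0, 0.0], "spreads": [1.0, 0.0]}

doc_freq = Counter(token for sentence in corpus for token in set(sentence))
avg_len = sum(len(sentence) for sentence in corpus) / len(corpus)

weights = bm25_weights(corpus[0], doc_freq, len(corpus), avg_len)
embedding = embed(weights, vectors)
print(weights)
print(embedding)
```

In this toy corpus the common word "the" receives a much smaller weight than "virus" or "spreads", so the sentence embedding is pulled toward the informative words instead of sitting at the plain average.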

Multiple entry points exist to interact with the model.

  • paperai.report - Builds a markdown report for a series of queries. For each query, the report shows the best articles, the top matches from those articles, and a highlights section with the most relevant sections from the embeddings search.
  • paperai.query - Runs a single query from the terminal
  • paperai.shell - Allows running multiple queries from the terminal
