
getpaper - papers download made easy!

Project description

getpaper

Paper downloader

Getting started

Install the library with:

pip install getpaper

Usage

Downloading papers

After installation, you can either import the library into your Python code or use the console scripts, for example:

download download_pubmed --pubmed 22266545 --folder papers --name pmid

Downloads the paper with PubMed ID 22266545 into the folder 'papers' and uses the PubMed ID as its name.

download download_doi --doi 10.1519/JSC.0b013e318225bbae --folder papers

Downloads the paper with DOI 10.1519/JSC.0b013e318225bbae into the folder 'papers'. Because --name is not specified, the DOI is used as the name.
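The two download commands can also be driven from Python, for example to fetch a batch of papers. This is a minimal sketch, assuming the console script is called download, exposes the subcommands download_pubmed and download_doi, and is on your PATH; the helpers pubmed_command, doi_command and download_all are hypothetical and only reuse the flags shown above. With dry_run=True (the default) the commands are returned but not executed.

```python
import subprocess
from typing import Iterable, List, Optional


def pubmed_command(pmid: str, folder: str = "papers", name: Optional[str] = None) -> List[str]:
    """Hypothetical helper: argument list for downloading one paper by PubMed ID."""
    cmd = ["download", "download_pubmed", "--pubmed", pmid, "--folder", folder]
    if name is not None:
        cmd += ["--name", name]
    return cmd


def doi_command(doi: str, folder: str = "papers") -> List[str]:
    """Hypothetical helper: argument list for one DOI; no --name, so the DOI is used as the name."""
    return ["download", "download_doi", "--doi", doi, "--folder", folder]


def download_all(
    pmids: Iterable[str],
    dois: Iterable[str],
    folder: str = "papers",
    dry_run: bool = True,
) -> List[List[str]]:
    """Build every command; only run them (stopping on the first failure) when dry_run is False."""
    commands = [pubmed_command(p, folder, name="pmid") for p in pmids]
    commands += [doi_command(d, folder) for d in dois]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in download_all(["22266545"], ["10.1519/JSC.0b013e318225bbae"]):
        print(" ".join(cmd))
```

Passing an argument list (rather than a shell string) to subprocess.run avoids quoting problems with DOIs that contain special characters.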

Parsing the papers

You can parse the downloaded papers with the unstructured library. For example, if the papers are in the folder 'test', you can run:

getpaper/parse.py parse_folder --folder /home/antonkulaga/sources/getpaper/test

You can also parse papers on a per-file basis, for example:

getpaper/parse.py parse_paper --paper /home/antonkulaga/sources/getpaper/test/22266545.pdf
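If you want to control exactly which files get parsed (say, only PDFs matching a pattern), one option is to select them yourself and call parse_paper once per file. This sketch assumes the script is invoked as getpaper/parse.py from the repository root, as in the commands above; find_papers, parse_commands and parse_all are hypothetical helpers, and dry_run=True only returns the commands.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def find_papers(folder: str, pattern: str = "*.pdf") -> List[Path]:
    """Files in `folder` matching `pattern`, sorted for a stable order (empty if the folder is missing)."""
    return sorted(Path(folder).glob(pattern))


def parse_commands(papers: Iterable[Path], script: str = "getpaper/parse.py") -> List[List[str]]:
    """One `parse_paper --paper <file>` call per file, run with the current Python interpreter."""
    return [[sys.executable, script, "parse_paper", "--paper", str(p)] for p in papers]


def parse_all(folder: str, pattern: str = "*.pdf", dry_run: bool = True) -> List[List[str]]:
    """Build the commands for a folder and, unless dry_run is set, run them one by one."""
    commands = parse_commands(find_papers(folder, pattern))
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in parse_all("test"):
        print(" ".join(cmd))
```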

Indexing papers

We also provide features to index the papers with OpenAI or lambda embeddings and save them in a chromadb vector store. For OpenAI embeddings to work, you have to create a .env file and specify your OpenAI key there; see .env.template for an example.
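Before indexing with OpenAI embeddings it can help to check that the key is actually visible. This is a minimal stdlib sketch, assuming the .env file uses plain KEY=VALUE lines and that the variable is named OPENAI_API_KEY (the name the openai client conventionally reads; check .env.template for the name getpaper actually expects). parse_env_text, read_env_file and has_openai_key are hypothetical helpers; in practice the python-dotenv package does the same job more completely.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments and stripping surrounding quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Read and parse a .env file; a missing file yields an empty dict."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"))


def has_openai_key(path: str = ".env", var: str = "OPENAI_API_KEY") -> bool:
    """True if `var` is set in the process environment or in the .env file."""
    return bool(os.environ.get(var) or read_env_file(path).get(var))


if __name__ == "__main__":
    print("OpenAI key found" if has_openai_key() else "No OpenAI key: create .env from .env.template")
```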

Examples

You can run examples.py to see usage examples.

Additional requirements

Detectron2 is required for using models from the layoutparser model zoo but is not automatically installed with this package. For macOS and Linux, build it from source with:

pip install 'git+https://github.com/facebookresearch/detectron2.git@e2ce8dc#egg=detectron2'
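Since detectron2 is an optional dependency, code that wants a layoutparser model-zoo model may want to check for it first and fall back gracefully. A minimal sketch using importlib.util.find_spec, which locates a module without importing it; module_available is a hypothetical helper, and finding the module does not by itself guarantee the layoutparser models will load.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if a top-level module can be found on the current path, without importing it.

    Intended for top-level names such as "detectron2"; for dotted names whose parent
    package is missing, find_spec raises ModuleNotFoundError instead of returning None.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("detectron2"):
        print("detectron2 found")
    else:
        print("detectron2 missing; install it with the pip command above")
```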


Download files


Source Distribution

getpaper-0.0.6.tar.gz (7.5 kB)

Built Distribution

getpaper-0.0.6-py2.py3-none-any.whl (8.7 kB, Python 2 and Python 3)

File details

Details for the file getpaper-0.0.6.tar.gz.

File metadata

  • Download URL: getpaper-0.0.6.tar.gz
  • Upload date:
  • Size: 7.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for getpaper-0.0.6.tar.gz
  • SHA256: 752ea8268db0aa5608223d5ed9550795302e2fde0f1d22e4e442e120f6f20508
  • MD5: 507d25f9febc69ac8d7a970d4de6603f
  • BLAKE2b-256: 6619a8b54fc328ca922c5b8631323223dd867c66cdfa3652670e7889e4d2009a


File details

Details for the file getpaper-0.0.6-py2.py3-none-any.whl.

File metadata

  • Download URL: getpaper-0.0.6-py2.py3-none-any.whl
  • Upload date:
  • Size: 8.7 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for getpaper-0.0.6-py2.py3-none-any.whl
  • SHA256: b2d1f17fdbe2069ad927ad4e90a1287c9bdcfd19a1bc9bf6c97a0859a6ac0e6a
  • MD5: 8aaa41cdfe5eff3c3e314e14734ccacb
  • BLAKE2b-256: 39c7072614af1c042b30420751ab21414e2eec2ca8b78142608cfb56ab29eade
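To check a downloaded file against the SHA256 digests listed for getpaper 0.0.6, you can hash it locally. A minimal sketch using only hashlib; sha256_of and verify are illustrative helpers, not part of getpaper, and the file is read in chunks so large archives are not loaded into memory at once.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for the getpaper 0.0.6 distribution files.
EXPECTED_SHA256 = {
    "getpaper-0.0.6.tar.gz": "752ea8268db0aa5608223d5ed9550795302e2fde0f1d22e4e442e120f6f20508",
    "getpaper-0.0.6-py2.py3-none-any.whl": "b2d1f17fdbe2069ad927ad4e90a1287c9bdcfd19a1bc9bf6c97a0859a6ac0e6a",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read chunk by chunk."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """Compare a downloaded file with the published digest for its file name."""
    p = Path(path)
    expected = EXPECTED_SHA256.get(p.name)
    if expected is None:
        raise KeyError(f"no published SHA256 for {p.name}")
    return sha256_of(p) == expected
```

pip can enforce the same check at install time through its hash-checking mode: put a line such as getpaper==0.0.6 --hash=sha256:<digest> in a requirements file and install it with pip install --require-hashes -r requirements.txt.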

