getpaper - papers download made easy!
getpaper
Paper downloader
Getting started
Install the library with:
pip install getpaper
Usage
Downloading papers
After installation, you can either import the library into your Python code or use the console scripts, for example:
download download download_pubmed --pubmed 22266545 --folder papers --name pmid
This downloads the paper with the given PubMed ID into the folder papers and uses the PubMed ID as the file name.
download download download_doi --doi 10.1519/JSC.0b013e318225bbae --folder papers
This downloads the paper with the given DOI into the folder papers. Because --name is not specified, the DOI is used as the name.
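As an illustration of the naming rule above, here is a minimal sketch of how an identifier could be turned into a PDF file name. The helper safe_pdf_name is hypothetical and not part of getpaper, and replacing the "/" in a DOI with "_" is an assumption made for this example only; the library may name DOI-based files differently.

```python
from pathlib import Path


def safe_pdf_name(identifier: str) -> str:
    """Hypothetical helper: build a PDF file name from a PubMed ID or DOI."""
    # A DOI contains "/", which a file system would treat as a folder separator.
    # Replacing it with "_" is an assumption of this sketch, not getpaper's rule.
    return identifier.strip().replace("/", "_") + ".pdf"


def target_path(folder: str, identifier: str) -> Path:
    """Hypothetical helper: where the downloaded paper would be stored."""
    return Path(folder) / safe_pdf_name(identifier)


if __name__ == "__main__":
    print(target_path("papers", "22266545"))
    print(target_path("papers", "10.1519/JSC.0b013e318225bbae"))
```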
Parsing the papers
You can parse the downloaded papers with the unstructured library. For example, if the papers are in the folder test, you can run:
getpaper/parse.py parse_folder --folder /home/antonkulaga/sources/getpaper/test
You can also parse papers one file at a time, for example:
getpaper/parse.py parse_paper --paper /home/antonkulaga/sources/getpaper/test/22266545.pdf
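For readers who prefer to call the parser from Python, the following is a rough sketch of per-file and per-folder parsing. It assumes the unstructured package is installed and uses its partition_pdf function directly; the helpers find_pdfs, parse_pdf and parse_all are hypothetical names chosen for this example and are not getpaper's own API.

```python
from pathlib import Path
from typing import Dict, List


def find_pdfs(folder: str) -> List[Path]:
    """Return the PDF files located directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def parse_pdf(pdf: Path) -> str:
    """Extract the text of one PDF with unstructured (assumed to be installed)."""
    # Imported lazily so that find_pdfs works even without unstructured.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(filename=str(pdf))
    # str(element) yields the text of each detected element.
    return "\n\n".join(str(element) for element in elements)


def parse_all(folder: str) -> Dict[str, str]:
    """Hypothetical helper: map each PDF file name in `folder` to its text."""
    return {pdf.name: parse_pdf(pdf) for pdf in find_pdfs(folder)}
```

Calling parse_all("test") would then parse every PDF in the folder test, while parse_pdf(Path("test/22266545.pdf")) handles a single file.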
Indexing papers
We also provide features to index the papers with OpenAI or lambda embeddings and save them in a ChromaDB vector store. For OpenAI embeddings to work, you have to create a .env file and specify your OpenAI key there; see .env.template for an example.
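The sketch below shows one way to check that the key in the .env file can be found before indexing starts. It uses only the standard library. The variable name OPENAI_API_KEY is an assumption (it is the name conventionally used by OpenAI tooling; check .env.template for the name this project expects), and read_env_file and load_openai_key are hypothetical helpers, not part of getpaper.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; returns {} if the file does not exist."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values: Dict[str, str] = {}
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_openai_key(path: str = ".env", name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment or the .env file (name is assumed)."""
    key = os.environ.get(name) or read_env_file(path).get(name, "")
    if not key:
        raise RuntimeError(f"{name} is not set in the environment or in {path}")
    return key
```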
Examples
You can run examples.py to see usage examples.
Additional requirements
Detectron2 is required for using models from the layoutparser model zoo, but it is not installed automatically with this package. On macOS and Linux, build it from source with:
pip install 'git+https://github.com/facebookresearch/detectron2.git@e2ce8dc#egg=detectron2'
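Because Detectron2 is an optional dependency, it can help to check whether it is importable before relying on layout models. The following minimal sketch uses the standard library's importlib.util.find_spec; the helper name is_installed is hypothetical, and the fallback message is only an illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("detectron2"):
        print("detectron2 found: layoutparser model zoo models can be used")
    else:
        print("detectron2 not found: install it as described above")
```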
File details
Details for the file getpaper-0.0.6.tar.gz.
File metadata
- Download URL: getpaper-0.0.6.tar.gz
- Upload date:
- Size: 7.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 752ea8268db0aa5608223d5ed9550795302e2fde0f1d22e4e442e120f6f20508
MD5 | 507d25f9febc69ac8d7a970d4de6603f
BLAKE2b-256 | 6619a8b54fc328ca922c5b8631323223dd867c66cdfa3652670e7889e4d2009a
File details
Details for the file getpaper-0.0.6-py2.py3-none-any.whl.
File metadata
- Download URL: getpaper-0.0.6-py2.py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | b2d1f17fdbe2069ad927ad4e90a1287c9bdcfd19a1bc9bf6c97a0859a6ac0e6a
MD5 | 8aaa41cdfe5eff3c3e314e14734ccacb
BLAKE2b-256 | 39c7072614af1c042b30420751ab21414e2eec2ca8b78142608cfb56ab29eade