A Python library and command-line tool to retrieve the DOI of a paper from a PDF file.
pdf2doi
pdf2doi is a Python library to extract the DOI or another identifier (e.g. an arXiv ID) from the PDF file of a publication.
Installation
Use the package manager pip to install pdf2doi.
pip install pdf2doi
Usage
pdf2doi can be used either as a stand-alone application invoked from the command line, or as a library imported into your Python project.
Example of usage from the command line:
pdf2doi 'path/filename.pdf'
pdf2doi './folder'
pdf2doi -h
usage: pdf2doi [-h] [-v] [-nws] [-nwv] [-google_results GOOGLE_RESULTS] filename
Retrieve the DOI of a paper from a PDF file.
positional arguments:
filename Relative path of the pdf file or of a folder.
optional arguments:
-h, --help show this help message and exit
-v, --verbose Increase output verbosity.
-nws, --nowebsearch Disable any DOI retrieval method which requires internet searches (e.g. queries to google).
-nwv, --nowebvalidation
Disable the DOI online validation via queries (e.g., to http://dx.doi.org/).
-google_results GOOGLE_RESULTS
Set how many results should be considered when doing a google search for the DOI (default=6).
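The command-line options above can also be driven from a Python script through the standard subprocess module. The following is a minimal sketch, not part of pdf2doi itself: the helper names build_pdf2doi_command and run_pdf2doi and their keyword arguments are hypothetical, and only the flags listed in the help text above are used. It assumes the pdf2doi executable is on your PATH.

```python
import subprocess


def build_pdf2doi_command(target, verbose=False, web_search=True,
                          web_validation=True, google_results=None):
    """Build the argument list for the pdf2doi command-line tool.

    Hypothetical helper: only flags shown in the help text are emitted.
    """
    command = ["pdf2doi"]
    if verbose:
        command.append("-v")
    if not web_search:
        command.append("-nws")   # disable methods that need internet searches
    if not web_validation:
        command.append("-nwv")   # disable online validation of the DOI
    if google_results is not None:
        command += ["-google_results", str(google_results)]
    command.append(target)       # positional argument: a pdf file or a folder
    return command


def run_pdf2doi(target, **options):
    """Run pdf2doi as a subprocess and return whatever it prints to stdout."""
    completed = subprocess.run(
        build_pdf2doi_command(target, **options),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout
```

For example, build_pdf2doi_command('./folder', web_search=False) produces the same arguments as typing pdf2doi -nws ./folder in a shell. Building the list separately from running it makes the flag handling easy to check without a PDF at hand.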
Example of usage inside a Python script:
import pdf2doi
# Try to identify the DOI/identifier of the file 'path/filename.pdf'
result = pdf2doi.pdf2doi('path/filename.pdf', verbose=True)
# The output is a list with three strings:
# result = [identifier, type_identifier, file_name]
# Try to identify the DOIs of all pdf files contained in the folder
result = pdf2doi.pdf2doi('./folder', verbose=True, webvalidation=True)
# The output is a list containing one element for each .pdf file in the folder,
# and each element has the format [identifier, type_identifier, file_name]
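Because a single file returns one flat list while a folder returns a list of such lists, a script that accepts either kind of path may want to normalise the output first. The sketch below is one way to do that; the helper names as_result_list and split_results are hypothetical and not part of pdf2doi. It also assumes that the identifier is None or an empty string when nothing is found, which the description above does not confirm.

```python
def as_result_list(result):
    """Return the output as a list of [identifier, type_identifier, file_name] entries.

    A single-file result (one flat list) is wrapped in an outer list;
    a folder result (a list of lists) is returned as a plain list.
    """
    if result and not isinstance(result[0], (list, tuple)):
        return [result]
    return list(result)


def split_results(results):
    """Separate files with an identifier from files without one.

    Assumption: a missing identifier is reported as None or ''.
    """
    found, missing = [], []
    for identifier, type_identifier, file_name in results:
        if identifier:
            found.append((file_name, type_identifier, identifier))
        else:
            missing.append(file_name)
    return found, missing
```

With these helpers, found, missing = split_results(as_result_list(pdf2doi.pdf2doi(path))) works whether path points to a file or to a folder.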
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
License