
A Python library and command-line tool to retrieve the DOI of a paper from a PDF file.


pdf2doi

pdf2doi is a Python library to extract the DOI or another identifier (e.g. an arXiv ID) from the PDF file of a publication.

Installation

Use the package manager pip to install pdf2doi.

pip install pdf2doi

Usage

pdf2doi can be used either as a stand-alone application invoked from the command line, or by importing it into your Python project.

Example of usage from the command line:

pdf2doi 'path/filename.pdf'
pdf2doi './folder'
pdf2doi -h

usage: pdf2doi [-h] [-v] [-nws] [-nwv] [-google_results GOOGLE_RESULTS] filename

Retrieve the DOI of a paper from a PDF file.

positional arguments:
  filename              Relative path of the pdf file or of a folder.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase output verbosity.
  -nws, --nowebsearch   Disable any DOI retrieval method which requires internet searches (e.g. queries to google).
  -nwv, --nowebvalidation
                        Disable the DOI online validation via queries (e.g., to http://dx.doi.org/).
  -google_results GOOGLE_RESULTS
                        Set how many results should be considered when doing a google search for the DOI (default=6).
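The flags listed above can also be assembled programmatically when calling the tool from another script. The following is a minimal sketch, not part of pdf2doi itself: the helper names build_pdf2doi_command and run_pdf2doi and their keyword arguments are hypothetical, and only the flags shown in the help text are used. It assumes the pdf2doi executable is installed and available on the PATH.

```python
import subprocess


def build_pdf2doi_command(target, verbose=False, web_search=True,
                          web_validation=True, google_results=None):
    """Assemble the argument list for the pdf2doi command-line tool.

    Hypothetical helper: it only maps keyword arguments onto the
    flags documented in the help text (-v, -nws, -nwv, -google_results).
    """
    cmd = ["pdf2doi"]
    if verbose:
        cmd.append("-v")
    if not web_search:
        cmd.append("-nws")   # disable methods that need internet searches
    if not web_validation:
        cmd.append("-nwv")   # disable online validation of the DOI
    if google_results is not None:
        cmd += ["-google_results", str(google_results)]
    cmd.append(target)       # a PDF file or a folder
    return cmd


def run_pdf2doi(target, **options):
    """Run pdf2doi and return whatever it printed to standard output."""
    completed = subprocess.run(
        build_pdf2doi_command(target, **options),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout
```

For example, run_pdf2doi('./folder', verbose=True, web_search=False) would process a folder without any web searches. Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.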

Example of usage inside a Python script:

import pdf2doi

# Try to identify the DOI/identifier of the file 'path/filename.pdf'
result = pdf2doi.pdf2doi('path/filename.pdf', verbose=True)
# The output is a list with three strings:
# result = [identifier, type_identifier, file_name]

# Try to identify the DOIs of all PDF files contained in the folder
result = pdf2doi.pdf2doi('./folder', verbose=True, webvalidation=True)
# The output is a list containing one element for each .pdf file in the folder;
# each element has the format [identifier, type_identifier, file_name]
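Because a single file yields one [identifier, type_identifier, file_name] list while a folder yields a list of such lists, downstream code may find it convenient to normalize both shapes. The sketch below assumes only the output format described above. The helper as_records is hypothetical and not part of pdf2doi, and the sample values are made-up placeholders rather than real output.

```python
def as_records(result):
    """Normalize a pdf2doi result into a list of dictionaries.

    Accepts either one [identifier, type_identifier, file_name] list
    or a list of such lists, as described in the usage example.
    """
    if not result:
        return []
    # A folder result is a list whose elements are themselves lists.
    if isinstance(result[0], (list, tuple)):
        entries = result
    else:
        entries = [result]
    return [
        {"identifier": e[0], "type": e[1], "file": e[2]}
        for e in entries
    ]


# Placeholder data standing in for real pdf2doi output.
single = ["10.1000/example", "DOI", "paper.pdf"]
folder = [single, ["0000.00000", "arxiv", "preprint.pdf"]]

for record in as_records(folder):
    print(record["file"], record["identifier"])
```

With this in place, the same loop handles the result of a call on a single file and of a call on a folder.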

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT
