
DOIdownloader

📝⬇️ DOIdownloader: You give it DOIs, it gives you the article PDFs.

It is surprisingly tricky to reliably obtain the full PDF of a scientific publication given its DOI. This Python package aims to do just that: you give it a list of DOIs, and it will download the full-text PDFs (or other formats if no PDF is available), taking care of much of the complexity. It ensures that lookups to different domains can happen asynchronously (i.e., one slow website won't stall all your other downloads).
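The idea that one slow website should not stall downloads from other domains can be illustrated with plain asyncio. This is a minimal sketch of the general technique, not DOIdownloader's own code. It assumes a hypothetical fetch coroutine (simulated here with asyncio.sleep) and groups URLs by host, so that requests to the same host run one after another while different hosts proceed concurrently.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit


async def fetch(url: str, delay: float, completed: list[str]) -> None:
    # Hypothetical stand-in for an HTTP request: sleeping simulates a slow or fast site
    await asyncio.sleep(delay)
    completed.append(url)


async def fetch_host(jobs: list[tuple[str, float]], completed: list[str]) -> None:
    # Requests to the same host are made one after another
    for url, delay in jobs:
        await fetch(url, delay, completed)


async def fetch_all(jobs: list[tuple[str, float]]) -> list[str]:
    """Fetch all URLs and return them in the order in which they finished."""
    by_host: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for url, delay in jobs:
        by_host[urlsplit(url).hostname or ""].append((url, delay))
    completed: list[str] = []
    # Different hosts are processed concurrently, so a slow host does not hold up the rest
    await asyncio.gather(*(fetch_host(host_jobs, completed) for host_jobs in by_host.values()))
    return completed


if __name__ == "__main__":
    jobs = [
        ("https://slow.example/a", 0.3),
        ("https://fast.example/b", 0.01),
        ("https://fast.example/c", 0.01),
    ]
    print(asyncio.run(fetch_all(jobs)))
```

Even though the slow URL is listed first, both URLs on fast.example finish before it, because the two hosts are handled by separate tasks.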

DOIdownloader gives precedence to the publisher-formatted version (the so-called ‘Version of Record’) and, if you cannot access that version, tries to download an open access preprint or postprint instead. Importantly, DOIdownloader only tries to download through routes that are widely considered to be legal. Concretely, we do not download from Sci-Hub or similar platforms.
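The order of preference described above amounts to a fallback chain: try the most preferred source first and stop at the first one that succeeds. The sketch below shows only that pattern. The source functions publisher_version and open_access_copy are hypothetical placeholders that return canned values, not part of DOIdownloader's API.

```python
from typing import Callable, Optional

# A source takes a DOI and returns PDF bytes, or None if it cannot provide one
Source = Callable[[str], Optional[bytes]]


def first_available(doi: str, sources: list[tuple[str, Source]]) -> Optional[tuple[str, bytes]]:
    """Try sources in order of preference; return (source name, content) for the first hit."""
    for name, source in sources:
        content = source(doi)
        if content is not None:
            return name, content
    return None  # no source could provide the document


# Hypothetical sources with canned results, listed in order of preference
def publisher_version(doi: str) -> Optional[bytes]:
    return None  # e.g. no access to the Version of Record


def open_access_copy(doi: str) -> Optional[bytes]:
    return b"%PDF-1.7 ..."  # e.g. an open access postprint in a repository


preference = [("version_of_record", publisher_version), ("open_access", open_access_copy)]

if __name__ == "__main__":
    print(first_available("10.1108/JCRPP-02-2020-0025", preference))
```

Keeping the sources in an ordered list makes the precedence explicit: reordering or adding a source changes the policy without touching the loop.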

Installation

The package can be installed with pip:

pip install git+https://github.com/rafguns/doidownloader.git

Basic usage: command-line

The easiest way to get started is from the command line. If you have a plain-text file of DOIs named dois.txt, you can download their PDFs by running the following command in the directory that contains it:

python -m doidownloader

This will save the results to a SQLite database named doi-fulltexts.db in the same directory. (You may notice that this also creates a file called robots.txt. It is used to keep track of how long to wait between calls to the same domain.)
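The layout of doi-fulltexts.db is not described here, so the following sketch inspects the file without assuming any table names: it lists every table together with its row count, using only Python's built-in sqlite3 module. The helper name summarize_database is hypothetical.

```python
import sqlite3
from pathlib import Path


def summarize_database(path: str) -> dict[str, int]:
    """Return {table name: row count} for every table in an existing SQLite file."""
    if not Path(path).exists():
        # sqlite3.connect would silently create an empty file, so fail early instead
        raise FileNotFoundError(path)
    con = sqlite3.connect(path)
    try:
        names = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts: dict[str, int] = {}
        for name in names:
            # Identifiers cannot be bound as parameters, so quote them (doubling any '"')
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = con.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        con.close()
```

After a command-line run, summarize_database("doi-fulltexts.db") gives a quick overview of what was stored.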

Advanced usage: Python

Here's an example of how to use this from within Python:

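import asyncio
import sqlite3

import doidownloader

# SQLite database where results will be stored
con = sqlite3.connect("somedois.db")
doidownloader.db.prepare_tables(con)
# List of DOIs to search for
dois_to_find = ["10.1108/JCRPP-02-2020-0025", "10.23860/JMLE-2020-12-3-1"]


async def main():
    # 'async with' and 'await' are only valid inside a coroutine
    async with doidownloader.DOIDownloader() as client:
        # Assumes save_fulltexts_from_dois is exposed at the package top level
        await doidownloader.save_fulltexts_from_dois(dois_to_find, con, client)


asyncio.run(main())
con.close()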

See __main__.py for an example of how to keep track of crawl delays per domain.
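For readers who want the general idea before reading __main__.py, here is an independent sketch of per-domain crawl delays (not the package's code). It reads a Crawl-delay value with the standard library's urllib.robotparser and uses asyncio to enforce a minimum interval between requests to the same domain. The names crawl_delay_from_robots and DomainThrottle, and the 1-second default delay, are assumptions made for illustration.

```python
import asyncio
import time
from typing import Optional
from urllib.robotparser import RobotFileParser


def crawl_delay_from_robots(robots_txt: str, user_agent: str = "*") -> Optional[float]:
    """Parse robots.txt content and return its Crawl-delay for user_agent, if any."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else None


class DomainThrottle:
    """Hypothetical helper: space out requests to each domain by that domain's delay."""

    def __init__(self, default_delay: float = 1.0) -> None:
        self.default_delay = default_delay
        self.delays: dict[str, float] = {}        # domain -> seconds between requests
        self.next_allowed: dict[str, float] = {}  # domain -> earliest time of next request
        self.locks: dict[str, asyncio.Lock] = {}

    async def wait(self, domain: str) -> None:
        # One lock per domain: tasks for the same domain take turns, other domains are unaffected
        lock = self.locks.setdefault(domain, asyncio.Lock())
        async with lock:
            now = time.monotonic()
            ready_at = self.next_allowed.get(domain, now)
            if ready_at > now:
                await asyncio.sleep(ready_at - now)
            delay = self.delays.get(domain, self.default_delay)
            self.next_allowed[domain] = time.monotonic() + delay
```

In a download loop, each task would call await throttle.wait(domain) right before its request, after filling throttle.delays[domain] from that domain's robots.txt where a Crawl-delay is given.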


Download files


Source Distribution

doidownloader-0.1.0.tar.gz (10.5 kB)

Built Distribution

doidownloader-0.1.0-py3-none-any.whl (12.0 kB, Python 3)

File details

Details for the file doidownloader-0.1.0.tar.gz.

File metadata

  • Download URL: doidownloader-0.1.0.tar.gz
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.9.28 (uv publish, CI on Ubuntu 24.04 "noble")

File hashes for doidownloader-0.1.0.tar.gz

  • SHA256: e88bc5e2729853896370644df81bc41803488e5724dd70feaa124f431ab7bb80
  • MD5: b965f759d97d9133cad40714bba42a17
  • BLAKE2b-256: 6e62f7eb1b6f10532f3f9c03fbff5fd17d4cff7d9f168434c0a75bfbb22b5af4

Details for the file doidownloader-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: doidownloader-0.1.0-py3-none-any.whl
  • Size: 12.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.9.28 (uv publish, CI on Ubuntu 24.04 "noble")

File hashes for doidownloader-0.1.0-py3-none-any.whl

  • SHA256: 67114bf727d823ac48b951b79bafbaa8be2582f754caec97e2090fb91775b7d7
  • MD5: f73ef17adbd243a1249dfbda87f01617
  • BLAKE2b-256: ad13e6f8d1c9287bd2e773cfbbdb1289ba651f2566a6d4c7310a6848da2db9fe
