One downloader for many scientific data and code repositories!

Datahugger - Where DOI hugs Data

Datahugger is a tool for downloading scientific datasets, software, and code from a large number of repositories based on their DOI (Digital Object Identifier) or URL. With Datahugger, you can automate the downloading of data and improve the reproducibility of your research. Datahugger provides a straightforward Python interface as well as an intuitive Command Line Interface (CLI).

Supported repositories

Datahugger supports more than 377 generic and domain-specific (scientific) repositories, and more are on the way.

Datahugger supports Zenodo, Dataverse, DataOne, GitHub, FigShare, HuggingFace, Mendeley Data, Dryad, OSF, and many more.

We are still expanding Datahugger with support for more repositories. You can help by requesting support for a repository in the issue tracker. Pull Requests are very welcome as well.

Installation

PyPI

Datahugger requires Python 3.6 or later.

pip install datahugger

Getting started

Datahugger with Python

Load a dataset (or any digital asset) from a repository with the datahugger.get() function. The first argument is the DOI or URL, and the second is the folder name to store the dataset (it will be created if it does not exist).

The following code downloads dataset 10.5061/dryad.mj8m0 into the folder "data".

import datahugger

# download the dataset to the folder "data"
datahugger.get("10.5061/dryad.mj8m0", "data")

For an example of how Datahugger can integrate with your work, see the example workflow notebook, which can also be opened in Google Colab.
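Because datahugger.get() takes the DOI or URL and the output folder as plain arguments, downloading several datasets in one go is a short loop. The sketch below is one possible approach, not part of Datahugger: the helpers doi_to_folder and download_all are hypothetical, each dataset is assumed to go into its own subfolder named after its DOI or URL, and the download function is passed in as a parameter (defaulting to datahugger.get) so the loop can be tried without network access.

```python
import re
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def doi_to_folder(identifier: str) -> str:
    """Turn a DOI or URL into a safe folder name (hypothetical helper)."""
    # Collapse every run of characters that are awkward in paths into "_".
    return re.sub(r"[^A-Za-z0-9._-]+", "_", identifier).strip("_")


def download_all(
    identifiers: Iterable[str],
    root: str,
    get: Optional[Callable[[str, str], object]] = None,
) -> Dict[str, Path]:
    """Download each DOI/URL into its own subfolder of `root` (hypothetical helper)."""
    if get is None:
        # Import lazily so the loop can be exercised without Datahugger installed.
        import datahugger

        get = datahugger.get

    targets: Dict[str, Path] = {}
    for identifier in identifiers:
        folder = Path(root) / doi_to_folder(identifier)
        # First argument: DOI or URL; second: output folder (created if missing).
        get(identifier, str(folder))
        targets[identifier] = folder
    return targets
```

For example, download_all(["10.5061/dryad.mj8m0"], "data") would place the Dryad dataset from the example above in data/10.5061_dryad.mj8m0.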

Datahugger with command line

The datahugger command provides an easy interface for downloading data. The first argument is the DOI or URL, and the second is the name of the folder to store the dataset in (it will be created if it does not exist).

datahugger 10.5061/dryad.mj8m0 data
% datahugger 10.5061/dryad.mj8m0 data
Collecting...
NestTemperatureData.csv            : 100%|████████████████████████████████████████| 607k/607k
README_for_NestTemperatureData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
ExternalTemps.csv                  : 100%|██████████████████████████████████████| 1.06k/1.06k
README_for_ExternalTemps.txt       : 100%|██████████████████████████████████████| 2.82k/2.82k
InternalEggTempData.csv            : 100%|██████████████████████████████████████████| 664/664
README_for_InternalEggTempData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
SoilSimulation_Output.csv          : 100%|████████████████████████████████████████| 229M/229M
README_for_SoilSimulation_[...].txt: 100%|██████████████████████████████████████| 2.82k/2.82k
Dataset successfully downloaded.
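After the download, the dataset is a set of ordinary files in the output folder ("data" in this example), so standard tools can inspect it. A minimal standard-library sketch; summarize_folder is a hypothetical helper and assumes nothing about Datahugger beyond the output folder:

```python
from pathlib import Path
from typing import List, Tuple


def summarize_folder(folder: str) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under `folder`."""
    root = Path(folder)
    # rglob("*") walks subfolders too; keep files only, not directories.
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by relative POSIX-style path so the listing is stable across platforms.
    return sorted((p.relative_to(root).as_posix(), p.stat().st_size) for p in files)
```

For instance, summarize_folder("data") would return each file under data together with its size in bytes.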

Tip: On some systems, you have to quote the DOI or URL. For example: datahugger "10.5061/dryad.mj8m0" data.
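If you call the command line tool from a Python script instead of a shell, passing the arguments as a list to subprocess.run skips the shell entirely, so the DOI or URL never needs quoting. A minimal sketch, assuming the datahugger command is on your PATH; build_command and run_datahugger are hypothetical helpers, not part of Datahugger:

```python
import shlex
import subprocess
from typing import List


def build_command(doi_or_url: str, folder: str) -> List[str]:
    """Argument list for the datahugger CLI: DOI/URL first, then the folder."""
    return ["datahugger", doi_or_url, folder]


def run_datahugger(doi_or_url: str, folder: str) -> int:
    """Run the CLI without a shell and return its exit code (assumes it is on PATH)."""
    cmd = build_command(doi_or_url, folder)
    # Show the equivalent shell command, quoted only where a POSIX shell would need it.
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    # No shell=True: each list item reaches the program verbatim, so no quoting is needed.
    completed = subprocess.run(cmd)
    return completed.returncode
```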

Tips and tricks

Contact

Please feel free to reach out with questions, comments, and suggestions. The issue tracker is a good starting point. You can also email me at jonathandebruinos@gmail.com.

Download files

Source Distribution

datahugger-0.13.tar.gz (1.9 MB)

Uploaded Source

Built Distribution

datahugger-0.13-py3-none-any.whl (20.8 kB)

Uploaded Python 3

File details

Details for the file datahugger-0.13.tar.gz.

File metadata

  • Download URL: datahugger-0.13.tar.gz
  • Size: 1.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for datahugger-0.13.tar.gz
  • SHA256: 7b15da526ce57556b7726f5024d3aaaa962f55f3d8133d6f828010a57123698e
  • MD5: 9e4615d7d677b0c87dd9adce460893a3
  • BLAKE2b-256: 7fc57765d09f5b718424edd57a5fd89d7dedd906ce71ad4b2cee2ca6fc9d43bd

File details

Details for the file datahugger-0.13-py3-none-any.whl.

File metadata

  • Download URL: datahugger-0.13-py3-none-any.whl
  • Size: 20.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for datahugger-0.13-py3-none-any.whl
  • SHA256: f7d4a69e07b012c3aba41f9895a3d1b2eacb09c43b78b5375c6fdfa1fd6cf766
  • MD5: d7a11acbeb2914fa8a355f239dad282c
  • BLAKE2b-256: 48d32e1a6b92d07a1ba7578e8cb58c87b9b7835c122e82ae7ec57e8d3b05d8d1

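The SHA256 digests published for the release files let you check that a downloaded file is intact. A minimal sketch with Python's hashlib; sha256_of and verify are hypothetical helper names, and the expected digests are copied from the hash lists for datahugger-0.13. (pip can also enforce hashes itself through its hash-checking mode, --require-hashes.)

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the datahugger 0.13 release files.
EXPECTED_SHA256 = {
    "datahugger-0.13.tar.gz": "7b15da526ce57556b7726f5024d3aaaa962f55f3d8133d6f828010a57123698e",
    "datahugger-0.13-py3-none-any.whl": "f7d4a69e07b012c3aba41f9895a3d1b2eacb09c43b78b5375c6fdfa1fd6cf766",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """True only if the file name is known and its SHA256 matches the published digest."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```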
