One downloader for many scientific data and code repositories!

Datahugger - Where DOI hugs Data

Datahugger is a tool to download scientific datasets, software, and code from a large number of repositories based on their DOI or URL. With Datahugger, you can automate the downloading of data and improve the reproducibility of your research. Datahugger provides a straightforward Python interface as well as an intuitive Command Line Interface (CLI).

Supported repositories

Datahugger offers support for more than 150 generic and specific (scientific) repositories (and more to come!).

Datahugger supports Zenodo, Dataverse, DataOne, GitHub, FigShare, HuggingFace, Mendeley Data, Dryad, OSF, and many more.

We are still expanding Datahugger with support for more repositories. You can help by requesting support for a repository in the issue tracker. Pull Requests are very welcome as well.

Installation

PyPI

Datahugger requires Python 3.6 or later.

pip install datahugger

Getting started

Download with Python

Load a dataset (or any digital asset) from a repository with the datahugger.get function. The first argument is the DOI or URL, and the second is the name of the folder in which to store the dataset (the folder is created if it does not exist).

import datahugger

# download the data to your device
datahugger.get("10.5061/dryad.x3ffbg7m8", "data")

The data from DOI 10.5061/dryad.x3ffbg7m8 is now stored in the folder data. The data can now be accessed and analyzed. For example:

import pandas as pd

df = pd.read_csv("data/Pfaller_Robinson_2022_Global_Sea_Turtle_Epibiont_Database.csv")
print(df["Higher Taxon"].value_counts())
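
When a script is rerun often, it can help to skip the download if the data is already on disk. The following is a minimal sketch of that idea, not part of Datahugger itself: fetch_once and its downloader parameter are hypothetical names introduced here, and the only Datahugger call assumed is datahugger.get with the two arguments described above.

```python
from pathlib import Path


def fetch_once(doi_or_url, folder, downloader=None):
    """Download into `folder` unless it already contains files.

    `fetch_once` and `downloader` are hypothetical names for this sketch.
    By default the download is delegated to datahugger.get.
    Returns True if a download was triggered, False if it was skipped.
    """
    target = Path(folder)
    # Skip the download when the folder already holds at least one entry.
    if target.is_dir() and any(target.iterdir()):
        return False
    if downloader is None:
        import datahugger  # imported lazily so the helper loads without it

        downloader = datahugger.get
    downloader(doi_or_url, str(target))
    return True
```

Calling fetch_once("10.5061/dryad.x3ffbg7m8", "data") at the top of an analysis script would then download the data only on the first run. Note that this check is deliberately simple: a folder left behind by an interrupted download is treated as complete.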

Download with command line

The datahugger command provides an easy interface to download data from the command line. The first argument is the DOI or URL, and the second is the name of the folder in which to store the dataset (the folder is created if it does not exist).

datahugger 10.5061/dryad.31zcrjdm5 data

% datahugger 10.5061/dryad.x3ffbg7m8 data
README_Pfaller_Robinson_20[...].txt: 100%|█████████████████████████████████████| 17.1k/17.1k [00:00<00:00, 2.62MB/s]
Pfaller_Robinson_2022_Glob[...].csv: 100%|████████████████████████████████████████| 709k/709k [00:00<00:00, 904kB/s]
Repository content successfully downloaded.

Tip: On some systems, you have to quote the DOI or URL. For example: datahugger "10.5061/dryad.31zcrjdm5" data.
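
To illustrate when quoting matters, the sketch below builds the command line with Python's standard shlex module, which adds quotes only when an argument contains characters that a POSIX shell would interpret. The helper name build_command and the example.org URL are placeholders for illustration, not part of Datahugger.

```python
import shlex


def build_command(doi_or_url, folder):
    """Return a shell-safe `datahugger` command line as a single string."""
    # shlex.join quotes only arguments containing shell-special characters.
    return shlex.join(["datahugger", doi_or_url, folder])


# A plain DOI needs no quoting in a POSIX shell.
print(build_command("10.5061/dryad.31zcrjdm5", "data"))
# → datahugger 10.5061/dryad.31zcrjdm5 data

# A URL with "?" and "&", or a folder name with a space, must be quoted.
print(build_command("https://example.org/dataset?id=1&v=2", "my data"))
# → datahugger 'https://example.org/dataset?id=1&v=2' 'my data'
```

shlex follows POSIX shell rules, so the result may not be appropriate for other shells such as the Windows command prompt.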

License

MIT

Contact

Feel free to reach out with questions, remarks, and suggestions. The issue tracker is a good starting point. You can also email me at jonathandebruinos@gmail.com.

Download files

Source distribution: datahugger-0.1.tar.gz (1.8 MB)

Built distribution: datahugger-0.1-py3-none-any.whl (14.8 kB, Python 3)