
Simplified Downloading from Data Repositories with RESTful APIs


DataRig is released under the BSD 3-Clause license. Pull requests are welcome!


Features

Providing large testing and demo data alongside your package releases is challenging for two reasons. First, code repositories have strict limits on file sizes. Second, you don't want your users to wait forever to download your cool package because you've included large data files. If you're a Python developer who has hit these issues, then DataRig is for you. DataRig lets you move data from web-based repositories into your users' local directories after installation. This "just-in-time" data fetching is perfect for letting users test or run your package's demos.
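The "just-in-time" pattern boils down to: fetch a file the first time it is needed, then reuse the local copy. The sketch below is a minimal illustration under that assumption, not DataRig's code. ensure_local is a hypothetical helper, and fetch is a hypothetical placeholder for whatever performs the real download (for example, a DataRig record's download method).

```python
from pathlib import Path
from typing import Callable, Union


def ensure_local(path: Union[str, Path], fetch: Callable[[Path], None]) -> Path:
    """Return a local path, calling fetch(path) only if the file is missing.

    fetch is a hypothetical callback that writes the file to the given path,
    standing in for the real download step.
    """
    path = Path(path)
    if not path.exists():
        # first use: create the parent directory and download the file
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)
    # later uses: the existing local copy is returned without fetching again
    return path
```

Because the download is only triggered when the file is absent, a package's test or demo code can call a helper like this unconditionally at startup.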

Installation

DataRig can be installed into your project's environment using pip:

  1. Activate the virtual or conda environment of your package:
$ source <YOUR_ENV>/bin/activate  # Python virtual environment
$ conda activate <YOUR_ENV>       # conda environment
  2. Install DataRig into your active environment:
(<YOUR_ENV>)$ pip install datarig

Dependencies

DataRig is super lightweight: it requires only Python 3.9 and the requests library.

  • requests: https://pypi.org/project/requests/ (also installable with conda)
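If you want to confirm these requirements from inside Python before installing, a quick check like the following can help. It is a hypothetical helper, not part of DataRig, and it assumes "Python 3.9" means "3.9 or newer".

```python
import importlib.util
import sys
from typing import List


def missing_requirements() -> List[str]:
    """Return DataRig's stated requirements that this interpreter lacks."""
    missing = []
    # Assumption: "Python 3.9" is treated as a minimum version.
    if sys.version_info < (3, 9):
        missing.append("python>=3.9")
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec("requests") is None:
        missing.append("requests")
    return missing


print(missing_requirements() or "all requirements found")
```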

Documentation

Using DataRig to access a repository is simple. Just build a Record instance and all of the record's information will be at your fingertips. Here's how to do it for a sample Zenodo record:

$ ipython
>>> from datarig import Zenodo
>>> # URL of the Zenodo REST API endpoint for record id 7868945
>>> url = 'https://zenodo.org/api/records/7868945'
>>> record = Zenodo(url)
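The endpoint URL is just the record id appended to Zenodo's records API path, the same path used in the session above. If you work with several records, a tiny helper like this (hypothetical, not part of DataRig) keeps the URLs consistent before you pass them to Zenodo(url):

```python
ZENODO_RECORDS_API = "https://zenodo.org/api/records"


def zenodo_record_url(record_id: int) -> str:
    """Build the REST API URL for a Zenodo record id (hypothetical helper)."""
    return f"{ZENODO_RECORDS_API}/{record_id}"


url = zenodo_record_url(7868945)
print(url)  # https://zenodo.org/api/records/7868945
```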

This record contains all of the repository's information stored as attributes. To see everything at once, just print the record.

>>> print(record)

You will see a datasets attribute holding a list of Dataset objects. Each Dataset holds the name, URL, size, and file type of one dataset that can be downloaded from the repository record. Let's print each of them.

>>> for dset in record.datasets:
...     print(dset)
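To make the idea of "metadata without data" concrete, here is a sketch of how those four fields could be pulled out of a record's JSON. It is an illustration, not DataRig's implementation: FileInfo and parse_files are hypothetical names, the JSON layout ("files", "key", "size", "links"/"self") is an assumption about Zenodo-style responses, the file type is simply inferred from the extension, and the sample payload is made up.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List


@dataclass
class FileInfo:
    """Hypothetical stand-in for a Dataset: describes a file, holds no data."""
    name: str
    url: str
    size: int  # bytes
    filetype: str


def parse_files(record_json: dict) -> List[FileInfo]:
    """Extract file metadata from a record dict (assumed Zenodo-style layout)."""
    infos = []
    for entry in record_json.get("files", []):
        name = entry["key"]
        infos.append(
            FileInfo(
                name=name,
                url=entry["links"]["self"],
                size=entry["size"],
                # infer the type from the extension, e.g. "npy"
                filetype=PurePosixPath(name).suffix.lstrip("."),
            )
        )
    return infos


# A made-up payload shaped like the assumed layout.
sample = {
    "files": [
        {"key": "sample_arr.npy", "size": 1024,
         "links": {"self": "https://example.org/files/sample_arr.npy"}},
    ]
}
for info in parse_files(sample):
    print(info)
```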

Notice that a Dataset instance describes the data but does not contain it. To get the data onto your machine, call the record's 'download' method. Let's get help for this method before calling it.

>>> help(record.download)

To call this method we need a directory in which to place the downloaded data, the name of the dataset to download, the amount of memory to use while downloading (chunksize), and a boolean indicating whether the download should be streamed to disk. Streaming is usually the right choice since the files you download are likely to be large. Let's download the "sample_arr.npy" file from this record into your current working directory.

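>>> from pathlib import Path
>>> # directory=None places the file in the current working directory
>>> record.download(directory=None, name='sample_arr.npy')
>>> Path('sample_arr.npy').exists()  # confirm the file arrived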

That's it! You've just downloaded a dataset from a Zenodo record 😎
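For the curious, the chunksize and streaming options described above can be illustrated with the requests library's stream=True mode and Response.iter_content. This is a sketch of the general technique, not DataRig's implementation; write_chunks and stream_download are hypothetical names, and the 1 MB default chunk size and 30 second timeout are arbitrary choices.

```python
from pathlib import Path
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], dest: Path) -> int:
    """Write byte chunks to dest one at a time; return total bytes written.

    Only one chunk is held in memory at a time, so chunk size bounds memory use.
    """
    total = 0
    with open(dest, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                total += len(chunk)
    return total


def stream_download(url: str, dest: Path, chunksize: int = 1_000_000) -> int:
    """Stream url to dest in chunksize-byte pieces using requests."""
    import requests  # imported lazily so write_chunks works without requests

    # stream=True defers fetching the body until iter_content is consumed
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        return write_chunks(resp.iter_content(chunk_size=chunksize), dest)
```

Without streaming, the whole response body would be loaded into memory before being written, which is exactly what you want to avoid for large data files.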

Attribution

If you find DataRig useful, please cite the Zenodo archive of this repository.

If you really like DataRig, you can also star the repository!

Contributions

Contributions are what make open source fun, and we would love for you to contribute. Please check out our contribution guide to get started.

Issues

DataRig provides custom issue templates for filing bugs, requesting feature enhancements, suggesting documentation changes, or just asking questions. Ready to discuss? File an issue here.

Acknowledgements

This work is generously supported through the Ting Tsung and Wei Fong Chao Foundation and the National Institute of Neurological Disorders and Stroke (Grant 2R01 NS100738-05A1).

Download files


Source Distribution

datarig-1.0.0.tar.gz (12.5 kB)

Built Distribution

datarig-1.0.0-py3-none-any.whl (9.3 kB)

File details

Details for the file datarig-1.0.0.tar.gz.

File metadata

  • Download URL: datarig-1.0.0.tar.gz
  • Upload date:
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for datarig-1.0.0.tar.gz:
  • SHA256: a5c69d642503201bd833e2749a2282ef68bc0cc8b5890a3c44cce81cef01f1a4
  • MD5: 096549c3d6cae1276cba5b8ecc5af754
  • BLAKE2b-256: cad5aea409a3404889dbc7d028784c0fced4c023e92373ce5850af8bb9d1fcf4


File details

Details for the file datarig-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: datarig-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 9.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for datarig-1.0.0-py3-none-any.whl:
  • SHA256: 91216753b6b97fdc07dc47e56bd2b3f134f7fa0edec2eb0d7cef5b8e30c38c16
  • MD5: fe54c14a849489c7a4403b519c6424b8
  • BLAKE2b-256: ff9ef7e5107b02c3c39de9a8b7df02484db2723a1b273fef9bfdc80bdae9f2ce
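To check a downloaded file against a published SHA256 or MD5 digest, hash it in chunks and compare. A minimal sketch using Python's standard hashlib module; file_checksum and matches_checksum are hypothetical helpers, not part of DataRig. The "algorithm:hexdigest" string accepted by matches_checksum is an assumption chosen for illustration, modeled on the "md5:..." form in which Zenodo file metadata typically reports checksums, so the same helper could also verify data fetched from a record.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5", chunksize: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # read fixed-size blocks until read() returns b"" at end of file
        for block in iter(lambda: fh.read(chunksize), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_checksum(path: Path, expected: str) -> bool:
    """Compare a file against an assumed 'algorithm:hexdigest' string."""
    algorithm, _, hexdigest = expected.partition(":")
    return file_checksum(path, algorithm) == hexdigest.lower()
```

For example, file_checksum(Path("datarig-1.0.0.tar.gz"), "sha256") should equal the SHA256 value listed above for the source distribution.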

