data distribution geared toward scientific datasets

# DataLad

DataLad aims to deliver a data distribution. The original motivation was to provide a platform for harvesting data from online portals and exposing the collected data in a readily usable form from [Git-annex] repositories, while fetching the data load from the original data providers.

# Status

DataLad is currently in heavy initial development to establish core functionality that others can use. The codebase is growing rapidly and the functionality is usable for many use cases, but it has not yet been officially released to the public because its organization and configuration will be subject to considerable reorganization and standardization. The primary purpose of the current development is to capture major use cases and try to address them, to get a better understanding of the ultimate specs and design.

See [CONTRIBUTING.md](CONTRIBUTING.md) if you are interested in internals and/or contributing to the project.

# Installation

## Debian-based systems

On Debian-based systems we recommend enabling [NeuroDebian](http://neuro.debian.net), from which we provide recent releases of DataLad.

TODO: describe the few flavors of packages we would provide (I guess datalad-core, datalad-crawler, datalad; the primary difference is dependencies)

## Other Linux distributions, OS X (Windows is still TODO) via pip

TODO: upload to PyPI and describe installation ‘schemes’ (crawler, tests, full). Ideally we should unify the schemes with the Debian packages

Installing through pip requires some external dependencies that pip does not ship (e.g., git-annex); for those, please refer to the next section.

## Dependencies

Although we now support Python 3 (>= 3.3), we still primarily use Python 2.7, so the instructions below are for Python 2.7 deployments. Replace `python-{` with `python{,3}-{` to also install the dependencies for Python 3 (e.g., if you would like to develop and test through tox).
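The `python-{` to `python{,3}-{` substitution relies on shell brace expansion. The following is a small sketch of what the two patterns expand to, using only two of the package suffixes from this section's dependency list (`six` and `mock`) to keep the output short. Wrapping the patterns in `bash -c` is an assumption made for illustration: brace expansion is a bash feature rather than POSIX `sh`, so the wrapper makes the example behave the same regardless of the calling shell.

```sh
# Brace expansion is a bash feature, not POSIX sh, so the patterns are run
# through `bash -c` here (this wrapper is for illustration only).
# `echo` prints the package names that apt-get would receive.

# Python 2 only: python-{six,mock}
py2_pkgs=$(bash -c 'echo python-{six,mock}')
echo "$py2_pkgs"

# Python 2 and 3: the empty alternative in {,3} keeps the plain "python" prefix
py23_pkgs=$(bash -c 'echo python{,3}-{six,mock}')
echo "$py23_pkgs"
```

The first `echo` prints `python-six python-mock`, and the second prints `python-six python-mock python3-six python3-mock`. Previewing a pattern with `echo` like this is a cheap way to check a package list before handing it to `apt-get install`.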

On Debian-based systems we recommend enabling [NeuroDebian](http://neuro.debian.net), since we use it to provide backports of recent, fixed versions of the external modules we depend upon:

```sh
apt-get install -y -q git git-annex-standalone
apt-get install -y -q patool python-scrapy python-{appdirs,argcomplete,git,humanize,keyring,lxml,msgpack,mock,progressbar,requests,setuptools,six}
```

or additionally, if you would like to develop and run our test battery as described in [CONTRIBUTING.md](CONTRIBUTING.md), and possibly use tox and newer versions of dependencies from PyPI:

```sh
apt-get install -y -q python-{dev,httpretty,testtools,nose,pip,vcr,virtualenv} python-tox
# Some libraries which might be needed for installing via pip
apt-get install -y -q lib{ffi,ssl,curl4-openssl,xml2,xslt1}-dev
```

or use pip to install the Python modules (prior installation of the libraries listed above might be necessary):

```sh
pip install -r requirements.txt
```

You will also need to install a recent git-annex using the means appropriate for your OS (for Debian/Ubuntu, once again, just use NeuroDebian). We will later provide bundled installations of DataLad across popular platforms.
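Because git and git-annex are external tools that pip does not install, it can help to confirm they are on `PATH` before going further. The sketch below is one way to do that in plain POSIX shell. The helper name `missing_tools` is hypothetical and not part of DataLad. The sketch also assumes the executables are called `git` and `git-annex`, and it only checks for their presence, not whether the installed version is recent enough.

```sh
# Hypothetical helper (not part of DataLad): print the name of every
# command given as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the two external tools this section asks for.
missing=$(missing_tools git git-annex)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing external dependencies:" $missing
else
    echo "git and git-annex found"
fi
```

The same helper can be called with any other command names to check, for example `missing_tools pip tox`.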

# License

MIT/Expat

# Disclaimer

It is in a prototype stage – nothing is set in stone yet – but already usable in a limited scope.

[Git-annex]: http://git-annex.branchable.com

