 ____            _             _                   _
|  _ \    __ _  | |_    __ _  | |       __ _    __| |
| | | |  / _` | | __|  / _` | | |      / _` |  / _` |
| |_| | | (_| | | |_  | (_| | | |___  | (_| | | (_| |
|____/   \__,_|  \__|  \__,_| |_____|  \__,_|  \__,_|
                                              Read me

1000ft overview

DataLad aims to make data management and data distribution more accessible. To do that it stands on the shoulders of Git and Git-annex to deliver a decentralized system for data exchange. This includes automated ingestion of data from online portals, and exposing it in readily usable form as Git(-annex) repositories, so-called datasets. The actual data storage and permission management, however, remains with the original data providers.

Status

DataLad is under rapid development. While the code base is still growing, the focus is increasingly shifting towards robust and safe operation with a sensible API. Organization and configuration are still subject to considerable reorganization and standardization. However, DataLad is usable today, and user feedback is always welcome.

DataLad 101

A growing number of datasets are available from http://datasets.datalad.org . These datasets are regular git/git-annex repositories organized into a hierarchy using the git submodules mechanism. You can therefore use regular git/git-annex commands to work with them, but you might need datalad installed for additional functionality (e.g., fetching from portals that require authentication, such as CRCNS and HCP, or accessing data originally distributed in tarballs). Beyond that, datalad aims to provide a higher-level interface on top of git/git-annex to simplify consumption and sharing of new or derived datasets. To that end, you can install all of those datasets using

datalad install -r ///

which will git clone all of those datasets under the datasets.datalad.org sub-directory. This command will not fetch any large data files; it merely recreates the full hierarchy of all of those datasets locally, which still takes up a good chunk of your filesystem's metadata storage. Instead of fetching all datasets at once, you could install a specific dataset, e.g.

datalad install ///openfmri/ds000113

or install only the top-level dataset by omitting the -r option and then call datalad install for the specific sub-datasets you want, possibly with -r to install their sub-datasets as well, e.g.

datalad install ///
cd datasets.datalad.org
datalad install -r openfmri/ds000001 indi/fcon1000

You can navigate the datasets you have installed in your terminal or browser, fetching necessary files or installing new sub-datasets with the datalad get [FILE|DIR] command. DataLad takes care of downloading, extracting, and possibly authenticating (it will ask you for credentials) in a uniform fashion, regardless of the original data location or distribution serialization (e.g., a tarball). Since it uses git and git-annex underneath, you can be assured that you are getting the exact, correct version of the data.
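The selective workflow above can also be scripted. The following is a minimal Python sketch, not part of DataLad itself: it only assembles the command lines shown in this README and, if asked, hands them to the datalad executable through subprocess. The helper names (install_command, get_command, run) are hypothetical, and the path passed to get is just one of the datasets named above, used for illustration.

```python
import shlex
import subprocess


def install_command(*datasets, recursive=False):
    """Build the argument list for `datalad install [-r] DATASET...`."""
    if not datasets:
        raise ValueError("at least one dataset location is required")
    args = ["datalad", "install"]
    if recursive:
        args.append("-r")
    args.extend(datasets)
    return args


def get_command(*paths):
    """Build the argument list for `datalad get [FILE|DIR]...`."""
    if not paths:
        raise ValueError("at least one file or directory is required")
    return ["datalad", "get", *paths]


def run(args, cwd=None, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(args))
    if not dry_run:
        # check=True raises CalledProcessError if datalad exits with an error.
        subprocess.run(args, cwd=cwd, check=True)


if __name__ == "__main__":
    top = "datasets.datalad.org"  # directory created by `datalad install ///`
    run(install_command("///"))
    run(install_command("openfmri/ds000001", "indi/fcon1000", recursive=True), cwd=top)
    # Illustrative only: getting a whole dataset directory may download a lot of data.
    run(get_command("openfmri/ds000001"), cwd=top)
```

With the default dry_run=True the script only prints the commands, so they can be reviewed before anything is cloned or downloaded.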

The use cases DataLad covers are not limited to “consumption” of data. DataLad also aims to help with publishing original or derived data, thus facilitating more efficient data management when collaborating or simply sharing your data. You can find more documentation at http://docs.datalad.org .

Contributing

See CONTRIBUTING.md if you are interested in internals or contributing to the project.

Installation

Debian-based systems

On Debian-based systems we recommend enabling NeuroDebian, from which we provide recent releases of DataLad. The datalad package recommends some relatively heavy packages (e.g., scrapy) that are useful only if you are interested in the crawl functionality. If you need just the base functionality of datalad, install it without the recommended packages (e.g., apt-get install --no-install-recommends datalad).

Other Linux distributions and OSX via pip (Windows support is still TODO)

By default, installation via pip installs the core functionality of datalad, which allows for managing datasets etc. Additional installation schemes are available, so you can request an enhanced installation via pip install datalad[SCHEME], where SCHEME can be

  • crawl to also install scrapy which is used in some crawling constructs
  • tests to also install dependencies used by datalad's battery of unit tests
  • full to install all dependencies.
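
As an illustration of the schemes listed above, here is a small Python sketch that turns a scheme name into the corresponding pip requirement string and install command. The function names are hypothetical, and invoking pip as a module of the current interpreter is just one common way of calling it, not something this README prescribes.

```python
import sys

# Installation schemes named in this README.
SCHEMES = ("crawl", "tests", "full")


def pip_requirement(scheme=None):
    """Return 'datalad' for the core install or 'datalad[SCHEME]' for a scheme."""
    if scheme is None:
        return "datalad"
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme {scheme!r}; expected one of {SCHEMES}")
    return f"datalad[{scheme}]"


def pip_install_args(scheme=None):
    """Argument list that installs datalad with pip for the running interpreter."""
    return [sys.executable, "-m", "pip", "install", pip_requirement(scheme)]


if __name__ == "__main__":
    for scheme in (None, *SCHEMES):
        print(" ".join(pip_install_args(scheme)))
```

When typing such a command directly in a shell, quoting the requirement (e.g., 'datalad[crawl]') avoids problems in shells that treat square brackets as glob patterns.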

For installation through pip you need some external dependencies that pip does not ship (e.g., git-annex); please refer to the next section for those.

Dependencies

Our setup.py and accompanying packaging describe all necessary dependencies. On Debian-based systems we recommend enabling NeuroDebian, since we use it to provide backports of recent, fixed versions of the external modules we depend upon. An up-to-date Git-annex is also necessary for proper operation of DataLad (install git-annex-standalone from the NeuroDebian repository). Additionally, if you would like to develop and run our test battery, see CONTRIBUTING.md regarding additional dependencies.
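
Because git and git-annex are external programs that pip does not install, it can help to check for them before using DataLad. The sketch below is a hypothetical helper, not part of DataLad; it only checks whether the executables are on PATH and does not verify that git-annex is recent enough.

```python
import shutil

# External command-line tools this README names as prerequisites.
REQUIRED_TOOLS = ("git", "git-annex")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("git and git-annex are available")
```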

Later we will provide bundled installations of DataLad across popular platforms.

License

MIT/Expat

Disclaimer

DataLad is in an alpha stage: nothing is set in stone yet, but it is already usable in a limited scope.

Release History

  • 0.4.1 (this version)
  • 0.4
  • 0.3.1
  • 0.3
  • 0.2.3
  • 0.2.2
  • 0.2.1
  • 0.2.1.dev1
  • 0.2

Download Files

  • datalad-0.4.1.tar.gz (835.1 kB), file type: Source, uploaded Nov 11, 2016
