
Project description


DataFS is a package manager for data. It manages file versions, dependencies, and metadata for individual use or large organizations.

Configure and connect to a metadata Manager and multiple data Services using a specification file and you’ll be sharing, tracking, and using your data in seconds.

Features

  • Explicit version and metadata management for teams

  • Unified read/write interface across file systems

  • Easily create out-of-the-box configuration files for users

  • Track data dependencies and usage logs

  • Use datafs from Python or from the command line

  • Permissions handled by managers & services, giving you control over user access

Usage

First, configure an API. Don’t worry. It’s not too bad. Check out the quickstart to follow along.
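
If you'd rather wire the API up in code than through a config file, the setup looks roughly like the sketch below. It follows the quickstart, but the class and argument names here are assumptions; check the docs for your version:

from fs.osfs import OSFS

from datafs import DataAPI
from datafs.managers.manager_mongo import MongoDBManager

# The manager tracks metadata, versions, and dependencies
manager = MongoDBManager(database_name='MyDatabase', table_name='DataFiles')

# The 'local' service stores the actual files (directory assumed to exist)
local = OSFS('~/datafs/my_data/')

api = DataAPI(username='My Name', contact='me@example.com')
api.attach_manager(manager)
api.attach_authority('local', local)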

We’ll assume we already have an API object created and attached to a service called “local”. Once you have this, you can start using DataFS to create and use archives.

$ datafs create my_new_data_archive --description "a test archive"
created versioned archive <DataArchive local://my_new_data_archive>

$ echo "initial file contents" > my_file.txt

$ datafs update my_new_data_archive my_file.txt

$ datafs cat my_new_data_archive
initial file contents

Versions are tracked explicitly. Bump versions on write, and read old versions if desired.

$ echo "updated contents" > my_file.txt

$ datafs update my_new_data_archive my_file.txt --bumpversion minor
uploaded data to <DataArchive local://my_new_data_archive>. version bumped 0.0.1 --> 0.1.

$ datafs cat my_new_data_archive
updated contents

$ datafs cat my_new_data_archive --version 0.0.1
initial file contents
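
The same versioned workflow works from Python. Here's a sketch; the bumpversion keyword mirrors the CLI flag above but is an assumption, so see the API docs:

import datafs

api = datafs.get_api()
archive = api.get_archive('my_new_data_archive')

# Write a new minor version, then read an older one back
archive.update('my_file.txt', bumpversion='minor')

with archive.open('r', version='0.0.1') as f:
    print(f.read())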

Pin the default version using a requirements file:

$ echo "my_new_data_archive==0.0.1" > requirements_data.txt

$ datafs cat my_new_data_archive
initial file contents
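
From Python you can point the API at the same requirements file. Whether get_api accepts a requirements argument, and what it's called, is an assumption here; check the configuration docs:

import datafs

# Assumed keyword; the version pinned in requirements_data.txt becomes the default
api = datafs.get_api(requirements='requirements_data.txt')

with api.get_archive('my_new_data_archive').open('r') as f:
    print(f.read())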

All of these features are available from (and faster in) Python:

>>> import datafs
>>> api = datafs.get_api()
>>> archive = api.get_archive('my_new_data_archive')
>>> with archive.open('r', version='latest') as f:
...     print(f.read())
...
updated contents

If you have permission to delete archives, it’s easy to do. See administrative tools for tips on setting permissions.

$ datafs delete my_new_data_archive
deleted archive <DataArchive local://my_new_data_archive>
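
Creating and deleting archives also works from Python. A sketch follows; the metadata keyword and delete method mirror the CLI commands but are assumptions, so see the API docs:

import datafs

api = datafs.get_api()

archive = api.create('my_new_data_archive',
                     metadata={'description': 'a test archive'})

archive.delete()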

See examples for more extensive use cases.

Installation

pip install datafs

Additionally, you’ll need a manager and services:

Managers:

  • MongoDB: pip install pymongo

  • DynamoDB: pip install boto3

Services:

  • Ready out-of-the-box:

    • local

    • shared

    • mounted

    • zip

    • ftp

    • http/https

    • in-memory

  • Requiring additional packages:

    • AWS/S3: pip install boto

    • SFTP: pip install paramiko

    • XMLRPC: pip install xmlrpclib
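
For example, once boto is installed you can attach an S3-backed service with something like the sketch below, assuming pyfilesystem's S3FS; the bucket name is a placeholder and credential handling is omitted:

from fs.s3fs import S3FS

import datafs

api = datafs.get_api()

# 'my-bucket' is a placeholder bucket name
api.attach_authority('s3', S3FS('my-bucket'))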

Requirements

For now, DataFS requires Python 2.7. We're working on Python 3 support.

Todo

See the issues page to view and add to our todos.

Credits

This package was created by Justin Simcock and Michael Delgado of the Climate Impact Lab. Check us out on GitHub.

Major kudos to the folks at PyFilesystem. Thanks also to audreyr for the wonderful cookiecutter package, and to Pyup, a constant source of inspiration and our silent third contributor.

