DataFS is an abstraction layer for data storage systems. It manages file versions and metadata using a JSON-like storage system such as AWS's DynamoDB, and relies on PyFilesystem to abstract file storage, so you can store files locally and in the cloud through a single, seamless interface.
DataFS Data Management System
DataFS is a package manager for data. It manages file versions, dependencies, and metadata for individual use or large organizations.
Configure and connect to a metadata Manager and multiple data Services using a specification file and you’ll be sharing, tracking, and using your data in seconds.
Free software: MIT license
Documentation: https://datafs.readthedocs.io.
Features
Explicit version and metadata management for teams
Unified read/write interface across file systems
Easily create out-of-the-box configuration files for users
Track data dependencies and usage logs
Use DataFS from Python or from the command line
Permissions handled by managers & services, giving you control over user access
Usage
First, configure an API. Don’t worry. It’s not too bad.
We’ll assume we already have an API object created and attached to a service called “s3”. Once you have this, you can start using DataFS to create and use archives.
$ datafs create my_new_data_archive --description "a test archive"
created versioned archive <DataArchive s3://my_new_data_archive>
$ echo "initial file contents" > my_file.txt
$ datafs update my_new_data_archive my_file.txt
$ datafs cat my_new_data_archive
initial file contents
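To make the split between a metadata manager and a data service more concrete, here is a toy, in-memory sketch of the create/update/cat steps above. `MetadataManager`, `StorageService`, `create`, `update`, and `cat` are hypothetical stand-ins, not DataFS's API. Versions are passed in explicitly, and real storage backends and permissions are left out.

```python
# Hypothetical, in-memory sketch of the manager/service split.
# None of these names are part of DataFS; they only illustrate the idea.


class MetadataManager(object):
    """Stands in for a JSON-like metadata store such as DynamoDB or MongoDB."""

    def __init__(self):
        self.records = {}

    def create(self, archive_name, description):
        if archive_name in self.records:
            raise KeyError("archive already exists: %s" % archive_name)
        self.records[archive_name] = {"description": description, "versions": []}

    def add_version(self, archive_name, version):
        self.records[archive_name]["versions"].append(version)

    def latest(self, archive_name):
        # In this toy, "latest" simply means the most recently added version.
        return self.records[archive_name]["versions"][-1]


class StorageService(object):
    """Stands in for a file store (local disk, S3, ...), keyed by path."""

    def __init__(self):
        self.files = {}

    def put(self, path, data):
        self.files[path] = data

    def get(self, path):
        return self.files[path]


def create(manager, archive_name, description):
    manager.create(archive_name, description)


def update(manager, service, archive_name, data, version):
    # File contents go to the service; only the version label goes to the manager.
    service.put("%s/%s" % (archive_name, version), data)
    manager.add_version(archive_name, version)


def cat(manager, service, archive_name, version=None):
    # Fall back to the most recent version when none is requested.
    version = version or manager.latest(archive_name)
    return service.get("%s/%s" % (archive_name, version))
```

Keeping file bytes and version metadata in separate stores is what lets one metadata manager track files spread across several services.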
Versions are tracked explicitly. Bump versions on write, and read old versions if desired.
$ echo "updated contents" > my_file.txt
$ datafs update my_new_data_archive my_file.txt --bumpversion minor
uploaded data to <DataArchive s3://my_new_data_archive>. version bumped 0.0.1 --> 0.1.
$ datafs cat my_new_data_archive
updated contents
$ datafs cat my_new_data_archive --version 0.0.1
initial file contents
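The `--bumpversion minor` step follows the usual semantic-versioning arithmetic. The hypothetical helper below is not DataFS's own code. It assumes versions are plain `major.minor.patch` strings and that the accepted part names are `major`, `minor`, and `patch` (only `minor` appears above). It shows how bumping one part resets the lower-order parts to zero.

```python
def bump_version(version, part):
    """Return a new 'major.minor.patch' string with one part bumped.

    Hypothetical helper: lower-order parts reset to zero, following
    semantic-versioning convention.
    """
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return "%d.0.0" % (major + 1)
    if part == "minor":
        return "%d.%d.0" % (major, minor + 1)
    if part == "patch":
        return "%d.%d.%d" % (major, minor, patch + 1)
    raise ValueError("part must be 'major', 'minor', or 'patch', got %r" % part)


print(bump_version("0.0.1", "minor"))  # 0.1.0
```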
Pin versions using a requirements file to set the default version:
$ echo "my_new_data_archive==0.0.1" > requirements_data.txt
$ datafs cat my_new_data_archive
initial file contents
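A sketch of how a default version might be resolved from a `name==version` requirements file, assuming three precedence rules. An explicitly requested version wins. A pin applies when no version is requested. `'latest'` (as in the Python example in this README) bypasses the pin. `parse_requirements`, `version_key`, and `resolve_version` are hypothetical names, not DataFS functions.

```python
def parse_requirements(text):
    """Map archive names to pinned versions from 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError("expected 'name==version', got %r" % line)
        pins[name.strip()] = version.strip()
    return pins


def version_key(version):
    # Compare numerically so that 0.10.0 sorts after 0.9.0.
    return tuple(int(part) for part in version.split("."))


def resolve_version(archive_name, available, pins, requested=None):
    """Pick an explicit version, else the pinned one, else the newest."""
    if requested is not None and requested != "latest":
        return requested
    if requested is None and archive_name in pins:
        return pins[archive_name]
    return max(available, key=version_key)


pins = parse_requirements("my_new_data_archive==0.0.1\n")
print(resolve_version("my_new_data_archive", ["0.0.1", "0.1.0"], pins))  # 0.0.1
```

Sorting on integer tuples rather than raw strings avoids the classic bug where `"0.10.0" < "0.9.0"` lexicographically.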
All of these features are available from (and faster in) Python:
>>> import datafs
>>> api = datafs.get_api()
>>> archive = api.get_archive('my_new_data_archive')
>>> with archive.open('r', version='latest') as f:
...     print(f.read())
...
updated contents
If you have permission to delete archives, it’s easy to do. See administrative tools for tips on setting permissions.
$ datafs delete my_new_data_archive
deleted archive <DataArchive s3://my_new_data_archive>
See examples for more extensive use cases.
Installation
pip install datafs
Additionally, you’ll need to choose a manager and services:
Managers:
MongoDB: pip install pymongo
DynamoDB: pip install boto3
Services:
Ready out-of-the-box:
local
shared
mounted
zip
ftp
http/https
in-memory
Requiring additional packages:
AWS/S3: pip install boto
SFTP: pip install paramiko
XMLRPC: pip install xmlrpclib
Requirements
For now, DataFS requires Python 2.7. We’re working on Python 3.x support.
Todo
See issues to view and add to our to-do list.
Credits
This package was created by Justin Simcock and Michael Delgado of the Climate Impact Lab. Check us out on GitHub.
Major kudos to the folks at PyFilesystem. Thanks also to audreyr for the wonderful cookiecutter package, and to Pyup, a constant source of inspiration and our silent third contributor.
History
0.1.0 (2016-11-18)
First release on PyPI.
Source Distribution
datafs-0.6.2.tar.gz (56.4 kB, Source; uploaded using Trusted Publishing: No)
File hashes
Algorithm | Hash digest
---|---
SHA256 | 673ab34b4fc911cb3724dffe579a7d621de2880666d0f25559ae7b76f317f0a5
MD5 | 4db459188bca79a9b65161bbd14f2f06
BLAKE2b-256 | 6084088ccdcc7980ed04721e633735ded4abde6b25a7a86e215fa566888952e8
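To check a downloaded copy of the source distribution against the SHA256 digest listed above, a short standard-library sketch follows. The helper names `sha256_of` and `verify` are illustrative. The file is assumed to have been saved locally as `datafs-0.6.2.tar.gz`.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the File hashes table above.
EXPECTED_SHA256 = "673ab34b4fc911cb3724dffe579a7d621de2880666d0f25559ae7b76f317f0a5"


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("datafs-0.6.2.tar.gz")` should return `True` for an intact download, assuming the file sits in the current directory.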