DataFS is an abstraction layer for data storage systems. It manages file versions and metadata using a JSON-like document store such as MongoDB or AWS's DynamoDB, and relies on PyFilesystem to abstract file storage, allowing you to store files locally or on the cloud through a single, seamless interface.
DataFS Data Management System
DataFS is a package manager for data. It manages file versions, dependencies, and metadata for individual users and for large organizations.
- Free software: MIT license
- Documentation: https://datafs.readthedocs.io.
- Explicit version and metadata management for teams
- Unified read/write interface across file systems
- Easily create out-of-the-box configuration files for users
- Track data dependencies and usage logs
- Use datafs from python or from the command line
- Permissions handled by managers & services, giving you control over user access
First, configure an API. Don’t worry. It’s not too bad.
We’ll assume we already have an API object created and attached to a service called “s3”. Once you have this, you can start using DataFS to create and use archives.
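If you are working from python rather than the shell, that API object comes from datafs.get_api(), the same call used in the python example further down. The comment below describing what it loads is an inference from the configuration features listed above; check the documentation for the details.

>>> import datafs
>>> api = datafs.get_api()  # loads the configured manager and storage services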
$ datafs create my_new_data_archive --description "a test archive"
created versioned archive <DataArchive s3://my_new_data_archive>

$ echo "initial file contents" > my_file.txt
$ datafs update my_new_data_archive my_file.txt
$ datafs cat my_new_data_archive
initial file contents
Versions are tracked explicitly. Bump versions on write, and read old versions if desired.
$ echo "updated contents" > my_file.txt
$ datafs update my_new_data_archive my_file.txt --bumpversion minor
uploaded data to <DataArchive s3://my_new_data_archive>. version bumped 0.0.1 --> 0.1.
$ datafs cat my_new_data_archive
updated contents
$ datafs cat my_new_data_archive --version 0.0.1
initial file contents
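Reading an older version is also possible from python. This sketch reuses the get_api, get_archive, and open calls shown in the python example below, and assumes the version keyword accepts an explicit version string just as the CLI's --version flag does:

>>> import datafs
>>> api = datafs.get_api()
>>> archive = api.get_archive('my_new_data_archive')
>>> # ask for the first version explicitly instead of the latest
>>> with archive.open('r', version='0.0.1') as f:
...     print(f.read())
...
initial file contents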
Pin versions using a requirements file to set the default version:
$ echo "my_new_data_archive==0.0.1" > requirements_data.txt
$ datafs cat my_new_data_archive
initial file contents
All of these features are available from (and faster in) python:
>>> import datafs
>>> api = datafs.get_api()
>>> archive = api.get_archive('my_new_data_archive')
>>> with archive.open('r', version='latest') as f:
...     print(f.read())
...
updated contents
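Writing a new version from python might look roughly like the sketch below, continuing the session above. The archive.update call and its bumpversion keyword are assumptions mirroring the datafs update CLI command, not confirmed signatures; see the documentation for the exact interface.

>>> with open('my_file.txt', 'w') as f:
...     f.write('newer contents')
...
>>> # hypothetical python counterpart of `datafs update ... --bumpversion minor`
>>> archive.update('my_file.txt', bumpversion='minor')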
If you have permission to delete archives, it’s easy to do. See administrative tools for tips on setting permissions.
$ datafs delete my_new_data_archive
deleted archive <DataArchive s3://my_new_data_archive>
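From python, deleting would presumably go through the archive object; the delete method named here is an assumption mirroring the CLI command above rather than a documented call.

>>> import datafs
>>> api = datafs.get_api()
>>> archive = api.get_archive('my_new_data_archive')
>>> archive.delete()  # hypothetical python counterpart of `datafs delete`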
See examples for more extensive use cases.
pip install datafs
Additionally, you'll need to choose a manager and services (a combined install example follows the list):

- Managers:
  - MongoDB: pip install pymongo
  - DynamoDB: pip install boto3
- Services:
  - Ready out-of-the-box (no additional packages required)
  - Requiring additional packages:
    - AWS/S3: pip install boto
    - SFTP: pip install paramiko
    - XMLRPC: pip install xmlrpclib
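For example, installing DataFS with a MongoDB manager and S3 storage just means installing the corresponding packages from the lists above (this combination is only one possibility):

pip install datafs pymongo boto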
For now, DataFS requires python 2.7. We’re working on 3.x support.
See issues to view and add to our todos.
- First release on PyPI.