DataFS Distributed Data Management System
DataFS is an abstraction layer for data storage systems. It manages file versions and metadata using document-based storage systems (currently DynamoDB and MongoDB) and relies on PyFilesystem to abstract file storage, allowing you to store files locally or in the cloud through a single, seamless interface.
Free software: MIT license
Documentation: https://datafs.readthedocs.io.
Features
Explicit version and metadata management for teams
Unified read/write interface across file systems
Easily create out-of-the-box configuration files for users
Usage
DataFS is built on the concept of “archives,” which are like files but with some additional features. Archives can track versions explicitly, can live on remote servers, and can be cached locally.
To interact with DataFS, you need to create an API object. This can be done in a number of ways, both within Python and using spec files, so that users can work with archives out of the box. See specifying DataAPI objects for more detail.
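For illustration, here is a rough sketch of building an API object directly in Python, using a MongoDB manager and a local directory as the file-storage authority. The module paths and call pattern follow the DataFS quickstart, but the database, table, and directory names are placeholders and details may differ between versions, so treat this as a sketch rather than a recipe:

from fs.osfs import OSFS  # PyFilesystem local filesystem

from datafs import DataAPI
from datafs.managers.manager_mongo import MongoDBManager

# Create the API object, tagged with user information
api = DataAPI(username='My Name', contact='my.email@example.com')

# Attach a document-based manager that stores versions and metadata
manager = MongoDBManager(database_name='MyDatabase', table_name='DataFiles')
manager.create_archive_table('DataFiles', raise_on_err=False)  # ensure the table exists
api.attach_manager(manager)

# Attach a PyFilesystem object as the storage "authority" for file contents
api.attach_authority('local', OSFS('~/datafs/my_data/'))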
We’ll assume we already have an API object created. Once you have this, you can start using DataFS to create and use archives:
>>> my_archive = api.create_archive('my_archive', description='test data')
>>> my_archive.metadata
{'description': 'test data'}
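Metadata can also be amended after an archive is created. The call below assumes an update_metadata method that merges new keys into the archive's existing metadata document; the key and value are placeholders, so check the exact method name against your DataFS version:

>>> my_archive.update_metadata({'source': 'synthetic test data'})

Under that assumption, the new key is added alongside the existing 'description' entry rather than replacing it.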
Archives can be read from and written to much like a normal file:
>>> with my_archive.open('w+') as f:
...     f.write(u'test archive contents')
...
>>> with my_archive.open('r') as f:
...     print(f.read())
...
test archive contents
>>>
>>> with my_archive.open('w+') as f:
...     f.write(u'new archive contents')
...
>>> with my_archive.open('r') as f:
...     print(f.read())
...
new archive contents
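When the data already exists as a file on disk, you can also move whole files rather than streaming through open(). The sketch below assumes update and download methods that take a local file path; these method names and the file paths are assumptions, so verify them against your DataFS version's documentation:

>>> my_archive.update('local_copy.txt')    # upload the file's contents as a new version
>>> my_archive.download('fresh_copy.txt')  # write the latest version to a local file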
By default, archives track versions explicitly. This can be turned off (so that old contents are simply overwritten) by passing versioned=False to create_archive; a short sketch of an unversioned archive follows the examples below. The patch version is bumped by default on each write, but this can be overridden with the bumpversion argument on any write operation:
>>> my_archive.get_versions()
['0.0.1', '0.0.2']
>>>
>>> with my_archive.open('w+', bumpversion='major') as f:
...     f.write(u'a major improvement')
...
>>> my_archive.get_versions()
['0.0.1', '0.0.2', '1.0']
We can also retrieve versioned data specifically:
>>> with my_archive.open('r', version='0.0.2') as f:
...     print(f.read())
...
new archive contents
>>>
>>> with my_archive.open('r', version='1.0') as f:
...     print(f.read())
...
a major improvement
>>>
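To illustrate the versioned=False flag described above, here is a short sketch of an unversioned archive (the archive name is a placeholder); each write simply replaces the previous contents instead of creating a new version:

>>> scratch = api.create_archive('scratch_data', versioned=False)
>>> with scratch.open('w+') as f:
...     f.write(u'overwritten on every write')
...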
See examples for more extensive use cases.
Todo
See the issues page to view and add to our to-dos.
Credits
This package was created by Justin Simcock and Michael Delgado of the Climate Impact Lab. Check us out on GitHub.
Thanks also to audreyr for the wonderful cookiecutter package, and to pyup, a constant source of inspiration and our third contributor.
History
0.1.0 (2016-11-18)
First release on PyPI.
File details
Details for the file datafs-0.6.1.tar.gz.
File metadata
- Download URL: datafs-0.6.1.tar.gz
- Upload date:
- Size: 49.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 735e7547123442511279484c65f8c3e3e7f38304c892cf2bdc9d2ed94f984722
MD5 | 9a56e9c8525365e106996aabcf39a663
BLAKE2b-256 | 91cd78ff7a3d52e2227dc482a320363fad455f8f28ba1be648491ccdd053324a