A consistent approach to file operations, anywhere.
Cabinets
`cabinets` is a Python library that provides a consistent interface for file operations across multiple storage platforms. File extensions are detected dynamically to allow automatic serialization and deserialization of Python objects.

`cabinets` supports a variety of protocols and file format parsers natively, and new protocols or parsers can be registered easily.
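To make the idea of extension-driven serialization concrete, here is a minimal sketch using only the standard library. It is not how `cabinets` is implemented: the `SERIALIZERS` table and the `dump_by_extension`/`load_by_extension` helpers are hypothetical names chosen for illustration, and only `.json` and `.pickle` are handled.

```python
import json
import os
import pickle
from typing import Any, Callable, Dict, Tuple

# Hypothetical table: extension -> (serialize object to bytes, deserialize bytes to object)
SERIALIZERS: Dict[str, Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]] = {
    ".json": (lambda obj: json.dumps(obj).encode("utf-8"),
              lambda raw: json.loads(raw.decode("utf-8"))),
    ".pickle": (pickle.dumps, pickle.loads),
}


def _pick(path: str) -> Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]:
    # os.path.splitext returns the final extension including the dot, e.g. ".json"
    ext = os.path.splitext(path)[1].lower()
    if ext not in SERIALIZERS:
        raise ValueError(f"no serializer registered for {ext!r}")
    return SERIALIZERS[ext]


def dump_by_extension(path: str, obj: Any) -> bytes:
    """Serialize obj to bytes using the format implied by the path's extension."""
    serialize, _ = _pick(path)
    return serialize(obj)


def load_by_extension(path: str, raw: bytes) -> Any:
    """Deserialize bytes using the format implied by the path's extension."""
    _, deserialize = _pick(path)
    return deserialize(raw)


if __name__ == "__main__":
    obj = {"test": 1}
    for name in ("data.json", "data.pickle"):
        assert load_by_extension(name, dump_by_extension(name, obj)) == obj
    print("round trip ok")
```

Keeping the storage step (bytes in, bytes out) separate from the format step (bytes to objects) is one plausible way a single `read()` call can work the same across storage platforms.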
Table of contents
- Sample Usage
- Built-in Protocols and Parsers
- Protocol Configuration
- Custom Protocols and Parsers
- Contributing
Sample Usage
Read a file
Set up a test file in your local filesystem:

```python
import json

obj = {'test': 1}
with open('test.json', 'w') as fh:
    json.dump(obj, fh)
```

Read back and parse the file using `cabinets`:

```python
import cabinets

new_obj = cabinets.read('test.json')
```
That's it! The file is loaded and parsed in just one line.
Write a file
`cabinets` also supports creating files. We can rewrite the first example using only `cabinets`:

```python
import cabinets

obj = {'test': 1}
cabinets.create('test.json', obj)
new_obj = cabinets.read('test.json')
assert new_obj == obj
```
List files in a directory
In some situations, you may need to know what files are in a directory before doing any operations. `cabinets` also provides a `list` function to assist with this.

```python
import cabinets

obj = {'test': 1}
cabinets.create('example/test.json', obj)
cabinets.create('example/test2.yaml', obj)
cabinets.create('example/subdir/test3.txt', "test")

# sorted() keeps the comparison independent of listing order
assert sorted(cabinets.list('example/')) == ['test.json', 'test2.yaml']
assert cabinets.list('example/subdir/') == ['test3.txt']
```
Important: For simplicity, `cabinets` restricts the output of `list` to files only. Subdirectories are excluded and must be queried separately. Future versions may include a flag in `list` for returning subdirectories as well.
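For the local filesystem, the files-only behavior described above can be approximated with `os.scandir`. This is an illustrative sketch, not the library's own code; `list_files` is a hypothetical helper, and it sorts names because directory iteration order is not guaranteed.

```python
import os
import tempfile
from typing import List


def list_files(directory: str) -> List[str]:
    """Return names of regular files directly inside directory, skipping subdirectories."""
    with os.scandir(directory) as entries:
        # entry.is_file() is False for directories, so subdirectories are excluded
        return sorted(entry.name for entry in entries if entry.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "subdir"))
        for rel in ("test.json", "test2.yaml", os.path.join("subdir", "test3.txt")):
            with open(os.path.join(root, rel), "w") as fh:
                fh.write("{}")
        print(list_files(root))                          # ['test.json', 'test2.yaml']
        print(list_files(os.path.join(root, "subdir")))  # ['test3.txt']
```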
Reading and Writing with Other Protocols
Using `cabinets` allows you to interact with multiple file storage protocols depending on the URI you specify. In the previous examples, we used `read()` and `create()` to operate within our local filesystem; that's because `cabinets` assumes the `file://` protocol by default. Luckily, accessing other storage systems is just as easy!
For example, operating on a file on AWS S3 is done exactly the same way:
```python
import cabinets

# Read JSON file from your filesystem
local_obj = cabinets.read('file://test.json')

# Write that object to a file in AWS S3
cabinets.create('s3://test.json', local_obj)

# Read back the same file from AWS S3
remote_obj = cabinets.read('s3://test.json')

assert local_obj == remote_obj
```
The above example reads a file from the local filesystem and creates a new file containing the same data at the same path in S3.

By prefixing the path with `{protocol}://`, we specify how and where `cabinets` should look for a file. Using `file://` (the default if none is specified) tells `cabinets` to use *path* on the local filesystem. Using `s3://`, on the other hand, instructs `cabinets` to perform operations against that path in AWS S3.
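A minimal sketch of how a `{protocol}://path` string might be split, falling back to `file` when no prefix is present. `split_uri` and `DEFAULT_PROTOCOL` are hypothetical names for illustration; the library's own parsing may differ in edge cases.

```python
from typing import Tuple

DEFAULT_PROTOCOL = "file"


def split_uri(uri: str) -> Tuple[str, str]:
    """Split 'protocol://path' into (protocol, path); no prefix means the default protocol."""
    protocol, sep, path = uri.partition("://")
    if not sep:
        # No '://' present: treat the whole string as a local path
        return DEFAULT_PROTOCOL, uri
    return protocol, path


if __name__ == "__main__":
    print(split_uri("test.json"))                        # ('file', 'test.json')
    print(split_uri("file://test.json"))                 # ('file', 'test.json')
    print(split_uri("s3://bucket-us-west-2/test.json"))  # ('s3', 'bucket-us-west-2/test.json')
```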
NOTE: The `S3Cabinet` may require initial configuration for the `s3` protocol to function properly. See Protocol Configuration for details.

See all the natively supported protocols and parsers below.
Built-in Protocols and Parsers
Protocols
- Local File System (`file://`)
- S3 (`s3://`)

Parsers
- YAML (`.yml`, `.yaml`)
- JSON (`.json`)
- Python Pickle (`.pickle`)
- CSV (beta) (`.csv`)
- TXT (`.txt`)
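Protocols and parsers are independent, so the same file type can be read from any supported protocol:

```python
import cabinets

# .foo file in local filesystem
local_foo_data = cabinets.read('file://test.foo')

# .foo file in S3
s3_foo_data = cabinets.read('s3://test.foo')
```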
Protocol Configuration
Some storage platform protocols may require additional configuration parameters to be set before they can be used. Each `Cabinet` subclass can expose a `set_configuration(**config)` class method to take care of any required initial setup.
```python
from cabinets.cabinet.s3_cabinet import S3Cabinet

# set the AWS S3 region to us-west-2 and specify an access key
S3Cabinet.set_configuration(region_name='us-west-2', aws_access_key_id=...)

# use the specific Cabinet to avoid the protocol prefix
S3Cabinet.read('bucket-us-west-2/test.json')

# or use the generic interface with a protocol prefix
import cabinets
cabinets.read('s3://bucket-us-west-2/test.json')
```
See the documentation of specific `Cabinet` classes for the configuration parameters they accept.
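A common way to implement a `set_configuration(**config)` class method is to store the keyword arguments on the class itself, so each subclass keeps its own settings. The sketch below uses hypothetical `BaseStore`, `S3LikeStore`, and `LocalStore` classes and a `_config` attribute purely for illustration; it does not reproduce `S3Cabinet`.

```python
from typing import Any, Dict


class BaseStore:
    # Shared empty default; a subclass gets its own dict once it is configured
    _config: Dict[str, Any] = {}

    @classmethod
    def set_configuration(cls, **config: Any) -> None:
        # Assigning to cls._config creates an attribute on the calling subclass,
        # so configuring one subclass never changes another
        cls._config = {**cls._config, **config}

    @classmethod
    def get_configuration(cls) -> Dict[str, Any]:
        return dict(cls._config)


class S3LikeStore(BaseStore):
    pass


class LocalStore(BaseStore):
    pass


if __name__ == "__main__":
    S3LikeStore.set_configuration(region_name="us-west-2")
    S3LikeStore.set_configuration(aws_access_key_id="placeholder")
    print(S3LikeStore.get_configuration())
    print(LocalStore.get_configuration())  # {}
```

Repeated calls merge into the existing settings rather than replacing them; whether the real classes merge or replace is not stated here.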
Additionally, there is a top-level `set_configuration()` function, so importing specific `Cabinet` subclasses is not required. Simply pass the desired protocol as the first argument.
```python
import cabinets

# *OPTIONAL*: set the AWS S3 region to us-west-2 and specify an access key
cabinets.set_configuration('s3', region_name='us-west-2', aws_access_key_id=...)

# use generic Cabinet with protocol prefix
cabinets.read('s3://bucket-us-west-2/test.json')
```
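Conceptually, a top-level configuration function can be a lookup from protocol name to the registered class, followed by a call to that class's own `set_configuration`. This is a hypothetical sketch: the `PROTOCOLS` dict and the toy `DemoCabinet` class stand in for whatever internal registry the library actually uses.

```python
from typing import Any, Dict


class DemoCabinet:
    """Toy stand-in for a Cabinet subclass; stores configuration on the class."""
    config: Dict[str, Any] = {}

    @classmethod
    def set_configuration(cls, **config: Any) -> None:
        cls.config = dict(config)


# Hypothetical registry mapping protocol identifiers to classes
PROTOCOLS: Dict[str, type] = {"s3": DemoCabinet}


def set_configuration(protocol: str, **config: Any) -> None:
    """Forward configuration to whichever class is registered for the protocol."""
    try:
        cabinet_cls = PROTOCOLS[protocol]
    except KeyError:
        raise ValueError(f"unknown protocol: {protocol!r}") from None
    cabinet_cls.set_configuration(**config)


if __name__ == "__main__":
    set_configuration("s3", region_name="us-west-2")
    print(DemoCabinet.config)  # {'region_name': 'us-west-2'}
```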
Custom Protocols and Parsers
`cabinets` is designed to be fully extensible with new protocols and parsers. Just because your desired storage platform or file format is not listed above doesn't mean you can't use it with `cabinets`!
Adding Cabinets
New protocol connections can be added by subclassing the abstract base class `Cabinet` and registering the class to one or more protocol identifiers:
```python
from cabinets import Cabinet, register_protocols


@register_protocols('foo')
class FooCabinet(Cabinet):

    @classmethod
    def set_configuration(cls, **kwargs):
        # Set up any necessary configuration parameters for "foo" protocol
        ...

    @classmethod
    def read_content(cls, path: str) -> bytes:
        # Custom logic for reading bytes from a path using "foo" protocol
        ...

    @classmethod
    def create_content(cls, path: str, content: bytes):
        # Custom logic for writing bytes to a path using "foo" protocol
        ...

    @classmethod
    def delete_content(cls, path: str):
        # Custom logic for deleting the object at a path using "foo" protocol
        ...
```
Here we define a `FooCabinet` and register it to the protocol identifier `foo`. Once this class is loaded, any `cabinets` function call using the `foo://` prefix will be processed with this class. This means that if we call:
```python
import cabinets
from ... import FooCabinet  # ensure FooCabinet is loaded

cabinets.read('foo://example.json')
```
The first call that occurs will be `FooCabinet.read_content('example.json')`, and that result is then parsed by the `JSONParser` before being returned.
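Registration by decorator is a standard Python pattern: the decorator records the class in a module-level dictionary and returns the class unchanged. The sketch below shows that pattern with a hypothetical `register` decorator, `PROTOCOL_REGISTRY` dict, and `resolve` helper; it mirrors the behavior described for `register_protocols` but is not its actual source.

```python
from typing import Callable, Dict, Tuple

# Hypothetical module-level registry: protocol identifier -> class
PROTOCOL_REGISTRY: Dict[str, type] = {}


def register(*protocols: str) -> Callable[[type], type]:
    """Record the decorated class under each protocol identifier."""
    def decorator(cls: type) -> type:
        for name in protocols:
            PROTOCOL_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


def resolve(uri: str) -> Tuple[type, str]:
    """Return (registered class, path) for a 'protocol://path' string."""
    protocol, _, path = uri.partition("://")
    if protocol not in PROTOCOL_REGISTRY:
        raise KeyError(f"no class registered for protocol {protocol!r}")
    return PROTOCOL_REGISTRY[protocol], path


@register("foo")
class FooStore:
    @classmethod
    def read_content(cls, path: str) -> bytes:
        # Toy behavior: every path "contains" the same JSON document
        return b'{"test": 1}'


if __name__ == "__main__":
    store, path = resolve("foo://example.json")
    print(store.__name__, path)  # FooStore example.json
```

Because the dictionary entry is only created when the decorated class definition runs, a protocol becomes available only after the module defining its class has been imported.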
NOTE: In order for the protocols to be registered, the class definition must be executed at least once. Make sure the modules where your custom `Cabinet` classes are defined are imported somewhere before they are used, or use the built-in plugin system described in Loading Plugins.
Adding Parsers
`cabinets` also supports custom extension parsing in exactly the same way:
```python
from typing import Any

from cabinets.parser import Parser, register_extensions


@register_extensions('bar')
class BarParser(Parser):

    @classmethod
    def load_content(cls, content: bytes) -> Any:
        # Parse bytes from "bar" file format into a Python object
        ...

    @classmethod
    def dump_content(cls, data: Any) -> bytes:
        # Dump a Python object into bytes in the "bar" file format
        ...
```
Now if we redo our example above using the `.bar` extension:
```python
import cabinets
from ... import FooCabinet, BarParser  # ensure FooCabinet and BarParser are loaded

cabinets.read('foo://example.bar')
```
This statement is roughly equivalent to:

```python
BarParser.load_content(FooCabinet.read_content('example.bar'))
```

and should return a Python object from your `Foo` cabinet, using your `Bar` parser!
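Putting both lookups together, a generic `read()` amounts to: choose the storage class from the protocol, choose the parser from the extension, then feed the bytes from one into the other. Below is a self-contained toy version under stated assumptions: an in-memory `MemoryStore` replaces a real protocol, the "bar" format is invented here as UTF-8 `key=value` lines, and none of these names come from `cabinets`.

```python
import os
from typing import Any, Dict, Tuple


class MemoryStore:
    """Hypothetical in-memory backend; the class-level dict is shared by all callers."""
    files: Dict[str, bytes] = {}

    @classmethod
    def read_content(cls, path: str) -> bytes:
        return cls.files[path]

    @classmethod
    def create_content(cls, path: str, content: bytes) -> None:
        cls.files[path] = content


class BarLikeParser:
    """Invented format: one key=value pair per line (every non-empty line must contain '=')."""

    @classmethod
    def load_content(cls, content: bytes) -> Dict[str, str]:
        lines = [line for line in content.decode("utf-8").splitlines() if line]
        # split on the first '=' only, so values may themselves contain '='
        return dict(line.split("=", 1) for line in lines)

    @classmethod
    def dump_content(cls, data: Dict[str, Any]) -> bytes:
        return "".join(f"{k}={v}\n" for k, v in data.items()).encode("utf-8")


PROTOCOLS = {"mem": MemoryStore}
EXTENSIONS = {"bar": BarLikeParser}


def _resolve(uri: str) -> Tuple[Any, Any, str]:
    # Assumes an explicit 'protocol://' prefix is always present
    protocol, _, path = uri.partition("://")
    ext = os.path.splitext(path)[1].lstrip(".")
    return PROTOCOLS[protocol], EXTENSIONS[ext], path


def create(uri: str, data: Dict[str, Any]) -> None:
    store, parser, path = _resolve(uri)
    store.create_content(path, parser.dump_content(data))


def read(uri: str) -> Any:
    store, parser, path = _resolve(uri)
    # Same shape as Parser.load_content(Cabinet.read_content(path))
    return parser.load_content(store.read_content(path))
```

For example, `create("mem://example.bar", {"a": "1"})` followed by `read("mem://example.bar")` gives back `{"a": "1"}`.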
Loading Plugins
As mentioned in the example above, your custom `Cabinet` and `Parser` classes must be executed in order to be added to the internal cache that `cabinets` uses for protocol and extension lookup. If your custom classes are imported before any `cabinets` functions are used, this won't be an issue. However, in many use cases there is no reason to import those classes aside from using them with `cabinets` functions. Instead of requiring each class to be imported manually at the start of your program, `cabinets` can search a specified path for new `Cabinet` and `Parser` classes and load them automatically.
Setting the `PLUGIN_PATH` environment variable will cause `cabinets` to search for subdirectories called `cabinet` and `parser` in that path. Modules residing within those directories will be searched for `Cabinet` and `Parser` subclasses, respectively.
```
PLUGIN_PATH
├── cabinet
│   └── foo_cabinet.py
└── parser
    ├── bar_parser.py
    └── baz_parser.py
```
If the `FooCabinet` and `BarParser` classes above are placed in `foo_cabinet.py` and `bar_parser.py`, they will be loaded and registered to their respective caches without needing to be referenced anywhere else in the program.
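Discovery of this kind can be built on `importlib`: iterate over the `.py` files in each subdirectory and execute them as modules, which runs any registration decorators they contain. The following is a sketch under that assumption; `load_plugins` and the generated module names are hypothetical, and the library's real loader may differ in naming and error handling.

```python
import importlib.util
import os
from types import ModuleType
from typing import Dict, Optional


def load_plugins(plugin_path: Optional[str] = None) -> Dict[str, ModuleType]:
    """Import every .py module under <plugin_path>/cabinet and <plugin_path>/parser."""
    plugin_path = plugin_path or os.environ.get("PLUGIN_PATH")
    loaded: Dict[str, ModuleType] = {}
    if not plugin_path:
        return loaded  # nothing configured, nothing to load
    for kind in ("cabinet", "parser"):
        directory = os.path.join(plugin_path, kind)
        if not os.path.isdir(directory):
            continue
        for filename in sorted(os.listdir(directory)):
            if not filename.endswith(".py") or filename.startswith("_"):
                continue
            module_name = f"plugins_{kind}_{filename[:-3]}"
            spec = importlib.util.spec_from_file_location(
                module_name, os.path.join(directory, filename))
            if spec is None or spec.loader is None:
                continue
            module = importlib.util.module_from_spec(spec)
            # Executing the module runs its class definitions (and their decorators)
            spec.loader.exec_module(module)
            loaded[module_name] = module
    return loaded
```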
Contributing
This package is open source (see LICENSE), so please feel free to contribute by submitting a pull request, creating an issue, or contacting the authors directly.
Authors and Contributors
- Lucas Lofaro (Co-Author): lucasmlofaro@gmail.com
- Sam Hollenbach (Co-Author): samhollenbach@gmail.com