A highly scalable table backed by Azure Table Storage, with improved functionality.
Puppettable
Puppettable is a highly scalable, remotely hosted data store for large datasets of small samples (under 1 MB each). It exposes the data through the same interface as Python's built-in data structures, so remote tables can be handled like local containers. It uses Microsoft Azure Table Storage as the backend for the data.
How to install
It is available from the Python Package Index (PyPI):
pip install puppettable
Basic usage
You need to create a Storage account in Microsoft Azure. Puppettable requires a connection_string to connect to the Azure Table Storage service.
>>> from puppettable import AzureTableService
>>> tables = AzureTableService(connection_string)
>>> tables
Azure 5 Tables: ["dataset1", "dataset2", "dataset3", ..., "mnist"]
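The connection string is a secret, so it is usually kept out of the source code. The following is a minimal sketch, not part of Puppettable: it reads the string from an environment variable and checks that it carries a credential before it is handed to AzureTableService. The helper names and the default variable name are assumptions made for this example.

```python
import os


def parse_connection_string(conn_str):
    """Split an Azure Storage connection string into a dict of its fields."""
    fields = {}
    for part in conn_str.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: base64 account keys end with '=' padding.
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def load_connection_string(env_var="AZURE_STORAGE_CONNECTION_STRING"):
    """Read the connection string from the environment (hypothetical helper)."""
    conn_str = os.environ.get(env_var)
    if not conn_str:
        raise RuntimeError(f"Set {env_var} before connecting")
    fields = parse_connection_string(conn_str)
    # Either an account key or a shared access signature must be present.
    if "AccountKey" not in fields and "SharedAccessSignature" not in fields:
        raise ValueError("Connection string carries no credential")
    return conn_str
```

The returned value can then be passed as the connection_string argument shown above.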
Once the tables object is instantiated, it can be used as a dictionary to create or retrieve tables from the service.
Tables are created lazily, as soon as the first element is appended. They offer the same behaviour as a Python list:
>>> tables
>>> table = tables["newDataset"]
>>> table.insert("foo")
>>> table.insert("bar")
>>> table.append_many(["foo2", "bar2"])
>>> print("The length of the table is:", len(table))
The length of the table is: 4
>>> print("The size of the table (in Bytes) is:", table.size())
The size of the table (in Bytes) is: 14
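Lazy creation can be pictured with a small in-memory stand-in. This is only a sketch of the idea and not Puppettable's implementation: the classes LazyTableService and LazyTable, and the append method name, are hypothetical, and a plain dict plays the role of the remote service.

```python
class LazyTable:
    """Handle to a table that may not exist yet on the (simulated) service."""

    def __init__(self, name, backend):
        self.name = name
        self._backend = backend  # dict: table name -> list of stored items

    def append(self, item):
        # setdefault creates the table only on the first write.
        self._backend.setdefault(self.name, []).append(item)

    def __len__(self):
        return len(self._backend.get(self.name, []))


class LazyTableService:
    """Dictionary-like entry point; looking up a table does not create it."""

    def __init__(self):
        self._backend = {}

    def __getitem__(self, name):
        return LazyTable(name, self._backend)

    def __contains__(self, name):
        return name in self._backend

    def __delitem__(self, name):
        del self._backend[name]
```

Looking up service["newDataset"] only returns a handle; the table appears in the backend after the first append.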
Every table keeps track of its own statistics. The length and the size are computed dynamically; the size is an estimate.
>>> table = tables["dataset4"]
>>> table
Table: 'dataset4'; Partition: 'default'; Length: 10565; Size: 67.599,49 KB
Managing data inside a table follows the same API as a Python list:
>>> table = tables["newDataset"]
>>> table[0]
foo
>>> table[0:2]
['foo', 'bar']
>>> table[-1]
bar2
>>> table[-2:]
['foo2', 'bar2']
>>> table[:]
['foo', 'bar', 'foo2', 'bar2']
>>> table[::2]
['foo', 'foo2']
>>> table[0] = "new_foo"
>>> table[2:4] = ["new_foo2", "new_bar2"]
>>> table[:]
['new_foo', 'bar', 'new_foo2', 'new_bar2']
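One way such indexing can be offered on top of a keyed store is to translate every index or slice into explicit row indices first. The sketch below assumes rows are stored under consecutive integer keys; the class IndexedRows is hypothetical and keeps everything in memory, so it illustrates the technique rather than Puppettable's actual code.

```python
class IndexedRows:
    """List-like view over rows stored under integer keys."""

    def __init__(self):
        self._rows = {}  # row index -> value

    def __len__(self):
        return len(self._rows)

    def append(self, value):
        self._rows[len(self)] = value

    def _resolve(self, key):
        """Turn an int or slice into a row index or a list of row indices."""
        n = len(self)
        if isinstance(key, slice):
            # slice.indices clamps start/stop/step to the current length.
            return list(range(*key.indices(n)))
        if key < 0:
            key += n  # negative indices count from the end
        if not 0 <= key < n:
            raise IndexError("table index out of range")
        return key

    def __getitem__(self, key):
        target = self._resolve(key)
        if isinstance(target, list):
            return [self._rows[i] for i in target]
        return self._rows[target]

    def __setitem__(self, key, value):
        target = self._resolve(key)
        if isinstance(target, list):
            values = list(value)
            if len(values) != len(target):
                raise ValueError("slice assignment must keep the same length")
            for i, v in zip(target, values):
                self._rows[i] = v
        else:
            self._rows[target] = value
```

With a remote backend, each resolved index would become one row lookup, which is why resolving slices up front is convenient.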
Following Azure's data distribution concepts, every table can hold several partition keys to distribute the data better. A partition key acts as an isolated group inside a table:
>>> tables
Azure 5 Tables: ["dataset1", "dataset2", "dataset3", ..., "mnist"]
>>> table = tables["dataset4"]
>>> table
Table: 'dataset4'; Partition: 'default'; Length: 10565; Size: 67.599,49 KB
>>> table.set_partition("test")
>>> table
Table: 'dataset4'; Partition: 'test'; Length: 0; Size: 0,0 KB
>>> table.set_partition()
>>> table
Table: 'dataset4'; Partition: 'default'; Length: 10565; Size: 67.599,49 KB
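The isolation between partitions can be sketched by keying every stored element on a (partition key, row index) pair, which mirrors how Azure Table Storage identifies an entity by its PartitionKey and RowKey. The class PartitionedTable below is a hypothetical in-memory model written for this example, including its append method; it is not Puppettable's implementation.

```python
class PartitionedTable:
    """In-memory model of a table whose rows are grouped by partition key."""

    DEFAULT_PARTITION = "default"

    def __init__(self, name):
        self.name = name
        self._partition = self.DEFAULT_PARTITION
        self._entities = {}  # (partition key, row index) -> value

    def set_partition(self, partition=None):
        # Calling without arguments switches back to the default partition.
        self._partition = partition or self.DEFAULT_PARTITION

    def append(self, value):
        # The row index is counted within the active partition only.
        self._entities[(self._partition, len(self))] = value

    def __len__(self):
        return sum(1 for part, _ in self._entities if part == self._partition)

    def __repr__(self):
        return (f"Table: '{self.name}'; Partition: '{self._partition}'; "
                f"Length: {len(self)}")
```

Switching the partition changes which rows are visible, while the rows of the other partitions stay untouched.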
Deleting elements or tables follows the same principle as for dictionaries and lists in Python. Note, though, that the length of a table might stay unchanged, because it is computed from the maximum index detected in the data. This can lead to inconsistent statistics: all earlier elements may have been removed while one element remains at a high position in the table. The size and the length are only recalculated after the last elements of the table are removed.
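The effect of deriving the length from the maximum index can be reproduced with a few lines of plain Python. The class SparseTable is a hypothetical model written for this illustration; it only mimics the behaviour described above and does not talk to Azure.

```python
class SparseTable:
    """Rows keyed by index; the length is derived from the highest index."""

    def __init__(self, values=()):
        self._rows = dict(enumerate(values))  # row index -> value

    def __len__(self):
        # Length = highest stored index + 1, not the number of stored rows.
        return max(self._rows) + 1 if self._rows else 0

    def __delitem__(self, index):
        del self._rows[index]

    def stored(self):
        """Number of rows that actually exist."""
        return len(self._rows)


table = SparseTable(["foo", "bar", "foo2", "bar2"])
del table[0]
del table[1]
del table[2]
# Only one row is left, but the reported length is still 4.
remaining, reported = table.stored(), len(table)

del table[3]
# Removing the last element lets the length drop to 0.
final_length = len(table)
```

In this model, deleting from the end of the table first keeps the reported length in step with the stored rows.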
Supported types
Puppettable natively supports the following types:
- Numpy arrays
- Pandas DataFrames / Series
- Dictionaries
- Lists / Tuples
- Strings / Integers / Floats
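Because samples are expected to stay under 1 MB, it can help to estimate the size of a value before inserting it. The sketch below uses pickle as a stand-in serializer; how Puppettable actually encodes each type is not stated here, so both the serializer and the reading of "1 MB" as 1024 * 1024 bytes are assumptions, and the helper names are hypothetical.

```python
import pickle

# Assumption: "under 1 MB" is read as 1024 * 1024 bytes.
MAX_SAMPLE_BYTES = 1024 * 1024


def serialized_size(sample):
    """Approximate the stored size of a sample by pickling it."""
    return len(pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL))


def fits_in_one_sample(sample, limit=MAX_SAMPLE_BYTES):
    """Return True when the pickled sample stays under the size limit."""
    return serialized_size(sample) < limit
```

A check such as fits_in_one_sample(value) before inserting gives an early warning for oversized samples, although the real stored size may differ from the pickled size.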