
A highly scalable table backed by Azure Table Storage, with improved functionality.

Project description

Puppettable

Puppettable is a highly scalable, remotely hosted data store for large datasets made up of small samples (each under 1 MB). It lets you work with the data transparently through the familiar interface of Python's built-in data structures, using Microsoft Azure Table Storage as the backend.

How to install

It is available on PyPI, the official Python package repository:

pip install puppettable

Basic usage

First, create a Storage account in Microsoft Azure. Puppettable needs that account's connection string (connection_string below) to connect to the Azure Table Storage service.

>>> from puppettable import AzureTableService
>>> tables = AzureTableService(connection_string)
>>> tables
Azure 5 Tables: ["dataset1", "dataset2", "dataset3", ..., "mnist"]
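For reference, an Azure Storage connection string is a semicolon-separated list of key=value settings such as AccountName and AccountKey. The sketch below is not part of Puppettable and the helper names are hypothetical: it splits a connection string built from placeholder credentials and derives the Table service URL, assuming the standard https://<account>.table.core.windows.net endpoint layout unless an explicit TableEndpoint setting is present.

```python
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split an Azure Storage connection string into its key=value settings."""
    settings: Dict[str, str] = {}
    for part in conn_str.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ";"
        # Split on the first "=" only: base64 account keys can end in "=" padding.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed setting: {part!r}")
        settings[key] = value
    return settings


def table_endpoint(settings: Dict[str, str]) -> str:
    """Build the Table service URL from the parsed account settings."""
    if "TableEndpoint" in settings:
        return settings["TableEndpoint"]  # an explicit endpoint wins
    protocol = settings.get("DefaultEndpointsProtocol", "https")
    suffix = settings.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{settings['AccountName']}.table.{suffix}"


# Placeholder credentials, not a real account.
conn = (
    "DefaultEndpointsProtocol=https;AccountName=myaccount;"
    "AccountKey=bXlrZXk=;EndpointSuffix=core.windows.net"
)
settings = parse_connection_string(conn)
print(table_endpoint(settings))  # https://myaccount.table.core.windows.net
```

Keeping the real string out of source code (for example in an environment variable) avoids leaking the account key.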

Once tables is instantiated, it can be used like a dictionary to create or retrieve tables from the service. Tables are created lazily, that is, only when the first element is appended to them. Each table behaves like a Python list:

>>> table = tables["newDataset"]

>>> table.insert("foo")
>>> table.insert("bar")
>>> table.append_many(["foo2", "bar2"])

>>> print("The length of the table is:", len(table))
The length of the table is: 4
>>> print("The size of the table (in Bytes) is:", table.size())
The size of the table (in Bytes) is: 14
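The lazy-creation behaviour can be illustrated with a small in-memory stand-in. This is a hypothetical sketch, not Puppettable's implementation, and it makes no Azure calls: looking a table up by name only returns a handle, and the table counts as created once its first element is stored. The size here is a rough estimate assumed to be the UTF-8 byte length of each element's string form, which happens to reproduce the 14 bytes of the example above.

```python
from typing import Any, Dict, Iterable, List, Set


class InMemoryTable:
    """Hypothetical list-like table standing in for a remote Azure table."""

    def __init__(self, name: str, service: "InMemoryTableService") -> None:
        self.name = name
        self._service = service
        self._rows: List[Any] = []

    def insert(self, value: Any) -> None:
        # Appends to the end, as in the README example. The first write
        # is what actually "creates" the table (lazy creation).
        self._service._mark_created(self.name)
        self._rows.append(value)

    def append_many(self, values: Iterable[Any]) -> None:
        for value in values:
            self.insert(value)

    def __len__(self) -> int:
        return len(self._rows)

    def size(self) -> int:
        # Rough estimate only: UTF-8 bytes of each element's str() form.
        return sum(len(str(v).encode("utf-8")) for v in self._rows)


class InMemoryTableService:
    """Hypothetical dict-like service that hands out tables by name."""

    def __init__(self) -> None:
        self._handles: Dict[str, InMemoryTable] = {}
        self._created: Set[str] = set()

    def __getitem__(self, name: str) -> InMemoryTable:
        # Looking a table up never creates it; it only returns a handle.
        if name not in self._handles:
            self._handles[name] = InMemoryTable(name, self)
        return self._handles[name]

    def _mark_created(self, name: str) -> None:
        self._created.add(name)

    def __len__(self) -> int:
        # Only tables that have received at least one element exist.
        return len(self._created)


tables = InMemoryTableService()
table = tables["newDataset"]
print(len(tables))  # 0: nothing stored yet
table.insert("foo")
table.insert("bar")
table.append_many(["foo2", "bar2"])
print(len(tables), len(table), table.size())  # 1 4 14
```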

Every table keeps track of its own statistics. The length and the size are computed dynamically, and the size is an estimate.

>>> table = tables["dataset4"]
>>> table
Table: 'dataset4'; Partition: 'default'; Length: 10565; Size: 67.599,49 KB

Elements inside a table are read and written with the same indexing and slicing syntax as a Python list:

>>> table = tables["newDataset"]
>>> table[0]
'foo'
>>> table[0:2]
['foo', 'bar']
>>> table[-1]
'bar2'
>>> table[-2:]
['foo2', 'bar2']
>>> table[:]
['foo', 'bar', 'foo2', 'bar2']
>>> table[::2]
['foo', 'foo2']

>>> table[0] = "new_foo"
>>> table[2:4] = ["new_foo2", "new_bar2"]
>>> table[:]
['new_foo', 'bar', 'new_foo2', 'new_bar2']
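Azure Table Storage keeps entities within a partition sorted by RowKey as a string, so one plausible way to back list-style indexing is to store each element under a zero-padded RowKey that encodes its position. The sketch below assumes that scheme (the row_key helper and SliceableTable class are hypothetical, not Puppettable's API) and uses Python's slice.indices() to resolve negative, omitted and stepped bounds into concrete RowKeys, reproducing the example above in memory.

```python
from typing import Any, Dict, List, Union


def row_key(index: int, width: int = 10) -> str:
    # Hypothetical scheme: zero-padded RowKeys so that lexicographic order
    # matches numeric order ("0000000002" sorts before "0000000010").
    return str(index).zfill(width)


class SliceableTable:
    """Hypothetical list-like view over a dict of RowKey -> value."""

    def __init__(self, values: List[Any]) -> None:
        self._rows: Dict[str, Any] = {row_key(i): v for i, v in enumerate(values)}
        self._length = len(values)

    def __len__(self) -> int:
        return self._length

    def _keys(self, item: Union[int, slice]) -> List[str]:
        """Translate an index or slice into the RowKeys it covers."""
        if isinstance(item, slice):
            # slice.indices() resolves negative, omitted and stepped bounds.
            return [row_key(i) for i in range(*item.indices(self._length))]
        index = item + self._length if item < 0 else item
        if not 0 <= index < self._length:
            raise IndexError("table index out of range")
        return [row_key(index)]

    def __getitem__(self, item: Union[int, slice]) -> Any:
        values = [self._rows[key] for key in self._keys(item)]
        return values if isinstance(item, slice) else values[0]

    def __setitem__(self, item: Union[int, slice], value: Any) -> None:
        keys = self._keys(item)
        new_values = list(value) if isinstance(item, slice) else [value]
        # Unlike a list, this sketch only overwrites existing rows: a slice
        # assignment must supply exactly one value per targeted row.
        if len(new_values) != len(keys):
            raise ValueError("slice assignment must keep the table length")
        for key, new_value in zip(keys, new_values):
            self._rows[key] = new_value


table = SliceableTable(["foo", "bar", "foo2", "bar2"])
print(table[-1])   # bar2
print(table[-2:])  # ['foo2', 'bar2']
print(table[::2])  # ['foo', 'foo2']
table[0] = "new_foo"
table[2:4] = ["new_foo2", "new_bar2"]
print(table[:])    # ['new_foo', 'bar', 'new_foo2', 'new_bar2']
```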

Following Azure's data distribution model, a table can be divided by partition keys to distribute the data better. Each partition key acts as an isolated group inside the table:

>>> tables
Azure 5 Tables: ["dataset1", "dataset2", "dataset3", ..., "mnist"]

>>> table = tables["dataset4"]
>>> table
Table: 'dataset4'; Partition: 'default'; Length: 10565; Size: 67.599,49 KB
>>> table.set_partition("test")
>>> table
Table: 'dataset4'; Partition: 'test'; Length: 0; Size: 0,0 KB
>>> table.set_partition()
>>> table
Table: 'dataset4'; Partition: 'default'; Length: 10565; Size: 67.599,49 KB
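In Azure Table Storage every entity is addressed by a PartitionKey plus a RowKey, so a partition is a natural unit of isolation. The in-memory sketch below mimics the behaviour shown above: the active partition decides which rows are visible, and calling set_partition() with no argument returns to "default". The PartitionedTable class is hypothetical and only illustrates the idea; it is not Puppettable's implementation.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List


class PartitionedTable:
    """Hypothetical table whose rows are grouped by partition key."""

    DEFAULT = "default"

    def __init__(self, name: str) -> None:
        self.name = name
        self._partitions: DefaultDict[str, List[Any]] = defaultdict(list)
        self._current = self.DEFAULT

    def set_partition(self, partition: str = DEFAULT) -> None:
        # Calling with no argument switches back to the default partition.
        self._current = partition

    def insert(self, value: Any) -> None:
        self._partitions[self._current].append(value)

    def __len__(self) -> int:
        # Only the active partition is visible, so length is per partition.
        return len(self._partitions[self._current])

    def __repr__(self) -> str:
        return f"Table: '{self.name}'; Partition: '{self._current}'; Length: {len(self)}"


table = PartitionedTable("dataset4")
table.insert("a")
table.insert("b")
table.set_partition("test")
print(table)  # Table: 'dataset4'; Partition: 'test'; Length: 0
table.set_partition()
print(table)  # Table: 'dataset4'; Partition: 'default'; Length: 2
```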

Elements and tables are deleted the same way as entries in Python dictionaries and lists. Note, however, that deleting elements may leave the length of the table unchanged, because the length is computed from the highest index found in the data. This can make the statistics inconsistent: if all earlier elements are removed but one element at a high index remains, the table still reports the full length. The length and the size are only recalculated once the last elements of the table are removed.
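The caveat can be reproduced with a hypothetical in-memory table that, as described in the paragraph above, derives its length from the highest index present rather than from the number of stored rows. The sketch assumes deletion goes through Python's del statement, as with dictionaries and lists; SparseTable and stored_rows() are illustrative names only.

```python
from typing import Any, Dict


class SparseTable:
    """Hypothetical table whose length is derived from the highest stored index."""

    def __init__(self) -> None:
        self._rows: Dict[int, Any] = {}

    def __setitem__(self, index: int, value: Any) -> None:
        self._rows[index] = value

    def __delitem__(self, index: int) -> None:
        del self._rows[index]

    def __len__(self) -> int:
        # Length = highest index present + 1, not the number of stored rows.
        return max(self._rows) + 1 if self._rows else 0

    def stored_rows(self) -> int:
        return len(self._rows)


table = SparseTable()
for i in range(10):
    table[i] = f"row{i}"
for i in range(9):
    del table[i]  # remove everything except the element at index 9
print(len(table), table.stored_rows())  # 10 1
del table[9]  # removing the last element lets the length shrink
print(len(table))  # 0
```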

Supported types

Puppettable natively supports the following types:

  • Numpy arrays
  • Pandas DataFrames / Series
  • Dictionaries
  • Arrays / Tuples
  • Strings / Integers / Floats
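The README does not say how Puppettable serializes these types. One plausible approach, sketched here purely as an assumption: pickle each sample (pickle handles the built-in types listed above and also NumPy arrays and pandas objects), then split the bytes across several Binary properties, because Azure Table Storage limits each Binary property to 64 KiB and each entity to 1 MiB. The chunk_NNN property names and both helper functions are hypothetical.

```python
import pickle
from typing import Any, Dict

CHUNK_SIZE = 64 * 1024        # Azure Table: max 64 KiB per Binary property
MAX_SAMPLE_BYTES = 1_000_000  # the "under 1 MB" per-sample budget


def to_entity_properties(sample: Any) -> Dict[str, bytes]:
    """Serialize a sample and split it into Binary-sized chunks (hypothetical layout)."""
    payload = pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL)
    if len(payload) >= MAX_SAMPLE_BYTES:
        raise ValueError(f"sample is {len(payload)} bytes; must be under 1 MB")
    # Zero-padded names keep the chunks in order when sorted.
    return {
        f"chunk_{i:03d}": payload[start:start + CHUNK_SIZE]
        for i, start in enumerate(range(0, len(payload), CHUNK_SIZE))
    }


def from_entity_properties(props: Dict[str, bytes]) -> Any:
    """Reassemble the chunks in name order and unpickle them."""
    payload = b"".join(props[name] for name in sorted(props))
    return pickle.loads(payload)


samples = [{"label": 3, "pixels": [0, 255]}, (1, 2.5), "foo", 42, 1.5]
for sample in samples:
    assert from_entity_properties(to_entity_properties(sample)) == sample
print(len(to_entity_properties([0] * 50_000)))  # a ~100 KB sample needs several chunks
```

Unpickling data read from a table you do not fully trust can execute arbitrary code, so a real implementation might prefer a safer format, such as JSON for plain Python values or NumPy's .npy format for arrays.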



Download files


Source Distributions

No source distribution files available for this release.

Built Distribution

puppettable-0.0.1-py3-none-any.whl (19.8 kB)

Uploaded Python 3

File details

Details for the file puppettable-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: puppettable-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 19.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0.post20200309 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.7.6

File hashes

Hashes for puppettable-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f7202e1eaa2c630a6956616b11f42c92d0140d8d2b486ccfe906277d483de81c
MD5 b29ac374619a82bbf92bc272f43b6332
BLAKE2b-256 49f424edd879f0f0d842a0446d114eec47bbf63c66354b28a866acc100ff7090

