
AIS Python SDK

AIS Python SDK provides a (growing) set of client-side APIs to access and utilize AIS clusters, buckets, and objects.

The project is, essentially, a Python port of the AIS Go APIs, with an additional emphasis on convenience for Python developers.

Note that only Python 3.x (version 3.6 or later) is currently supported.
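Since the SDK targets Python 3.6 or later, a simple runtime guard can catch unsupported interpreters early. This is a generic sketch, not part of the SDK:

```python
import sys

# The SDK requires Python 3.6 or later; fail fast with a clear message otherwise.
MIN_VERSION = (3, 6)

def check_python_version(current=None):
    """Return True if the interpreter meets the SDK's minimum version."""
    current = current or sys.version_info[:2]
    return tuple(current) >= MIN_VERSION

if not check_python_version():
    raise RuntimeError(f"aistore requires Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+")
```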

Installation

Install as a Package

The latest AIS Python SDK release can be installed with either Anaconda or pip:

$ conda install aistore
$ pip install aistore

Install From Source

If you'd like to work with the current upstream (and don't mind the risk), install the latest master directly from GitHub:

$ git clone https://github.com/NVIDIA/aistore.git

$ cd aistore/sdk/python

$ pip install -e .

Quick Start

To interact with a running AIS instance, first create a client object:

from aistore import Client

client = Client("http://localhost:8080")

The newly created client object can be used to interact with your AIS cluster, buckets, and objects. Here are a few ways to do so:

# Check if AIS is deployed and running
client.cluster().is_aistore_running()
# Get cluster information
client.cluster().get_info()
# Create a bucket named "my-ais-bucket"
client.bucket("my-ais-bucket").create()
# Delete bucket named "my-ais-bucket"
client.bucket("my-ais-bucket").delete()
# Head bucket
client.bucket("my-ais-bucket").head()
# Head object
client.bucket("my-ais-bucket").object("my-object").head()
# Put Object
client.bucket("my-ais-bucket").object("my-new-object").put("path-to-object")

If you are using AIS buckets, you can simply omit the provider argument when instantiating a bucket object, since it defaults to ProviderAIS: client.bucket("my-ais-bucket").create() is equivalent to client.bucket("my-ais-bucket", provider="ais").create().
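To illustrate the provider naming scheme, here is a small hypothetical helper (not part of the SDK) that splits a `provider://bucket` string into the arguments `client.bucket()` expects, defaulting to `ais` when no provider prefix is given:

```python
def parse_bucket_url(url):
    """Split 'provider://bucket-name' into (provider, bucket_name).

    A bare bucket name defaults to the 'ais' provider, mirroring the
    SDK's default of ProviderAIS.
    """
    if "://" in url:
        provider, name = url.split("://", 1)
    else:
        provider, name = "ais", url
    return provider, name

# parse_bucket_url("aws://my-aws-bucket") -> ("aws", "my-aws-bucket")
# parse_bucket_url("my-ais-bucket")       -> ("ais", "my-ais-bucket")
```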

External Cloud Storage Buckets

AIS supports a number of different backend providers or, simply, backends.

For exact definitions and related capabilities, please see terminology.

Many bucket/object operations support remote cloud buckets (third-party backend-based cloud buckets), including a few of the operations shown above. To interact with remote cloud buckets, you need to specify the provider of choice when instantiating your bucket object as follows:

# Head AWS bucket
client.bucket("my-aws-bucket", provider="aws").head()
# Evict GCP bucket
client.bucket("my-gcp-bucket", provider="gcp").evict()
# Get object from Azure bucket
client.bucket("my-azure-bucket", provider="azure").object("filename.ext").get()
# List objects in AWS bucket
client.bucket("my-aws-bucket", provider="aws").list_objects()

Please note that certain operations do not support external cloud storage buckets. Please refer to the API reference documentation for more information on which bucket/object operations support remote cloud buckets, as well as general information on class and method usage.

ETLs

AIStore also supports ETLs, short for Extract-Transform-Load. ETLs with AIS are beneficial because the transformations execute locally, on the nodes that store the data, which preserves the linear scalability of AIS.

Note: AIS-ETL requires Kubernetes. For more information on deploying AIStore with Kubernetes (or Minikube), refer here.

The following example is a sample workflow involving AIS-ETL.

ETLs can be initialized with either code or spec.

First, we initialize an ETL with code:

import hashlib

# Defining ETL transformation code
def transform(input_bytes):
    md5 = hashlib.md5()
    md5.update(input_bytes)
    return md5.hexdigest().encode()

# Initializing ETL with transform()
client.etl().init_code(code=transform, etl_id="etl-code")
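Because the transform is plain Python, it can be sanity-checked locally before being registered with the cluster. The expected value below is the well-known MD5 digest of the empty byte string:

```python
import hashlib

def transform(input_bytes):
    md5 = hashlib.md5()
    md5.update(input_bytes)
    return md5.hexdigest().encode()

# Locally, the transform maps any payload to a 32-character hex digest.
print(transform(b""))  # b'd41d8cd98f00b204e9800998ecf8427e'
```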

Next, we initialize another ETL with spec:

from aistore.client.etl_templates import MD5

template = MD5.format(communication_type="hpush")
client.etl().init_spec(template=template, etl_id="etl-spec")

Refer to more templates here.

Once initialized, we can verify that the ETLs are running with method list():

# List all running ETLs
client.etl().list()

We can get an object with the ETL transformations applied:

# Get object w/ ETL code transformation
obj1 = client.bucket("bucket-demo").object("object-demo").get(etl_id="etl-code").read_all()

# Get object w/ ETL spec transformation
obj2 = client.bucket("bucket-demo").object("object-demo").get(etl_id="etl-spec").read_all()

We can stop the ETLs if desired with method stop():

# Stop ETL 
client.etl().stop(etl_id="etl-code")
client.etl().stop(etl_id="etl-spec")

# Verify ETLs are not actively running
client.etl().list()

Stopped ETLs can be resumed with method start():

# Resume ETLs
client.etl().start(etl_id="etl-code")
client.etl().start(etl_id="etl-spec")

# Verify ETLs are once again running
client.etl().list()

Finally, once finished with the ETLs, we clean up by stopping them with stop() and subsequently deleting them with delete():

# Stop ETLs
client.etl().stop(etl_id="etl-code")
client.etl().stop(etl_id="etl-spec")

# Delete ETLs
client.etl().delete(etl_id="etl-code")
client.etl().delete(etl_id="etl-spec")

Deleting an ETL deletes all pods created by Kubernetes for the ETL. Consequently, deleted ETLs cannot be started again.

For an interactive demo, refer here.

More Examples

For more in-depth examples, please see SDK tutorial (Jupyter Notebook).

API Documentation

Module Summary
api.py: Contains the Client class, with methods for making HTTP requests to an AIStore server, and factory constructors for the Bucket, Cluster, and Xaction classes.
cluster.py: Contains the Cluster class, which represents a cluster bound to a client and provides all cluster-related operations, including checking the cluster's health and retrieving vital cluster information.
bucket.py: Contains the Bucket class, which represents a bucket in an AIS cluster and provides all bucket-related operations, including (but not limited to) creating, deleting, evicting, renaming, and copying.
object.py: Contains the Object class, which represents an object belonging to a bucket in an AIS cluster and provides all object-related operations, including (but not limited to) retrieving, adding, and deleting objects.
xaction.py: Contains the Xaction class and all xaction-related operations.
etl.py: Contains the Etl class and all ETL-related operations.

For more information on API usage, refer to the API reference documentation.

PyTorch Integration

You can list and load data from AIS buckets (buckets that are not 3rd party backend-based) and remote cloud buckets (3rd party backend-based cloud buckets) in PyTorch using AISFileLister and AISFileLoader.

AISFileLister and AISFileLoader are now available as a part of official pytorch/data project.

from torchdata.datapipes.iter import AISFileLister, AISFileLoader

# provide list of prefixes to load and list data from
ais_prefixes = ['gcp://bucket-name/folder/', 'aws://bucket-name/folder/', 'ais://bucket-name/folder/', ...]

# List all files for these prefixes using AISFileLister
dp_ais_urls = AISFileLister(url='localhost:8080', source_datapipe=ais_prefixes)

# print(list(dp_ais_urls))

# Load files using AISFileLoader
dp_files = AISFileLoader(url='localhost:8080', source_datapipe=dp_ais_urls)

for url, file in dp_files:
    # each item is a (url, file-like object) pair
    pass

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aistore-1.0.3.tar.gz (21.7 kB)

Uploaded Source

Built Distribution

aistore-1.0.3-py3-none-any.whl (24.9 kB)

Uploaded Python 3

File details

Details for the file aistore-1.0.3.tar.gz.

File metadata

  • Download URL: aistore-1.0.3.tar.gz
  • Upload date:
  • Size: 21.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for aistore-1.0.3.tar.gz:

  • SHA256: 2df21e4cf4a65b143519a742db464d29a2b04c9aa453727385e4ba3f5b62ed65
  • MD5: f2c2796a6ef087f439eb44bb3d57d756
  • BLAKE2b-256: 3ce0dcae4d155157a989290f95b20771681c1b7e1774897ac37f65c0bdfd87ec

See more details on using hashes here.
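To check a downloaded distribution against the digests listed above, `hashlib` can stream the file in chunks. This is a generic sketch; substitute the actual path of your downloaded file and compare against the published SHA256 digest:

```python
import hashlib

def sha256_of_file(path, chunk_size=8192):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare the result against the digest published on this page, e.g.:
# sha256_of_file("aistore-1.0.3.tar.gz") == "2df21e4cf4a65b143519a742db464d29a2b04c9aa453727385e4ba3f5b62ed65"
```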

File details

Details for the file aistore-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: aistore-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 24.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for aistore-1.0.3-py3-none-any.whl:

  • SHA256: 8e5609579d9b149076c4b68dbe013170177199aacd35414a103c867d06af2ef8
  • MD5: b1d0ca4194b245a24dac4b4b5cb41f59
  • BLAKE2b-256: dd46beafe7f305d89c077fe3cd62c0971be6b1953fb2607d827569ab59fc0089

See more details on using hashes here.
