AIS Python SDK
AIS Python SDK provides a (growing) set of client-side APIs to access and utilize AIS clusters, buckets, and objects.
The project is essentially a Python port of the AIS Go APIs, with additional emphasis on convenience for Python developers.
Note that only Python 3.x (version 3.6 or later) is currently supported.
Install as a Package
The latest AIS release can be easily installed either with Anaconda or pip:
```console
$ conda install aistore
```

```console
$ pip install aistore
```
Install From Source
If you'd like to work with the current upstream (and don't mind the risk), install the latest master directly from GitHub:
```console
$ git clone https://github.com/NVIDIA/aistore.git
$ cd aistore/sdk/python
$ pip install -e .
```
In order to interact with your running AIS instance, you will need to create a `Client` object:
```python
from aistore import Client

client = Client("http://localhost:8080")
```
The newly created `client` object can be used to interact with your AIS cluster, buckets, and objects. Here are a few ways to do so:
```python
# Check if AIS is deployed and running
client.cluster().is_aistore_running()

# Get cluster information
client.cluster().get_info()

# Create a bucket named "my-ais-bucket"
client.bucket("my-ais-bucket").create()

# Delete bucket named "my-ais-bucket"
client.bucket("my-ais-bucket").delete()

# Head bucket
client.bucket("my-ais-bucket").head()

# Head object
client.bucket("my-ais-bucket").object("my-object").head()

# Put Object
client.bucket("my-ais-bucket").object("my-new-object").put("path-to-object")
```
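Putting these together, here is a minimal round-trip sketch (assuming a cluster reachable at `localhost:8080` and a local file `sample.txt`; both names are illustrative):

```python
# Create a bucket, upload a local file, then read the object back
bucket = client.bucket("my-ais-bucket")
bucket.create()
bucket.object("sample.txt").put("sample.txt")
content = bucket.object("sample.txt").get().read_all()
print(len(content))
```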
If you are using AIS buckets, you can simply omit the provider argument (defaults to ProviderAIS) when instantiating a bucket object (`client.bucket("my-ais-bucket").create()` is equivalent to `client.bucket("my-ais-bucket", provider="ais").create()`).
External Cloud Storage Buckets
AIS supports a number of different backend providers or, simply, backends.
For exact definitions and related capabilities, please see terminology.
Many bucket/object operations support remote cloud buckets (third-party backend-based cloud buckets), including a few of the operations shown above. To interact with remote cloud buckets, you need to specify the provider of choice when instantiating your bucket object as follows:
```python
# Head AWS bucket
client.bucket("my-aws-bucket", provider="aws").head()

# Evict GCP bucket
client.bucket("my-gcp-bucket", provider="gcp").evict()

# Get object from Azure bucket
client.bucket("my-azure-bucket", provider="azure").object("filename.ext").get()

# List objects in AWS bucket
client.bucket("my-aws-bucket", provider="aws").list_objects()
```
Note that certain operations do not support external cloud storage buckets. Refer to the API reference documentation for more information on which bucket/object operations support remote cloud buckets, as well as for general information on class and method usage.
AIStore also supports ETLs, short for Extract-Transform-Load. ETLs with AIS are beneficial because the transformations run locally, alongside the data, which largely contributes to the linear scalability of AIS.
The following example is a sample workflow involving AIS-ETL.
We initialize an ETL w/ code:
```python
import hashlib

# Defining ETL transformation code
def transform(input_bytes):
    md5 = hashlib.md5()
    md5.update(input_bytes)
    return md5.hexdigest().encode()

# Initializing ETL with transform()
client.etl().init_code(code=transform, etl_id="etl-code")
```
We initialize another ETL w/ spec:
```python
from aistore.client.etl_templates import MD5

template = MD5.format(communication_type="hpush")
client.etl().init_spec(template=template, etl_id="etl-spec")
```
Refer to more templates here.
Once initialized, we can verify the ETLs are running with the `list` method:
```python
# List all running ETLs
client.etl().list()
```
We can get an object with the ETL transformations applied:
```python
# Get object w/ ETL code transformation
obj1 = client.bucket("bucket-demo").object("object-demo").get(etl_id="etl-code").read_all()

# Get object w/ ETL spec transformation
obj2 = client.bucket("bucket-demo").object("object-demo").get(etl_id="etl-spec").read_all()
```
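As a quick sanity check (a sketch, not part of the workflow above), the transformed bytes should equal an MD5 digest computed locally on the untransformed object; we also assume here that the `MD5` spec template computes the same hex digest as `transform()`:

```python
import hashlib

# Fetch the object without any ETL transformation
original = client.bucket("bucket-demo").object("object-demo").get().read_all()

# The ETL output should equal the locally computed MD5 hex digest
assert obj1 == hashlib.md5(original).hexdigest().encode()
assert obj2 == obj1  # assumes the MD5 template yields the same digest
```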
We can stop the ETLs if desired with the `stop` method:
```python
# Stop ETLs
client.etl().stop(etl_id="etl-code")
client.etl().stop(etl_id="etl-spec")

# Verify ETLs are not actively running
client.etl().list()
```
Stopped ETLs can be resumed with the `start` method:
```python
# Start (resume) ETLs
client.etl().start(etl_id="etl-code")
client.etl().start(etl_id="etl-spec")

# Verify ETLs are running again
client.etl().list()
```
Finally, once finished with the ETLs, we clean up by stopping them with `stop` and subsequently deleting them with `delete`:
```python
# Stop ETLs
client.etl().stop(etl_id="etl-code")
client.etl().stop(etl_id="etl-spec")

# Delete ETLs
client.etl().delete(etl_id="etl-code")
client.etl().delete(etl_id="etl-spec")
```
Deleting an ETL deletes all pods created by Kubernetes for the ETL. Consequently, deleted ETLs cannot be started again.
For an interactive demo, refer here.
For more in-depth examples, please see SDK tutorial (Jupyter Notebook).
For more information on API usage, refer to the API reference documentation.
You can list and load data from AIS buckets (buckets that are not 3rd party backend-based) and remote cloud buckets (3rd party backend-based cloud buckets) in PyTorch using AISFileLister and AISFileLoader.
AISFileLister and AISFileLoader are now available as part of the official pytorch/data project.
```python
from torchdata.datapipes.iter import AISFileLister, AISFileLoader

# provide list of prefixes to load and list data from
ais_prefixes = ['gcp://bucket-name/folder/', 'aws://bucket-name/folder/', 'ais://bucket-name/folder/', ...]

# List all files for these prefixes using AISFileLister
dp_ais_urls = AISFileLister(url='localhost:8080', source_datapipe=ais_prefixes)
# print(list(dp_ais_urls))

# Load files using AISFileLoader
dp_files = AISFileLoader(url='localhost:8080', source_datapipe=dp_ais_urls)
for url, file in dp_files:
    pass
```
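The loop above iterates over `(url, stream)` pairs; as a sketch (assuming each `file` is a readable stream of the object's raw bytes, as in torchdata's examples), the loop body could consume the contents directly:

```python
# Read each object's bytes and report its size
for url, file in dp_files:
    data = file.read()
    print(url, len(data))
```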