Skip to main content

Bodo Platform SDK 2.2.0

Project description

Bodo Platform SDK

Bodo Platform SDK is a Python library that provides a simple way to interact with the Bodo Platform API. It allows you to create, manage, and monitor resources such as clusters, jobs, and workspaces.

Updates:

  • NEW:
    • Implementation of cursor.describe
  • FIXES:
    • Update of cursor interface, added missing properties/methods:
      • rownumber
      • query_id
      • close

Getting Started

Installation

pip install bodosdk

Creating workspace client

First you need to access your workspace in https://platform.bodo.ai/ and create an API Token in the Bodo Platform for Bodo SDK authentication.

Navigate to API Tokens in the Admin Console to generate a token. Copy and save the token's Client ID and Secret Key and use them to define a client (BodoClient) that can interact with the Bodo Platform.

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient(
    client_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    secret_key="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
)

Alternatively, set BODO_CLIENT_ID and BODO_SECRET_KEY environment variables to avoid requiring keys:

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()

To get workspace data, you can access the workspace_data attribute of the client:

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
print(my_workspace.workspace_data)

Additional Configuration Options for BodoClient

  • print_logs: defaults to False. All API requests and responses are printed to the console if set to True.
from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient(print_logs=True)

Create first cluster

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
    name='My first cluster',
    instance_type='c5.large',
    workers_quantity=1
)

Above example creates a simple one node cluster, with latest bodo version available and returns cluster object. Platform will create cluster in your workspace.

Waiting for status

To wait till cluster will be ready for interaction you can call method wait_for_status

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
    name='My first cluster',
    instance_type='c5.large',
    workers_quantity=1
)
my_cluster.wait_for_status(['RUNNING'])

This method will wait until any of provided statuses will occur or FAILED status will be set in case of some failure. Check your workspace on https://platform.bodo.ai/ and you will see your cluster, you can use Notebook to connect with it.

Updating Cluster

Now let's update our cluster, on RUNNING cluster you can update name, description, auto_pause, auto_stop and workers_quantity(this will trigger scaling) only:

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
    name='My first cluster',
    instance_type='c5.large',
    workers_quantity=1
)
my_cluster.wait_for_status(['RUNNING'])
my_cluster.update(
    description='My description',
    name="My updated cluster",
    auto_pause=15,
    auto_stop=30
)

All other modifcations like instance_type, bodo_version etc need STOPPED cluster

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.get("cluster_id")
if my_cluster.status != 'STOPPED':
    my_cluster.stop(wait=True)
my_cluster.update(instance_type='c5.2xlarge', workers_quantity=2)

Create First Job

On running cluster you can schedule a job in very simple way: First on https://platform.bodo.ai navigate to notebook in your workspace and create following test.py file in your main directory:

import pandas as pd
import numpy as np
import bodo
import time

NUM_GROUPS = 30
NUM_ROWS = 20_000_000

df = pd.DataFrame({
    "A": np.arange(NUM_ROWS) % NUM_GROUPS,
    "B": np.arange(NUM_ROWS)
})
df.to_parquet("my_data.pq")
time.sleep(1) # wait till file will be available on all nodes
@bodo.jit(cache=True)
def computation():
    t1 = time.time()
    df = pd.read_parquet("my_data.pq")
    df1 = df[df.B > 4].A.sum()
    print("Execution time:", time.time() - t1)
    return df1

result = computation()
print(result)

then define job on cluster through SDK, wait till SUCCEEDED and check logs

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.get("cluster_id")
my_job = my_cluster.run_job(
    code_type='PYTHON',
    source={'type': 'WORKSPACE', 'path': '/'},
    exec_file='test.py'
)
print(my_job.wait_for_status(['SUCCEEDED']).get_stdout())

You can use almost same confiuration to run SQL file, all you need is to define your test.sql file and Catalog https://platform.bodo.ai:

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.get("cluster_id")
my_job = my_cluster.run_job(
    code_type='SQL',
    source={'type': 'WORKSPACE', 'path': '/'},
    exec_file='test.sql',
    catalog="MyCatalog"
)
print(my_job.wait_for_status(['SUCCEEDED']).get_stdout())

Cluster List and executing jobs on it's elements

Now let's try to run same job on different clusters:

from bodosdk import BodoWorkspaceClient
import random

my_workspace = BodoWorkspaceClient()

random_val = random.random() # just to avoid conflicts on name
clusters_conf = [('c5.large', 8), ('c5.xlarge',4), ('c5.2xlarge',2)]
for  i, conf in enumerate(clusters_conf):
    my_workspace.ClusterClient.create(
        name=f'Test {i}',
        instance_type=conf[0],
        workers_quantity=conf[1],
        custom_tags={'test_tag': f'perf_test{random_val}'} # let's add tag to easy filter our clusters
    )
# get list by tag
clusters = my_workspace.ClusterClient.list(filters={
    'tags': {'test_tag': f'perf_test{random_val}'}
})
# run same job 3 times, once per each cluster
jobs = clusters.run_job(
    code_type='PYTHON',
    source={'type': 'WORKSPACE', 'path': '/'},
    exec_file='test.py'
)
#wait for jobs to finish and print results
for job in jobs.wait_for_status(['SUCCEEDED']):
    print(job.name, job.cluster.name)
    print(job.get_stdout())
#remove our clusters
jobs.clusters.delete() # or clusters.delete()

Execute SQL query

You can also execute SQL queries by passing just query text like following:

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
my_sql_job = my_workspace.JobClient.run_sql_query(sql_query="SELECT 1", catalog="MyCatalog", cluster={
    "name": 'Temporary cluster',
    "instance_type": 'c5.large',
    "workers_quantity": 1
})
print(my_sql_job.wait_for_status(['SUCCEEDED']).get_stdout())

In above case, when you provide cluster configuration but not existing cluster it will be terminated as soon as SQL job will finish.

If you want to execute sql job on existing cluster just use run_sql_query on cluster:

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
    name='My cluster',
    instance_type='c5.large',
    workers_quantity=1
)
my_sql_job = my_cluster.run_sql_query(sql_query="SELECT 1", catalog="MyCatalog")
print(my_sql_job.wait_for_status(['SUCCEEDED']).get_stdout())

Connector

You can also execute SQL queries using connector for cluster:

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
    name='My cluster',
    instance_type='c5.large',
    workers_quantity=1
)
connection = my_cluster.connect('MyCatalog') # or connection = my_workspace.ClusterClient.connect('MyCatalog', 'cluster_id')
print(connection.cursor().execute("SELECT 1").fetchone())
my_cluster.delete()

Job Templates

Against defining jobs from scratch you can create a template for your jobs, and then easily run them, e.g.:

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
tpl = my_workspace.JobTemplateClient.create(
    name='My template',
    cluster={
        'instance_type': 'c5.xlarge',
        'workers_quantity': 1
    },
    code_type="SQL",
    catalog="MyCatalog",
    exec_text="SELECT 1"
)
job1 = tpl.run() # you can simply run it
job2 = tpl.run(exec_text="SELECT 2") # or run it with overriding template values
job3 = tpl.run(cluster={'instance_type': 'c5.large'}) # you can override even part of cluster configuration

jobs = my_workspace.JobClient.list(filters={'template_ids':[tpl.id]}) # you can filter jobs by it's template_id
for job in jobs.wait_for_status(['SUCCEEDED']):
    print(job.name, job.cluster.instance_type, job.get_stdout())

You can also run your template on specific cluster e.g:

from bodosdk import BodoWorkspaceClient
from bodosdk.models import JobTemplateFilter

my_workspace = BodoWorkspaceClient()
tpls = my_workspace.JobTemplateClient.list(filters=JobTemplateFilter(names=['My template']))
my_cluster = my_workspace.ClusterClient.create(
    name='My cluster',
    instance_type='c5.large',
    workers_quantity=1
)
print(my_cluster.run_job(template_id=tpls[0].id).wait_for_status(['SUCCEEDED']).get_stdout())
my_cluster.delete()

Scheduled Jobs

Rather than having to setup your own infrastructure or scheduler, you can create a scheduled job using the Bodo Platform. You will have to create a Job Template first.

from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
tpl = my_workspace.JobTemplateClient.create(
    name='My template',
    cluster={
        'instance_type': 'c5.xlarge',
        'workers_quantity': 1
    },
    code_type="SQL",
    catalog="MyCatalog",
    exec_text="SELECT 1"
)
scheduled_job = my_workspace.CronJobClient.create(
    name="Cron job",
    schedule="0 * * * *",
    timezone="Etc/GMT",
    max_concurrent_runs=1,
    job_template=tpl,
)

Job Runs using the job template that you provided will be automatically created and execute based on the provided schedule.

Statuses

Each resource, Cluster, Job or Workspace has own set of statuses which are following:

Cluster:

  • NEW
  • INPROGRESS
  • PAUSING
  • PAUSED
  • STOPPING
  • STOPPED
  • INITIALIZING
  • RUNNING
  • FAILED
  • TERMINATED

Job:

  • PENDING
  • RUNNING
  • SUCCEEDED
  • FAILED
  • CANCELLED
  • CANCELLING
  • TIMEOUT

Workspace:

  • NEW
  • INPROGRESS
  • READY
  • FAILED
  • TERMINATING
  • TERMINATED

Organization client and workspaces:

To manage workspaces you need different keys (generated for organization) and differenct SDK client, for start let's list all our workspaces:

from bodosdk import BodoOrganizationClient

my_org = BodoOrganizationClient(
    client_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    secret_key="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
) # or BodoOrganizationClient() if `BODO_ORG_CLIENT_ID` and `BODO_ORG_SECRET_KEY` are exported
for w in my_org.list_workspaces():
    print(w.name)

You can filter workspaces providing valid filters:

from bodosdk import BodoOrganizationClient
from bodosdk.models import WorkspaceFilter
my_org = BodoOrganizationClient()

for w in my_org.list_workspaces(filters=WorkspaceFilter(statuses=['READY'])):
    print(w.name)

You can provide filters as 'WorkspaceFilter' imported from bodosdk.models or as a dictionary:

from bodosdk import BodoOrganizationClient
my_org = BodoOrganizationClient()

for w in my_org.list_workspaces(filters={"statuses": ['READY']}):
    print(w.name)

Create new Workspace

from bodosdk import BodoOrganizationClient
my_org = BodoOrganizationClient()
my_workspace = my_org.create_workspace(
    name="SDK test",
    region='us-east-2',
    cloud_config_id="a0d1242c-3091-42de-94d9-548e2ae33b73",
    storage_endpoint_enabled=True
).wait_for_status(['READY'])
assert my_workspace.id == my_org.list_workspaces(filters={"names": ['SDK test'], "statuses": ['READY']})[0].id
my_workspace.delete() # remove workspace at the end

Upgrade workspace infra

In some cases when you have workspace existing for a long time you may want to re-run terraform to apply fresh changes to workspace infrastructure. You can do it following way:

from bodosdk import BodoOrganizationClient
my_org = BodoOrganizationClient()
my_org.list_workspaces(filters={'ids': ['workspace_to_update1_id', 'workspace_to_update2_id']}).update_infra()

Advanced

In this section we will present more examples of bodosdk usages.

Workspace created in existing VPC

There is possibility to create workspace on existing infrastructure. The only requirement is that VPC need access to Internet, either NAT or IGW. It's needed to allow clusters to authorize in external auth service.

from bodosdk import BodoOrganizationClient
my_org = BodoOrganizationClient()
my_workspace = my_org.create_workspace(
    cloud_config_id="cloudConfigId",
    name="My workspace",
    region="us-east-1",
    storage_endpoint_enabled=True,
    vpc_id="existing-vpc-id",
    private_subnets_ids=['subnet1', 'subnet2'],
    public_subnets_ids=['subnet3']
)
my_workspace.wait_for_status(['READY'])

Spot instances, auto AZ

You can create Cluster using spot instances, to reduce cost of usage, downside is that you cannot PAUSE this kind of cluster, and from time to time cluster may be unavailable (when spot instance is released).

Auto AZ is mechanism which retries cluster creation in another AZ, when in current AZ there is no enough instances of desired type.

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
    name='Spot cluster',
    instance_type='c5.large',
    workers_quantity=1,
    use_spot_instance=True,
    auto_az=True,
)

Accelerated networking

Accelerated networking is enabled by for instances that supporting.

You can get list of all supported instances using ClusterClient, it returns list of InstanceType objects. Field accelerated_networking informs about network acceleration.

from bodosdk import BodoWorkspaceClient

my_workspace = BodoWorkspaceClient()

accelerated_networking_instances = [x for x in my_workspace.ClusterClient.get_instances() if x.accelerated_networking]

my_cluster = my_workspace.ClusterClient.create(
    name='Spot cluster',
    instance_type=accelerated_networking_instances[0].name,
    workers_quantity=1,
)

Preparing clusters for future use:

In Bodo, Cluster may be in two states responsible for suspended status: PAUSED and STOPPED. Spot clusters cannot be PAUSED. There are 3 differences between those states: cost, start up time, error rate.

Costs

PAUSED > STOPPED - In PAUSED state we are paying for disks while in STOPPED we don't.

Start up time

STOPPED > PAUSED - Bringing back machines in PAUSED state is much faster, as those machines are already defined in cloud

Error rate

PAUSED > STOPPED - By error rate we mean situation when number of available instances of descired types is lower than number of requested workers. As in PAUSED state, instance entities are already defined, and we request for reesources at once it's more likely to happen than in STOPPED state, where asg is maintaining instance creation and it waits for available resources.

Prepare clusters for further use and make them PAUSED

from bodosdk import BodoWorkspaceClient
from bodosdk.models import ClusterFilter
my_workspace = BodoWorkspaceClient()

clusters_conf = {
    'Team A': {
        'instance_type': 'c5.2xlarge',
        'workers': 4,
    },
    'Team b': {
        'instance_type': 'c5.xlarge',
        'workers': 2,
    },
    'Team C': {
        'instance_type': 'c5.16xlarge',
        'workers': 2,
    }
}
for owner, conf in clusters_conf.items():
    my_workspace.ClusterClient.create(
        name = f"{owner} Cluster",
        instance_type=conf['instance_type'],
        workers_quantity=conf['workers'],
        custom_tags={'owner': owner, 'purpose': 'test'}
    )

my_workspace.ClusterClient.list(
    filters=ClusterFilter(tags={'purpose': 'test'})
).wait_for_status(
    ['RUNNING', 'INITIALIZING']
).pause().wait_for_status(['PAUSED'])

Use another cluster as a template for cluster definition in job

Let's imagine that you have a cluster (in any state) and you wan't to run job on the same specification, but you don't want to use previously defined cluster. You can do following

from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.get('existing_cluster')
cluster_conf = my_cluster.dict()
del cluster_conf['uuid']
my_sql_job = my_workspace.JobClient.run_sql_query(sql_query="SELECT 1", catalog="MyCatalog", cluster=cluster_conf)

In that case job will create a new cluster with provided configuration, executes and after job is finished removes cluster.

#INDEX

bodosdk package contents

class bodosdk.clients.cluster.ClusterClient(workspace_client: IBodoWorkspaceClient)

Bases: IClusterClient

A client for managing cluster operations in a Bodo workspace.

_deprecated_methods

A dictionary of deprecated methods.

  • Type: Dict

_images

A list of available Bodo images.

  • Type: List[IBodoImage]

  • Parameters: workspace_client (IBodoWorkspaceClient) – The workspace client used for operations.

property Cluster : Cluster

Provides access to cluster operations.

  • Returns: An instance of Cluster for cluster operations.
  • Return type: Cluster

property ClusterList : ClusterList

Provides access to listing clusters.

  • Returns: An instance of ClusterListAPIModel for listing clusters.
  • Return type: ClusterList

connect(catalog: str, cluster_id: str)

Connect to a specific catalog and cluster.

  • Parameters:
    • catalog – The name the catalog to connect to.
    • cluster_id – The UUID of the cluster to connect to.
  • Returns: An instance of Connection representing the connection to the catalog and cluster.

create(name: str, instance_type: str = None, workers_quantity: int = None, description: str | None = None, bodo_version: str | None = None, auto_stop: int | None = None, auto_pause: int | None = None, auto_upgrade: bool | None = None, auto_az: bool | None = None, use_spot_instance: bool | None = None, aws_deployment_subnet_id: str | None = None, availability_zone: str | None = None, instance_role: InstanceRole | Dict | None = None, custom_tags: Dict | None = None)

Creates a new cluster with the specified configuration.

  • Parameters:
    • name (str) – The name of the cluster.
    • instance_type (str , optional) – The type of instance to use for the cluster nodes.
    • workers_quantity (int , optional) – The number of worker nodes in the cluster.
    • description (str , optional) – A description of the cluster.
    • bodo_version (str , optional) – The Bodo version to use for the cluster. If not provided, the latest version is used.
    • auto_stop (int , optional) – The auto-stop time in minutes for the cluster.
    • auto_pause (int , optional) – The auto-pause time in minutes for the cluster.
    • auto_upgrade (bool , optional) – Should the cluster be automatically upgraded to the latest Bodo version on restart.
    • auto_az (bool , optional) – Whether to automatically select the availability zone.
    • use_spot_instance (bool , optional) – Whether to use spot instances for the cluster.
    • aws_deployment_subnet_id (str , optional) – The AWS deployment subnet ID.
    • availability_zone (str , optional) – The availability zone for the cluster.
    • instance_role (InstanceRole | Dict , optional) – The instance role or a custom role configuration.
    • custom_tags (Dict , optional) – Custom tags to assign to the cluster resources.
  • Returns: The created Cluster object.
  • Return type: Cluster

get(id: str)

Retrieves a cluster by its ID.

  • Parameters: id (str) – The ID of the cluster to retrieve.
  • Returns: The retrieved Cluster object.
  • Return type: Cluster

get_bodo_versions()

Retrieves a list of available Bodo versions.

  • Returns: A list of available Bodo versions.
  • Return type: List[str]

get_images()

Retrieves a list of available images.

  • Returns: A list of image IDs available for clusters.
  • Return type: List[str]

get_instance_types()

Retrieves list of all supported instance types

  • Returns: List[InstanceType]

property latest_bodo_version : str

Retrieves the latest Bodo version available.

  • Returns: The latest Bodo version.
  • Return type: str

list(filters: Dict | ClusterFilter | None = None, order: Dict | None = None)

Lists clusters based on the provided filters and order.

  • Parameters:
    • filters (Dict | ClusterFilter , optional) – The filters to apply to the cluster listing.
    • order (Dict , optional) – The order in which to list the clusters.
  • Returns: A list of clusters matching the criteria.
  • Return type: ClusterList

pause(id: str, wait=False)

Pauses the specified cluster.

  • Parameters: id (str) – The ID of the cluster to pause.
  • Returns: The paused Cluster object.
  • Return type: Cluster

remove(id: str, wait=False)

Removes the specified cluster.

  • Parameters: id (str) – The ID of the cluster to remove.
  • Returns: None

resume(id: str, wait=False)

Resumes the specified paused cluster.

  • Parameters: id (str) – The ID of the cluster to resume.
  • Returns: The resumed Cluster object.
  • Return type: Cluster

scale(id: str, new_size: int)

Scales the specified cluster to the new size.

  • Parameters:
    • id (str) – The ID of the cluster to scale.
    • new_size (int) – The new size for the cluster in terms of the number of worker nodes.
  • Returns: The scaled Cluster object.
  • Return type: Cluster

start(id: str, wait=False)

Starts the specified stopped cluster.

  • Parameters: id (str) – The ID of the cluster to start.
  • Returns: The started Cluster object.
  • Return type: Cluster

stop(id: str, wait=False)

Stops the specified cluster.

  • Parameters: id (str) – The ID of the cluster to stop.
  • Returns: The stopped Cluster object.
  • Return type: Cluster

update(id: str, name: str | None = None, description: str | None = None, auto_stop: int | None = None, auto_pause: int | None = None, auto_upgrade: bool | None = True, workers_quantity: int | None = None, instance_role: InstanceRole | Dict | None = None, instance_type: str | None = None, bodo_version: str | None = None, auto_az: bool | None = None, availability_zone: str | None = None, custom_tags: Dict | None = None)

Updates the specified cluster with the given configuration.

  • Parameters:
    • id (str) – The ID of the cluster to update.
    • name (str , optional) – The new name for the cluster.
    • description (str , optional) – A new description for the cluster.
    • auto_stop (int , optional) – The new auto-stop time in minutes.
    • auto_pause (int , optional) – The new auto-pause time in minutes.
    • auto_upgrade (bool , optional) – if cluster should be updated after each restart.
    • workers_quantity (int , optional) – The new number of worker nodes.
    • instance_role (InstanceRole | Dict , optional) – The new instance role or custom role configuration.
    • instance_type (str , optional) – The new instance type for the cluster nodes.
    • bodo_version (str , optional) – The new Bodo version for the cluster.
    • auto_az (bool , optional) – Whether to automatically select the availability zone.
    • availability_zone (str , optional) – The new availability zone for the cluster.
    • custom_tags (Dict , optional) – New custom tags for the cluster resources.
  • Returns: The updated Cluster object.
  • Return type: Cluster

wait_for_status(id: str, statuses: List, timeout: int | None = 300, tick: int | None = 30)

Waits for the specified cluster to reach any of the given statuses within the timeout period.

  • Parameters:
    • id (str) – The ID of the cluster to monitor.
    • statuses (List) – The list of statuses to wait for.
    • timeout (int , optional) – The timeout period in seconds.
    • tick (int , optional) – The interval in seconds between status checks.
  • Returns: The Cluster object if it reaches the desired status within the timeout period.
  • Return type: Cluster

class bodosdk.clients.instance_role.InstanceRoleClient(workspace_client: IBodoWorkspaceClient)

Bases: IInstanceRoleClient

property InstanceRole : InstanceRole

Get the InstanceRole object.

  • Returns: An instance of InstanceRole.
  • Return type: InstanceRole

property InstanceRoleList : InstanceRoleList

Get the InstanceRoleList object.

create(role_arn: str, description: str, name: str = None)

Create a new instance role.

  • Parameters:
    • role_arn (str) – The ARN of the role.
    • description (str) – A description of the role.
    • name (str , optional) – The name of the role. Defaults to None.
  • Returns: The created instance role after saving.
  • Return type: InstanceRole

delete(id: str)

Delete an instance role by its ID.

  • Parameters: id (str) – The UUID of the instance role to delete.

get(id: str)

Get an instance role by its ID.

  • Parameters: id (str) – The UUID of the instance role.
  • Returns: An instance of InstanceRole.
  • Return type: InstanceRole

list(filters: Dict | InstanceRoleFilter | None = None, order: Dict | None = None)

List all instance roles with optional filters and order.

  • Parameters:
    • filters (Optional *[*Union *[*Dict , InstanceRoleFilter ] ]) – A dictionary or InstanceRoleFilter
    • filters. (object to apply) –
    • order (Optional *[*Dict ]) – A dictionary to specify the order of the results.
  • Returns: An instance of InstanceRoleList containing the filtered and ordered instance roles.
  • Return type: InstanceRoleList

class bodosdk.clients.job.JobClient(workspace_client: IBodoWorkspaceClient)

Bases: IJobClient

property JobRun : JobRun

Get the JobRun object.

  • Returns: An instance of JobRun.
  • Return type: JobRun

property JobRunList : JobRunList

Get the JobRunList object.

  • Returns: An instance of JobRunList.
  • Return type: JobRunList

cancel_job(id: str)

Cancel job by id.

  • Parameters: id (str) – Job id.
  • Returns: Job object.
  • Return type: JobRun

cancel_jobs(filters: Dict | JobFilter | None = None)

Cancel jobs with the given filters.

  • Parameters: filters (Optional *[*Union *[*Dict , JobFilter ] ]) – Filters to apply on the list.
  • Returns: JobRunList object.
  • Return type: JobRunList

get(id: str)

Get job by id.

  • Parameters: id (str) – Job id.
  • Returns: Job object.
  • Return type: JobRun

list(filters: Dict | JobFilter | None = None, order: Dict | None = None)

List jobs with the given filters.

  • Parameters:
    • filters (Optional *[*Union *[*Dict , JobFilter ] ]) – Filters to apply on the list.
    • order (Optional *[*Dict ]) – Order to apply on the list.
  • Returns: JobRunList object.
  • Return type: JobRunList

run(template_id: str = None, cluster: dict | ICluster = None, code_type: str = None, source: dict | IS3Source | IGitRepoSource | IWorkspaceSource | ITextSource = None, exec_file: str = None, exec_text: str = None, args: Sequence[Any] | Dict | str = None, env_vars: dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, name: str = None, catalog: str = None, store_result: bool = None)

Run a job with the given parameters.

  • Parameters:
    • template_id (str) – Job template id.
    • cluster (Union *[*dict , ICluster ]) – Cluster object or cluster config.
    • code_type (str) – Code type.
    • source (Union *[*dict , IS3Source , IGitRepoSource , IWorkspaceSource , ITextSource ]) – Source object.
    • exec_file (str) – Exec file path.
    • exec_text (str) – Exec text.
    • args (Union *[*Sequence *[*Any ] , Dict , str ]) – Arguments.
    • env_vars (dict) – Environment variables.
    • timeout (int) – Timeout.
    • num_retries (int) – Number of retries.
    • delay_between_retries (int) – Delay between retries.
    • retry_on_timeout (bool) – Retry on timeout.
    • name (str) – Job name.
    • catalog (str) – Catalog, applicable only for SQL jobs.
    • store_result (bool) – Whether to store the result.
  • Returns: Job object.
  • Return type: JobRun

run_sql_query(template_id: str = None, catalog: str = None, sql_query: str = None, cluster: dict | ICluster = None, name: str = None, args: Sequence[Any] | Dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, store_result: bool = True)

Run a SQL job with the given parameters.

  • Parameters:
    • template_id (str) – Job template id.
    • catalog (str) – Catalog.
    • sql_query (str) – SQL query.
    • cluster (Union *[*dict , ICluster ]) – Cluster object or cluster config.
    • name (str) – Job name.
    • args (Union *[*Sequence *[*Any ] , Dict ]) – Arguments.
    • timeout (int) – Timeout.
    • num_retries (int) – Number of retries.
    • delay_between_retries (int) – Delay between retries.
    • retry_on_timeout (bool) – Retry on timeout.
    • store_result (bool) – Whether to store the result.
  • Returns: Job object.
  • Return type: JobRun

wait_for_status(id: str, statuses: List[str], timeout: int = 3600, tick: int = 30)

Wait for job to reach one of the given statuses.

  • Parameters:
    • id (str) – Job id.
    • statuses (List *[*str ]) – List of statuses to wait for.
    • timeout (int) – Timeout.
    • tick (int) – Tick.
  • Returns: Job object.
  • Return type: JobRun

class bodosdk.clients.job_tpl.JobTemplateClient(workspace_client: IBodoWorkspaceClient)

Bases: IJobTemplateClient

property JobTemplate : JobTemplate

property JobTemplateList : JobTemplateList

create(name: str = None, description: str = None, cluster: dict | ICluster = None, code_type: str = None, source: dict | IS3Source | IGitRepoSource | IWorkspaceSource | ITextSource = None, exec_file: str = None, exec_text: str = None, args: dict | str = None, env_vars: dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, catalog: str = None, store_result: bool = False)

Create a new job template with the given parameters.

  • Parameters:
    • name (str) – Name of the job template.
    • description (str) – Description of the job template.
    • cluster (Union *[*dict , ICluster ]) – Cluster object or cluster config.
    • code_type (str) – Code type.
    • source (Union *[*dict , IS3Source , IGitRepoSource , IWorkspaceSource , ITextSource ]) – Source object.
    • exec_file (str) – Exec file path.
    • exec_text (str) – Exec text.
    • args (Union *[*dict , str ]) – Arguments.
    • env_vars (dict) – Environment variables.
    • timeout (int) – Timeout.
    • num_retries (int) – Number of retries.
    • delay_between_retries (int) – Delay between retries.
    • retry_on_timeout (bool) – Retry on timeout.
    • catalog (str) – Catalog, applicable only for SQL code type.
    • store_result (bool) – Whether to store the result.

get(id: str)

Get job template by id.

  • Parameters: id (str) – Job template id.
  • Returns: Job template object.
  • Return type: JobTemplate

list(filters: Dict = None)

List job templates with the given filters.

  • Parameters: filters (Dict) – Filters to apply on the list.
  • Returns: JobTemplateList object.
  • Return type: JobTemplateList

remove(id: str)

Delete job template by id.

  • Parameters: id (str) – Job template id.

class bodosdk.clients.cron_job.CronJobClient(workspace_client: IBodoWorkspaceClient)

_deprecated_methods: Dict = {}

A dictionary of deprecated methods.

Type: Dict

property CronJob: CronJob

Provides access to CronJob operations.

  • Return: A CronJob instance for cron job operations.
  • Return Type: CronJob

property CronJobList: CronJobList

Provides access to a CronJob list.

  • Return: A CronJobList instance for listing cron jobs.
  • Return Type: CronJobList

list(order: Dict | None = None)

List cron jobs based on the provided order.

  • Parameters:
    • order ( Dict , optional ) – The order in which to list the cron jobs.
  • Return: A list of cron jobs in the provided ordering.
  • Return Type: CronJobList

get(id: str)

Get cron job by id.

  • Parameters:
    • id ( str ) – The ID of the cron job to retrieve.
  • Return: The retrieved CronJob object.
  • Return Type: CronJob

create(name: str = None, description: str = None, schedule: str = None, timezone: str = "Etc/GMT", max_concurrent_runs: int = 1, job_template: IJobTemplate = None, cluster: Union[dict, ICluster] = None, pause_cluster_when_finished: bool = None)

Create a new cron job with the given parameters.

  • Parameters:
    • name (str) : Name of the cron job.
    • description (str) : Description of the cron job.
    • schedule (str) : Schedule the cron job should follow in UNIX cron syntax.
    • timezone (str) : Timezone the cron job schedule should be based off of.
    • max_concurrent_runs (int) : Maximum number of cron job runs that can happen concurrently (max is 10).
    • job_template (IJobTemplate) : Job Template the cron job is derived from.
    • cluster (Union[dict, ICluster]) : Cluster object or cluster config.
    • pause_cluster_when_finished (bool) : Whether to pause the cluster when the cron job run has finished. Note this is only used when passing a non-job-dedicated cluster.
  • Return: The created CronJob object.
  • Return Type: CronJob

remove(id: str)

Delete cron job by id.

  • Parameters:
    • id ( str ) – The ID of the cron job to remove.
  • Return: The status code of the response
  • Return Type: int

class bodosdk.clients.organization.BodoOrganizationClient(client_id=None, secret_key=None, api_url='https://api.bodo.ai/api', auth_url='https://auth.bodo.ai', print_logs=False)

Bases: IBodoOrganizationClient

property CloudConfig : CloudConfig

property CloudConfigList : CloudConfigList

property Workspace : Workspace

property WorkspaceList : WorkspaceList

create_aws_cloud_config(name: str, tf_backend_region: str, role_arn: str | None = None, tf_bucket_name: str | None = None, tf_dynamo_db_table_name: str | None = None, account_id: str | None = None, access_key_id: str | None = None, secret_access_key: str | None = None, custom_tags: dict | None = None)

Create a new AWS cloud config in the organization with the given parameters.

  • Parameters:
    • name (str) – Name of the cloud config.
    • tf_backend_region (str) – Terraform backend region.
    • role_arn (Optional *[*str ]) – Role ARN.
    • tf_bucket_name (Optional *[*str ]) – Terraform bucket name.
    • tf_dynamo_db_table_name (Optional *[*str ]) – Terraform dynamo db table name.
    • account_id (Optional *[*str ]) – Account id.
    • access_key_id (Optional *[*str ]) – Access key id.
    • secret_access_key (Optional *[*str ]) – Secret access key.
    • custom_tags (Optional *[*dict ]) – Custom tags for the cloud config.
  • Returns: CloudConfig object.
  • Return type: CloudConfig

create_azure_cloud_config(name: str, tf_backend_region: str, tenant_id: str, subscription_id: str, resource_group: str, custom_tags: dict | None = None)

Create a new Azure cloud config in the organization with the given parameters.

  • Parameters:
    • name (str) – Name of the cloud config.
    • tf_backend_region (str) – Terraform backend region.
    • tenant_id (str) – Tenant id.
    • subscription_id (str) – Subscription id.
    • resource_group (str) – Resource group.
    • custom_tags (Optional *[*dict ]) – Custom tags for the cloud config.
  • Returns: CloudConfig object.
  • Return type: CloudConfig

create_workspace(name: str, region: str, storage_endpoint_enabled: bool, cloud_config_id: str = None, vpc_id: str | None = None, public_subnets_ids: List[str] | None = None, private_subnets_ids: List[str] | None = None, custom_tags: dict | None = None)

Create a new workspace in the organization with the given parameters.

  • Parameters:
    • name (str) – Name of the workspace.
    • region (str) – Region of the workspace.
    • storage_endpoint_enabled (bool) – Enable storage endpoint for the workspace.
    • cloud_config_id (str) – Cloud config id for the workspace.
    • vpc_id (Optional *[*str ]) – VPC id for the workspace.
    • public_subnets_ids (Optional *(*List *[*str ] )) – List of public subnet ids.
    • private_subnets_ids (Optional *(*List *[*str ] )) – List of private subnet ids.
    • custom_tags (Optional *(*dict )) – Custom tags for the workspace.
  • Returns: Workspace object.
  • Return type: Workspace

delete_workspace(id)

Delete workspace by id.

  • Parameters: id (str) – Workspace id.
  • Returns: Workspace object.
  • Return type: Workspace

get_cloud_config(id)

Get cloud config by id.

  • Parameters: id (str) – Cloud config id.
  • Returns: CloudConfig object.
  • Return type: CloudConfig

get_workspace(id)

Get workspace by id.

  • Parameters: id (str) – Workspace id.
  • Returns: Workspace object.
  • Return type: Workspace

list_cloud_configs(filters: dict | None = None)

List cloud configs in the organization.

  • Parameters: filters (Optional *[*Union *[*dict ] ]) – Filters to apply on the list.
  • Returns: CloudConfigList object.
  • Return type: CloudConfigList

list_workspaces(filters: WorkspaceFilter | dict | None = None)

List workspaces in the organization.

  • Parameters: filters (Optional *[*Union [WorkspaceFilter , dict ] ]) – Filters to apply on the list.
  • Returns: WorkspaceList object.
  • Return type: WorkspaceList

class bodosdk.clients.workspace.BodoWorkspaceClient(client_id=None, secret_key=None, api_url='https://api.bodo.ai/api', auth_url='https://auth.bodo.ai', print_logs=False)

Bases: IBodoWorkspaceClient

ClusterClient : IClusterClient

JobClient : IJobClient

JobTemplateClient : IJobTemplateClient

property workspace_data : IWorkspace

Get workspace data.

  • Returns: Workspace object.
  • Return type: Workspace

property workspace_id : str

Get workspace id.

  • Returns: Workspace id.
  • Return type: str

class bodosdk.models.catalog.Catalog(workspace_client: IBodoWorkspaceClient = None, *, uuid: str | None = None, name: str | None = None, description: str | None = None, catalogType: str | None = None, details: SnowflakeDetails | dict | None = None)

Bases: SDKBaseModel, ICatalog

delete()

class bodosdk.models.catalog.CatalogFilter(*, names: List[str] | None = None, ids: List[str] | None = None)

Bases: SDKBaseModel, ICatalogFilter

ids : List[str] | None

names : List[str] | None

class bodosdk.models.catalog.CatalogList(workspace_client: IBodoWorkspaceClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: dict | None = None, filters: dict | CatalogFilter | None = None)

Bases: ICatalogList, SDKBaseModel

class Config

Bases: object

Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/

allow_population_by_field_name = True

extra = 'forbid'

delete()

filters : dict | CatalogFilter | None

order : dict | None

page : int | None

page_size : int | None

total : int | None

class bodosdk.models.catalog.SnowflakeDetails(*, port: int | None = None, db_schema: str | None = None, database: str | None = None, userRole: str | None = None, username: str | None = None, warehouse: str | None = None, accountName: str | None = None, password: str | None = None)

Bases: SDKBaseModel

account_name : str | None

database : str | None

db_schema : str | None

password : str | None

port : int | None

user_role : str | None

username : str | None

warehouse : str | None

class bodosdk.models.cloud_config.AwsProviderData(*, provider: str = 'AWS', roleArn: str | None = None, tfBucketName: str | None = None, tfDynamoDbTableName: str | None = None, tfBackendRegion: str | None = None, externalId: str | None = None, accountId: str | None = None, accessKeyId: str | None = None, secretAccessKey: str | None = None)

Bases: SDKBaseModel, IAwsProviderData

access_key_id : str | None

account_id : str | None

external_id : str | None

provider : str

role_arn : str | None

secret_access_key : str | None

tf_backend_region : str | None

tf_bucket_name : str | None

tf_dynamo_db_table_name : str | None

class bodosdk.models.cloud_config.AzureProviderData(*, provider: str = 'AZURE', tfBackendRegion: str | None = None, resourceGroup: str | None = None, subscriptionId: str | None = None, tenantId: str | None = None, tfStorageAccountName: str | None = None, applicationId: str | None = None)

Bases: SDKBaseModel, IAzureProviderData

application_id : str | None

provider : str

resource_group : str | None

subscription_id : str | None

tenant_id : str | None

tf_backend_region : str | None

tf_storage_account_name : str | None

class bodosdk.models.cloud_config.CloudConfig(org_client: IBodoOrganizationClient = None, *, name: str | None = None, status: str | None = None, organizationUUID: str | None = None, customTags: dict | None = None, uuid: str | UUID | None = None, provider_data: AwsProviderData | AzureProviderData | None = None)

Bases: SDKBaseModel, ICloudConfig

custom_tags : dict | None

delete()

property id

name : str | None

organization_uuid : str | None

provider_data : AwsProviderData | AzureProviderData | None

status : str | None

uuid : str | UUID | None

class bodosdk.models.cloud_config.CloudConfigFilter(*, ids: List[str] | None = None, providers: List[str] | None = None, statuses: List[str] | None = None)

Bases: SDKBaseModel

ids : List[str] | None

providers : List[str] | None

statuses : List[str] | None

class bodosdk.models.cloud_config.CloudConfigList(org_client: IBodoOrganizationClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: dict | None = None, filters: CloudConfigFilter | None = None)

Bases: ICloudConfigList, SDKBaseModel

class Config

Bases: object

Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/

allow_population_by_field_name = True

extra = 'forbid'

delete()

filters : CloudConfigFilter | None

order : dict | None

page : int | None

page_size : int | None

total : int | None

class bodosdk.models.cluster.BodoImage(*, image_id: str, bodo_version: str)

Bases: SDKBaseModel

Represents an image configuration for Bodo, encapsulating the image ID and the specific Bodo version.

This class is a data model that holds information about a Bodo environment image.

image_id

The unique identifier for the Bodo image.

  • Type: str

bodo_version

The version of Bodo used in the image.

  • Type: str

bodo_version : str

image_id : str

class bodosdk.models.cluster.Cluster(workspace_client: IBodoWorkspaceClient = None, *, name: str | None = None, uuid: str | None = None, status: str | None = None, description: str | None = None, instanceType: str | None = None, workersQuantity: int | None = None, autoStop: int | None = None, autoPause: int | None = None, autoUpgrade: bool | None = None, bodoVersion: str | None = None, coresPerWorker: int | None = None, acceleratedNetworking: bool | None = None, createdAt: datetime | None = None, updatedAt: datetime | None = None, isJobDedicated: bool | None = None, autoAZ: bool | None = None, useSpotInstance: bool | None = None, lastKnownActivity: datetime | None = None, inStateSince: datetime | None = None, clusterAgentVersion: str | None = None, clusterInitStatus: str | None = None, clusterHealthStatus: str | None = None, primaryAgentIP: str | None = None, awsDeploymentSubnetId: str | None = None, nodeMetadata: List[NodeMetadata] | None = None, availabilityZone: str | None = None, instanceRole: InstanceRole | None = None, workspace: dict | None = None, autoscalingIdentifier: str | None = None, lastAsgActivityId: str | None = None, nodesIp: List[str] | None = None, customTags: Dict | None = None)

Bases: SDKBaseModel, ICluster

Represents a cluster in the SDK model, encapsulating various properties and operations related to a compute cluster.

name

The name of the cluster.

  • Type: Optional[str]

uuid

The unique identifier of the cluster.

  • Type: Optional[str]

status

The current status of the cluster (e.g., ‘RUNNING’, ‘STOPPED’).

  • Type: Optional[str]

description

A description of the cluster.

  • Type: Optional[str]

instance_type

The type of instances used in the cluster (e.g., ‘c5.large’).

  • Type: Optional[str]

workers_quantity

The number of worker nodes in the cluster.

  • Type: Optional[int]

auto_stop

The auto-stop configuration in minutes. The cluster automatically stops when idle for this duration.

  • Type: Optional[int]

auto_pause

The auto-pause configuration in minutes. The cluster automatically pauses when idle for this duration.

  • Type: Optional[int]

auto_upgrade

Should the cluster be upgraded on restart. The cluster is automatically upgraded to the latest Bodo version on restart when True.

  • Type: Optional[bool]

bodo_version

The version of Bodo being used in the cluster.

  • Type: Optional[str]

cores_per_worker

The number of CPU cores per worker node.

  • Type: Optional[int]

accelerated_networking

Whether accelerated networking is enabled.

  • Type: Optional[bool]

created_at

The creation timestamp of the cluster.

  • Type: Optional[str]

updated_at

The last update timestamp of the cluster.

  • Type: Optional[str]

is_job_dedicated

Whether the cluster is dedicated to a specific job.

  • Type: Optional[bool]

auto_az

Whether automatic availability zone selection is enabled.

  • Type: Optional[bool]

use_spot_instance

Whether spot instances are used for the cluster.

  • Type: Optional[bool]

last_known_activity

The last known activity timestamp of the cluster.

  • Type: Optional[str]

in_state_since

The timestamp since the cluster has been in its current state.

  • Type: Optional[str]

cluster_agent_version

The version of the cluster agent.

  • Type: Optional[str]

cluster_init_status

The initialization status of the cluster.

  • Type: Optional[str]

cluster_health_status

The health status of the cluster.

  • Type: Optional[str]

primary_agent_ip

The IP address of the primary agent in the cluster.

  • Type: Optional[str]

aws_deployment_subnet_id

The subnet ID used for deploying AWS resources.

  • Type: Optional[str]

node_metadata

Metadata information for each node in the cluster.

availability_zone

The AWS availability zone in which the cluster is located.

  • Type: Optional[str]

instance_role

The IAM role used by instances in the cluster.

workspace

A dictionary containing workspace-related information for the cluster.

  • Type: Optional[dict]

autoscaling_identifier

The identifier for autoscaling configuration.

  • Type: Optional[str]

last_asg_activity_id

The identifier of the last activity in the autoscaling group.

  • Type: Optional[str]

nodes_ip

A list of IP addresses for the nodes in the cluster.

  • Type: Optional[List[str]]

accelerated_networking : bool | None

auto_az : bool | None

auto_pause : int | None

auto_stop : int | None

auto_upgrade : bool | None

autoscaling_identifier : str | None

availability_zone : str | None

aws_deployment_subnet_id : str | None

bodo_version : str | None

cancel_jobs()

Cancels all jobs associated with the cluster.

  • Returns: The Cluster instance.

cluster_agent_version : str | None

cluster_health_status : str | None

cluster_init_status : str | None

connect(catalog: str)

Establishes a connection to the specified catalog from Cluster.

This method is responsible for creating and returning a new Connection instance based on the provided catalog.

Parameters: catalog (str): The name of the catalog to which the connection should be established.

Returns: Connection: An instance of Connection initialized with the specified catalog and the current class instance.

cores_per_worker : int | None

created_at : datetime | None

custom_tags : Dict | None

delete(force: bool = False, mark_as_terminated: bool = False, wait: bool = False)

Deletes the cluster, optionally forcing removal or marking as terminated.

  • Parameters:
    • force – If True, forces the deletion of the cluster.
    • mark_as_terminated – If True, marks the cluster as terminated instead of deleting.
    • wait – If True, waits till cluster will be TERMINATED.
  • Returns: The Cluster instance, potentially updated to reflect its new state.

Handles: : ResourceNotFound: Silently if the cluster is already deleted or not found.

description : str | None

property id : str

The UUID of the cluster.

  • Returns: The UUID string of the cluster.

in_state_since : datetime | None

instance_role : InstanceRole | None

instance_type : str | None

is_job_dedicated : bool | None

last_asg_activity_id : str | None

last_known_activity : datetime | None

name : str | None

node_metadata : List[NodeMetadata] | None

nodes_ip : List[str] | None

pause(wait: bool = False)

Pauses the cluster if it is running.

  • Parameters: wait – If True, waits till cluster will be PAUSED.
  • Returns: The Cluster instance with updated status.
  • Raises: ConflictException – If the cluster cannot be paused due to its current status.

primary_agent_ip : str | None

resume(wait: bool = False)

Resumes the cluster if it was paused or stopped.

  • Parameters: wait – If True, waits till cluster will be RUNNING.
  • Returns: The Cluster instance with updated status.
  • Raises: ConflictException – If the cluster cannot be resumed due to its current status.

run_job(template_id: str = None, code_type: str = None, source: dict | IS3Source | IGitRepoSource | IWorkspaceSource | ITextSource = None, exec_file: str = None, args: dict | str = None, env_vars: dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, name: str = None, catalog: str = None, store_result: bool = False)

Submits a batch job for execution using specified configurations and resources.

This method creates and dispatches a job within a computing cluster, allowing for extensive customization of execution parameters including source data, runtime environment, and failure handling strategies.

  • Parameters:
    • template_id (str , optional) – Identifier for the job template to use.
    • code_type (str , optional) – Type of code to execute (e.g., Python, Java).
    • source (Union *[*dict , IS3Source , IGitRepoSource , IWorkspaceSource , ITextSource ] , optional) – Source of the code to be executed. Can be specified as a dictionary or an instance of one of the predefined source interfaces.
    • exec_file (str , optional) – Path to the executable file within the source.
    • args (Union *[*dict , str ] , optional) – Arguments to pass to the executable.
    • parameters. (Can be a string or a dictionary of) –
    • env_vars (dict , optional) – Environment variables to set for the job.
    • timeout (int , optional) – Maximum runtime (in seconds) before the job is terminated.
    • num_retries (int , optional) – Number of times to retry the job on failure.
    • delay_between_retries (int , optional) – Time to wait between retries.
    • retry_on_timeout (bool , optional) – Whether to retry the job if it times out.
    • name (str , optional) – Name of the job.
    • catalog (str , optional) – Catalog to log the job under.
    • store_result (bool , optional) – Whether to store on S3 job results or not.
  • Returns: An object representing the submitted job, capable of providing status and results.
  • Return type: IJobRun

run_sql_query(template_id: str = None, catalog: str = None, sql_query: str = None, name: str = None, args: Sequence[Any] | Dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, store_result: bool = False)

Submits an SQL query for execution on the cluster, returning a job run object.

This method handles the execution of an SQL query within a defined cluster environment. It supports customization of execution parameters such as query arguments, job name, execution timeouts, and retry strategies.

Parameters: : template_id (str, optional): Identifier for the job template to use. catalog (str, optional): Catalog under which to log the SQL job. sql_query (str, optional): The SQL query string to be executed. name (str, optional): Descriptive name for the SQL job. args (dict, optional): Dictionary of arguments that are passed to the SQL query. timeout (int, optional): Maximum allowable runtime in seconds before the job is terminated. num_retries (int, optional): Number of times the job will be retried on failure. delay_between_retries (int, optional): Interval in seconds between job retries. retry_on_timeout (bool, optional): Whether to retry the job if it times out. store_result (bool, optional): Whether to store on S3 job results or not.

Returns: : IJobRun: An object representing the status and result of the executed SQL job.

`

start(wait: bool = False)

Starts the cluster.

  • Parameters: wait – If True, waits till cluster will be RUNNING.
  • Returns: The Cluster instance with updated status.

status : str | None

stop(wait: bool = False)

Stops the cluster.

  • Parameters: wait – If True, waits till cluster will be STOPPED.
  • Returns: The Cluster instance with updated status.

update(auto_stop: int | None = None, auto_pause: int | None = None, auto_upgrade: bool | None = None, description: str | None = None, name: str | None = None, workers_quantity: int | None = None, instance_role: InstanceRole | None = None, instance_type: str | None = None, bodo_version: str | None = None, auto_az: bool | None = None, availability_zone: str | None = None, custom_tags: Dict | None = None)

Updates the cluster’s configuration with the provided values.

  • Parameters:
    • auto_stop – Optional; configures auto-stop feature.
    • auto_pause – Optional; configures auto-pause feature.
    • auto_upgrade – Optional; enables/disables auto-upgrade on restart.
    • description – Optional; updates the cluster’s description.
    • name – Optional; updates the cluster’s name.
    • workers_quantity – Optional; updates the number of workers.
    • instance_role – Optional; updates the instance role.
    • instance_type – Optional; updates the instance type.
    • bodo_version – Optional; updates the Bodo version.
    • auto_az – Optional; enables/disables automatic availability zone selection.
    • availability_zone – Optional; sets a specific availability zone.
  • Returns: The updated Cluster instance.

updated_at : datetime | None

use_spot_instance : bool | None

uuid : str | None

wait_for_status(statuses: List[str], timeout: int = 600, tick: int = 30)

Waits for the cluster to reach one of the specified states within a given timeout.

  • Parameters:
    • statuses – A list of states to wait for.
    • timeout – The maximum time to wait before raising a TimeoutException.
    • tick – The interval between checks.
  • Returns: The Cluster instance, once it has reached one of the desired states.
  • Raises: TimeoutException – If the cluster does not reach a desired state within the timeout.

workers_quantity : int | None

workspace : dict | None

class bodosdk.models.cluster.ClusterFilter(*, uuids: List[str] | None = None, clusterNames: List[str] | None = None, statues: List[str] | None = None, tags: Dict | None = None)

Bases: SDKBaseModel, IClusterFilter

Represents a filter used to select clusters based on specific criteria.

This class is used to construct filter criteria for querying clusters by their identifiers, names, statuses, or tags. It inherits from SDKBaseModel and implements the IClusterFilter interface.

ids

Optional list of cluster UUIDs. Default is an empty list.

  • Type: Optional[List[str]]

cluster_names

Optional list of cluster names to filter by. Default is an empty list.

  • Type: Optional[List[str]]

statuses

Optional list of cluster statuses to filter by. Default is an empty list.

  • Type: Optional[List[str]]

tags

Optional dictionary of tags for more fine-grained filtering.

  • Type: Optional[Dict]

Default is an empty dictionary.

Each attribute supports being set via their field name or by the specified alias in the Field definition.

class Config

Bases: object

Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/

allow_population_by_field_name = True

extra = 'forbid'

cluster_names : List[str] | None

ids : List[str] | None

statuses : List[str] | None

tags : Dict | None

class bodosdk.models.cluster.ClusterList(workspace_client: IBodoWorkspaceClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: Dict | None = None, filters: ClusterFilter | dict | None = None)

Bases: IClusterList, SDKBaseModel

A model representing a list of clusters, providing pagination, filtering, and operations on clusters such as start, stop, delete, resume, and pause.

page

The current page number for pagination, starting from 0.

  • Type: Optional[int]

page_size

The number of items to be displayed per page.

  • Type: Optional[int]

total

The total number of items available across all pages.

  • Type: Optional[int]

order

Ordering information for listing clusters. Defaults to an empty dict.

  • Type: Optional[Dict]

filters

Filtering criteria to apply when fetching the cluster list.

_clusters

Internal list of cluster objects.

  • Type: List[ICluster]

_index

Internal index to track the current position when iterating through clusters.

  • Type: int

class Config

Bases: object

Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/

allow_population_by_field_name = True

extra = 'forbid'

cancel_jobs()

Cancels all jobs associated with the clusters.

  • Returns: The ClusterList instance.

delete(wait=False)

Deletes each cluster in the list, updating the internal list with the result of each delete operation. This method effectively attempts to delete all clusters and returns the updated list.

  • Returns: The current instance of ClusterList after attempting to delete : all clusters.
  • Return type: ClusterList

filters : ClusterFilter | dict | None

order : Dict | None

page : int | None

page_size : int | None

pause(wait=False)

Attempts to pause each running cluster in the list. It handles exceptions gracefully, updating the list with the status of each cluster following the operation.

  • Returns: The current instance of ClusterList after attempting to pause : all clusters.
  • Return type: ClusterList

refresh()

Refreshes the list of clusters by resetting the pagination and filter settings, then reloading the first page of clusters. This method effectively resets the ClusterList instance to its initial state, based on current filters and ordering.

  • Returns: The current instance of ClusterList after reloading the first page : of clusters.
  • Return type: ClusterList

resume(wait=False)

Attempts to resume each paused or stopped cluster in the list. It handles exceptions gracefully, ensuring the list is updated with the status of each cluster after the operation.

  • Returns: The current instance of ClusterList after attempting to resume : all clusters.
  • Return type: ClusterList

run_job(template_id: str = None, code_type: str = None, source: dict | IS3Source | IGitRepoSource | IWorkspaceSource | ITextSource = None, exec_file: str = None, args: dict | str = None, env_vars: dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, name: str = None, catalog: str = None, store_result: bool = None)

Executes a job across all clusters managed by the instance.

This method supports multiple source types and configurations for executing jobs, including retries and custom environment variables.

  • Parameters:
    • template_id (str , optional) – Identifier for the job template to be used.
    • code_type (str , optional) – The type of code to execute (e.g., Python, Java).
    • source (Union *[*dict , IS3Source , IGitRepoSource , IWorkspaceSource , ITextSource ] , optional) –
    • retrieved. (The source from where the job's code will be) –
    • exec_file (str , optional) – Path to the main executable file within the source.
    • args (Union *[*dict , str ] , optional) – Arguments to pass to the job. Can be a dictionary or a
    • job. (string formatted as required by the) –
    • env_vars (dict , optional) – Environment variables to set for the job execution.
    • timeout (int , optional) – Maximum time in seconds for the job to run before it is terminated.
    • num_retries (int , optional) – Number of times to retry the job on failure.
    • delay_between_retries (int , optional) – Time in seconds to wait between retries.
    • retry_on_timeout (bool , optional) – Whether to retry the job if it times out.
    • name (str , optional) – A name for the job run.
    • catalog (str , optional) – Catalog identifier to specify a data catalog for the job.
    • store_result (bool , optional) – Whether to store on S3 job results or not.
  • Returns: An object listing the UUIDs of jobs that were successfully initiated.
  • Return type: IJobRunList

Decorators: : @check_deprecation: Checks if the method or its parameters are deprecated.

run_sql_query(template_id: str = None, catalog: str = None, sql_query: str = None, name: str = None, args: Sequence[Any] | Dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, store_result: bool = None)

Executes an SQL job across all clusters managed by the instance.

This method submits an SQL query for execution, allowing for additional configurations such as retries and setting execution timeouts.

  • Parameters:
    • template_id (str , optional) – Identifier for the job template to be used.
    • catalog (str , optional) – Catalog identifier to specify a data catalog for the SQL job.
    • sql_query (str , optional) – The SQL query to execute.
    • name (str , optional) – A name for the job run.
    • args (dict , optional) – Additional arguments specific to the SQL job.
    • timeout (int , optional) – Maximum time in seconds for the job to run before it is terminated.
    • num_retries (int , optional) – Number of times to retry the job on failure.
    • delay_between_retries (int , optional) – Time in seconds to wait between retries.
    • retry_on_timeout (bool , optional) – Whether to retry the job if it times out.
    • store_result (bool , optional) – Whether to store on S3 job results or not.
  • Returns: An object listing the UUIDs of SQL jobs that were successfully initiated.
  • Return type: IJobRunList

Decorators: : @check_deprecation: Checks if the method or its parameters are deprecated.

start(wait=False)

Attempts to start each stopped or paused cluster in the list. It handles exceptions gracefully, ensuring the list reflects the status of each cluster after the operation.

  • Returns: The current instance of ClusterList after attempting to start : all clusters.
  • Return type: ClusterList

stop(wait=False)

Attempts to stop each running or starting cluster in the list. It handles exceptions gracefully, updating the list with the status of each cluster after the operation.

  • Returns: The current instance of ClusterList after attempting to stop : all clusters.
  • Return type: ClusterList

total : int | None

wait_for_status(statuses: List[str] = None, timeout: int = 600, tick: int = 60)

Waits for each cluster in the list to reach one of the specified statuses, updating the list with the results. This method polls each cluster’s status until it matches one of the desired statuses or until the timeout is reached.

  • Parameters:
    • statuses (List *[*str ] , optional) – A list of status strings to wait for.
    • timeout (int , optional) – The maximum time to wait for each cluster to reach the desired status, in seconds.
    • tick (int , optional) – The interval between status checks, in seconds.
  • Returns: The current instance of ClusterList after waiting for all clusters : to reach one of the specified statuses.
  • Return type: ClusterList

class bodosdk.models.cluster.InstanceType(*, name: str, vcpus: int, cores: int, memory: int, acceleratedNetworking: bool | None = None)

Bases: SDKBaseModel, IInstanceType

Represents the specifications for a type of computing instance.

This class defines a specific configuration of a computing instance, including its processing power and memory capabilities, as well as optional features related to networking.

name

The name or identifier of the instance type.

  • Type: str

vcpus

The number of virtual CPUs available in this instance type.

  • Type: int

cores

The number of physical cores available in this instance type.

  • Type: int

memory

The amount of RAM (in megabytes) available in this instance type.

  • Type: int

accelerated_networking

Specifies if accelerated networking is enabled.

  • Type: Optional[bool]

This is mapped to the JSON key 'acceleratedNetworking'. Defaults to None.

accelerated_networking : bool | None

cores : int

memory : int

name : str

vcpus : int

class bodosdk.models.cluster.NodeMetadata(*, privateIP: str | None = None, instanceId: str | None = None)

Bases: SDKBaseModel

instance_id : str | None

private_ip : str | None

class bodosdk.db.connection.Connection(catalog: str, cluster: ICluster, timeout=3600)

Bases: object

A connection to a catalog and cluster for executing queries.

close()

Close the connection and all associated cursors.

commit()

No-op because Bodo does not support transactions.

cursor(query_id=None)

Create a new cursor for executing queries.

  • Parameters: query_id – Optional query ID for resuming a previous query.
  • Returns: A new cursor instance.

rollback()

Rollback is not supported as transactions are not supported on Bodo.

  • Raises: NotSupportedError – Always raised as rollback is not supported.

class bodosdk.db.connection.Cursor(catalog: str, cluster: ICluster, timeout: int = 3600, query_id=None)

Bases: ICursor

A cursor for executing and fetching results from SQL queries on a cluster.

close()

Close the cursor and clean up resources.

property description

Get the description of the result set columns.

  • Returns: A list of column descriptions.

execute(query: str, args: Sequence[Any] | Dict = None)

Execute a SQL query synchronously.

  • Parameters:
    • query – The SQL query to execute.
    • args – Optional arguments for the query.
  • Returns: The cursor instance.

execute_async(query: str, args: Sequence[Any] | Dict = None)

Execute a SQL query asynchronously.

  • Parameters:
    • query – The SQL query to execute.
    • args – Optional arguments for the query.
  • Returns: The cursor instance.

fetchall()

Fetch all remaining rows of the result set.

  • Returns: A list of all remaining rows.

fetchmany(size)

Fetch the next set of rows of the result set.

  • Parameters: size – The number of rows to fetch.
  • Returns: A list of rows.

fetchone()

Fetch the next row of the result set.

  • Returns: The next row, or None if no more rows are available.

property query_id

Get the query ID.

  • Returns: The query ID.

property rowcount

Get the number of rows in the result set.

  • Returns: The number of rows in the result set.

property rownumber

Get the current row number.

  • Returns: The current row number.

setinputsizes(sizes)

Set the sizes of input parameters (not supported).

  • Parameters: sizes – Input sizes.

setoutputsize(size, column=None)

Set the size of the output (not supported).

  • Parameters:
    • size – The output size.
    • column – The column to set the size for.

class bodosdk.models.job.GitRepoSource(*, type: str = 'GIT', repoUrl: str, reference: str | None = '', username: str, token: str)

Bases: SDKBaseModel

Git repository source definition.

repo_url

Git repository URL.

  • Type: str

reference

Git reference (branch, tag, commit hash). (Default: “”)

  • Type: Optional[str]

username

Git username.

  • Type: str

token

Git token.

  • Type: str

reference : str | None

repo_url : str

token : str

type : str

username : str

class bodosdk.models.job.JobConfig(*, type: str | None = None, source: GitRepoSource | WorkspaceSource | S3Source | TextSource | None = None, sourceLocation: str | None = None, sqlQueryText: str | None = None, sqlQueryParameters: dict | None = None, args: dict | Sequence | str | None = None, retryStrategy: RetryStrategy | None = None, timeout: int | None = None, envVars: dict | None = None, catalog: str | None = None, storeResult: bool | None = False)

Bases: SDKBaseModel, IJobConfig

Configures details for executing a job, including the execution source, file location, and execution parameters.

type

The type of job: PYTHON, SQL, IPYNB.

  • Type: Optional[str]

source

The source from which the job is configured or executed.

exec_file

The location of the file to execute.

  • Type: Optional[str]

exec_text

The text containing script/code/sql to execute, valid for TextSource.

  • Type: Optional[str]

sql_query_parameters

Parameters to substitute within an SQL query.

  • Type: Optional[dict]

args

Additional arguments required for job execution. Can be a dictionary or string.

  • Type: Optional[Union[dict, str]]

retry_strategy

Configuration for the job retry strategy.

timeout

The maximum time (in seconds) allowed for the job to run before it times out.

  • Type: Optional[int]

env_vars

Environment variables to be set for the job run.

  • Type: Optional[dict]

catalog

The catalog associated with the job, if applicable.

  • Type: Optional[str]

args : dict | Sequence | str | None

catalog : str | None

env_vars : dict | None

exec_file : str | None

exec_text : str | None

retry_strategy : RetryStrategy | None

source : GitRepoSource | WorkspaceSource | S3Source | TextSource | None

sql_query_parameters : dict | None

store_result : bool | None

timeout : int | None

type : str | None

class bodosdk.models.job.JobFilter(*, ids: List[str | UUID] | None = None, template_ids: List[str | UUID] | None = None, cluster_ids: List[str | UUID] | None = None, types: List[str] | None = None, statuses: List[str] | None = None, started_at: datetime | None = None, finished_at: datetime | None = None)

Bases: SDKBaseModel

Provides filtering options for querying job-related data.

ids

A list of job IDs to filter by.

  • Type: Optional[List[Union[str, UUID]]]

template_ids

A list of job template IDs to filter by.

  • Type: Optional[List[Union[str, UUID]]]

cluster_ids

A list of cluster IDs to filter jobs by.

  • Type: Optional[List[Union[str, UUID]]]

types

A list of job types to filter by.

  • Type: Optional[List[str]]

statuses

A list of job statuses to filter by.

  • Type: Optional[List[str]]

started_at

A datetime value to filter jobs that started after it.

  • Type: Optional[datetime]

finished_at

A datetime value to filter jobs that finished before it.

  • Type: Optional[datetime]

cluster_ids : List[str | UUID] | None

finished_at : datetime | None

ids : List[str | UUID] | None

started_at : datetime | None

statuses : List[str] | None

template_ids : List[str | UUID] | None

types : List[str] | None

class bodosdk.models.job.JobRun(workspace_client: IBodoWorkspaceClient = None, *, uuid: str | None = None, name: str | None = None, type: str | None = 'BATCH', submittedAt: datetime | None = None, finishedAt: datetime | None = None, lastHealthCheck: datetime | None = None, lastKnownActivity: datetime | None = None, status: str | None = None, reason: str | None = None, numRetriesUsed: int | None = 0, tag: List | None = None, clusterUUID: str | None = None, cluster: 'Cluster' | None, clusterConfig: 'Cluster' | None = None, config: JobConfig | None = None, jobTemplateUUID: str | None = None, submitter: str | None = None, stats: dict | None = None)

Bases: SDKBaseModel, IJobRun

Details a specific instance of a job run, including status and configuration details.

uuid

Unique identifier for the job run.

  • Type: Optional[str]

name

Name of the job run.

  • Type: Optional[str]

type

Type of job run, defaults to ‘BATCH’ if not specified.

  • Type: Optional[str]

submitted_at

Timestamp when the job was submitted.

  • Type: Optional[datetime]

finished_at

Timestamp when the job finished running.

  • Type: Optional[datetime]

last_health_check

Timestamp of the last health check performed.

  • Type: Optional[datetime]

last_known_activity

Timestamp of the last known activity of this job.

  • Type: Optional[datetime]

status

Current status of the job run.

  • Type: Optional[str]

reason

Reason for the job’s current status, particularly if there’s an error or failure.

  • Type: Optional[str]

num_retries_used

The number of retries that have been used for this job run. Defaults to 0.

  • Type: Optional[int]

tags

Tags associated with the job run. Default is an empty list.

  • Type: Optional[List]

cluster_uuid

UUID of the cluster on which the job is running.

  • Type: Optional[str]

cluster

The cluster object associated with the job run.

cluster_config

Configuration of the cluster used for this job run.

config

Job configuration details used for this run.

job_template_id

UUID of the template from which this job was created.

  • Type: Optional[str]

submitter

The identifier of the user who submitted the job.

  • Type: Optional[str]

stats

Statistical data about the job run.

  • Type: Optional[dict]

cancel()

Cancels the job associated with this instance and updates its state based on the response from the job API.

This method sends a cancellation request for the job identified by its UUID to the job API, handles the response to update the job’s attributes, and resets its modified state.

  • Returns: The job instance, updated with the latest state after the cancellation attempt.
  • Return type: JobRun
  • Raises: APIException – If the cancellation request fails or returns an error response.

cluster : 'Cluster' | None

cluster_config : 'Cluster' | None

cluster_id : str | None

config : JobConfig | None

finished_at : datetime | None

get_logs_urls()

get_result_urls()

get_stderr()

get_stdout()

property id : str

job_template_id : str | None

last_health_check : datetime | None

last_known_activity : datetime | None

name : str | None

num_retries_used : int | None

reason : str | None

stats : dict | None

status : str | None

submitted_at : datetime | None

submitter : str | None

tags : List | None

type : str | None

uuid : str | None

wait_for_status(statuses: List[str], timeout: int = 3600, tick: int = 30)

Waits for the job to reach one of the specified statuses, polling at regular intervals.

This method checks the current status of the job at regular intervals defined by the tick parameter. If the job reaches one of the specified statuses before the timeout, it returns the job instance. If the timeout is exceeded, it raises a TimeoutException.

  • Parameters:
    • statuses (list or tuple) – A list or tuple of statuses that the job is expected to reach.
    • timeout (int , optional) – The maximum time in seconds to wait for the job to reach one of the specified statuses. Defaults to 600 seconds.
    • tick (int , optional) – The time interval in seconds between status checks. Defaults to 30 seconds.
  • Returns: The job instance if one of the specified statuses is reached within the timeout period.
  • Return type: JobRun
  • Raises: TimeoutException – If the job does not reach one of the specified statuses within the specified timeout period.

class bodosdk.models.job.JobRunList(workspace_client: IBodoWorkspaceClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: Dict | None = None, filters: JobFilter | None = None)

Bases: IJobRunList, SDKBaseModel

Represents a paginated list of job runs, including metadata and filtering capabilities.

page

The current page number in the paginated list. Defaults to 0.

  • Type: Optional[int]

page_size

The number of job runs to display per page. Defaults to 10.

  • Type: Optional[int]

total

The total number of job runs available.

  • Type: Optional[int]

order

A dictionary specifying the order in which job runs are sorted.

  • Type: Optional[Dict]

filters

A JobFilter object used to apply filtering on the list of job runs.

cancel()

Cancels all jobs in the list.

This method iterates through each job in the current list (represented by instances of this class), and calls the cancel method on each job to attempt their cancellation. After attempting to cancel all jobs, it returns the updated list of jobs.

  • Returns: The list instance, potentially with updated states of the individual jobs : if the cancellation requests were successful.
  • Return type: JobRunList

property clusters

filters : JobFilter | None

order : Dict | None

page : int | None

page_size : int | None

refresh()

Refreshes the list of jobs by clearing the current elements and reloading the next page of job data.

This method resets the internal state of the list, including the pagination index and the elements themselves. It then loads the next page of data from the underlying data source to repopulate the list. The list is temporarily set to mutable during this process to allow updates, and then set back to immutable once the refresh is complete.

  • Returns: The refreshed job run list instance, with elements reloaded from the next available page.
  • Return type: JobRunList

total : int | None

wait_for_status(statuses: List[str], timeout: int = 3600, tick: int = 30)

Waits for each job in the list to reach one of the specified statuses, polling each at regular intervals.

This method iterates over each job within the list, checking at intervals (defined by tick) to see if the job has reached one of the desired statuses specified in statuses. If a job reaches the desired status within the timeout period, the method continues to the next job. If the timeout is reached without the job reaching the desired status, a TimeoutException is raised.

  • Parameters:
    • statuses (list or tuple) – A list or tuple of statuses that each job is expected to reach.
    • timeout (int , optional) – The maximum time in seconds to wait for each job to reach one of the specified statuses. Defaults to 600 seconds.
    • tick (int , optional) – The time interval in seconds between status checks for each job. Defaults to 30 seconds.
  • Returns: The job run list instance, after attempting to wait for all jobs to reach the desired statuses.
  • Return type: JobRunList
  • Raises: TimeoutException – If any job does not reach one of the specified statuses within the specified timeout period, including details of the job’s UUID and its current status.

class bodosdk.models.job.JobRunLogsResponse(*, stderrUrl: str = None, stdoutUrl: str = None, expirationDate: str = None)

Bases: SDKBaseModel, IJobRunLogsResponse

Represents the response object containing URLs for accessing logs related to a job run.

stderr_location_url

The URL to access the standard error logs of the job run.

  • Type: str

stdout_location_url

The URL to access the standard output logs of the job run.

  • Type: str

expiration_date

The date when the log URLs will expire and no longer be accessible.

  • Type: str

expiration_date : str

stderr_location_url : str

stdout_location_url : str

class bodosdk.models.job.JobTemplate(workspace_client: IBodoWorkspaceClient = None, *, uuid: str | None = None, name: str | None = None, jobRuns: List[JobRun] = None, description: str | None = None, createdBy: str | None = None, config: JobConfig | None = None, clusterConfig: 'Cluster' | None = None)

Bases: SDKBaseModel, IJobTemplate

Represents a template for creating and managing jobs within an SDK environment, encapsulating common job configurations.

uuid

The unique identifier for the job template.

  • Type: Optional[str]

name

The name of the job template.

  • Type: Optional[str]

job_runs

A list of job runs associated with this template. Default is an empty list if none are specified.

description

A brief description of the job template.

  • Type: Optional[str]

created_by

The identifier of the user who created the job template.

  • Type: Optional[str]

config

The job configuration specifics for this template.

cluster_config

Configuration details of the cluster on which the jobs will run. Default is None.

cluster_config : Cluster | None

config : JobConfig | None

created_by : str | None

delete()

Deletes the job template.

description : str | None

property id

job_runs : List[JobRun]

name : str | None

run(name: str = None, cluster: dict | ICluster = None, code_type: str = None, source: dict | IS3Source | IGitRepoSource | IWorkspaceSource | ITextSource = None, exec_file: str = None, exec_text: str = None, args: dict | str = None, env_vars: dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, catalog: str = None, store_result: bool = False)

Runs a job using the template configuration.

  • Parameters:
    • name (str , optional) – The name of the job run.
    • cluster (Union *[*dict , ICluster ] , optional) – The cluster configuration or instance to run the job on.
    • code_type (str , optional) – The type of code to execute.
    • source (Union *[*dict , IS3Source , IGitRepoSource , IWorkspaceSource , ITextSource ] , optional) – The source configuration for the job.
    • exec_file (str , optional) – The file to execute.
    • exec_text (str , optional) – The text containing the script/code/sql to execute.
    • args (Union *[*dict , str ] , optional) – Arguments required for job execution.
    • env_vars (dict , optional) – Environment variables for the job run.
    • timeout (int , optional) – The maximum time (in seconds) allowed for the job to run.
    • num_retries (int , optional) – The number of retries allowed for the job.
    • delay_between_retries (int , optional) – The delay between retries in minutes.
    • retry_on_timeout (bool , optional) – Whether to retry the job on timeout.
    • catalog (str , optional) – The catalog associated with the job.
    • store_result (bool , optional) – Whether to store the result of the job run.
  • Returns: The job run instance.
  • Return type: IJobRun

uuid : str | None

class bodosdk.models.job.JobTemplateFilter(*, ids: List[str] | None = None, names: List[str] | None = None, tags: dict | None = None)

Bases: SDKBaseModel, IJobTemplateFilter

Class representing filters for JobTemplateList.

ids

Returns list matching given ids.

  • Type: List[str] | None

names

Returns list matching given names.

  • Type: List[str] | None

tags

Returns list matching given tags.

  • Type: dict | None

ids : List[str] | None

names : List[str] | None

tags : dict | None

class bodosdk.models.job.JobTemplateList(workspace_client: IBodoWorkspaceClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: Dict | None = None, filters: JobTemplateFilter | None = None)

Bases: IJobTemplateList, SDKBaseModel

Represents a list of JobTemplates, providing pagination and filtering capabilities.

page

The current page number in the list of Job Templates. Defaults to 0.

  • Type: Optional[int]

page_size

The number of Job Templates to display per page. Defaults to 10.

  • Type: Optional[int]

total

The total number of Job Templates available.

  • Type: Optional[int]

order

A dictionary specifying the order in which Job Templates are sorted.

  • Type: Optional[Dict]

filters

A filter object used to filter the Job Templates listed.

delete()

Deletes all job definitions in the list.

This method iterates over each job definition contained within the list and calls its individual delete method. This action attempts to remove each job definition from the underlying storage or management system.

filters : JobTemplateFilter | None

order : Dict | None

page : int | None

page_size : int | None

refresh()

Refreshes the list of job templates by clearing the current elements and reloading the next page of job data.

This method resets the internal state of the list, including the pagination index and the elements themselves. It then loads the next page of data from the underlying data source to repopulate the list. The list is temporarily set to mutable during this process to allow updates, and then set back to immutable once the refresh is complete.

  • Returns: The refreshed job template list instance, with elements reloaded from : the next available page.
  • Return type: JobTemplateList

run(instance_type: str = None, workers_quantity: int = None, bodo_version: str = None)

Executes all job definitions in the list with specified configurations and aggregates their run IDs.

This method iterates over each job definition in the list and invokes the run method on each, passing the specified configuration parameters such as instance type, the quantity of workers, and the Bodo version. It collects the IDs of the resulting job runs and returns a consolidated list of these job runs.

  • Parameters:
    • instance_type (str , optional) – The type of instance to use for the jobs. This may specify the hardware specifications.
    • workers_quantity (int , optional) – The number of worker instances to deploy for the jobs.
    • bodo_version (str , optional) – The specific version of Bodo to use for executing the jobs.
  • Returns: An interface to the list of job runs created by executing the job definitions, : allowing further operations like monitoring and management.
  • Return type: IJobRunList

total : int | None

class bodosdk.models.CronJob(workspace_client: IBodoWorkspaceClient, uuid: str | None = None, name: str | None = None, description: str | None = None, created_by: str | None = None, schedule: str | None = None, timezone: str | None = None, last_run_date: date | None = None, next_run_date: date | None = None, max_concurrent_runs: int | None = None, config: JobConfig | None = None, cluster_config: Cluster | None = None, cluster_id: str | None = None, job_template_id: str | None = None, job_runs: List[JobRun] = None)

Bases: SDKBaseModel, ICronJob

uuid: str | None

The unique identifier of the cron job.

  • Type: Optional[str]

name: str | None

The name of the cron job.

  • Type: Optional[str]

description: str | None

The description of the cron job.

  • Type: Optional[str]

created_by: str | None

The email of the person who created the cron job.

  • Type: Optional[str]

schedule: str | None

When this cron job should schedule batch jobs.

  • Type: Optional[str]

timezone: str | None

The timezone the cron job schedule should follow.

  • Type: Optional[str]

last_run_date: date | None

The time and date of the most recently scheduled batch job run from this cron job.

  • Type: Optional[date]

next_run_date: date | None

The time and date of when this cron jobs’s next batch job run should be scheduled.

  • Type: Optional[date]

max_concurrent_runs: int | None

The limit on the number of concurrently running batch job runs this cron job can have simultaneously.

  • Type: Optional[int]

config: JobConfig | None

Additional job configurations that this cron job’s batch job should be created with.

  • Type: Optional[JobConfig]

cluster_config: ClusterConfig | None

The job-dedicated cluster config this cron job should create batch job runs with.

  • Type: Optional[Cluster]

cluster_id: str | None

The existing cluster this cron job should execute batch jobs on.

  • Type: Optional[str]

job_template_id: str | None

The job template this cron job should create batch jobs from.

  • Type: Optional[str]

job_runs: List[JobRun] | None

The list of batch job runs stemming from this cron job

  • Type: List[JobRun]

property id: str

Provides the unique identifier for this cron job.

  • Return: The cron job’s unique identifier.
  • Return Type: str

run()

Manually create and execute a batch job run from this cron job.

  • Return: The created job run.
  • Return Type: JobRun

delete()

Delete this cron job.

  • Return: The status code of the response
  • Return Type: int

deactivate()

Deactivate this cron job.

  • Return: The status code of the response
  • Return Type: int

reactivate()

Reactivate this cron job.

  • Return: The status code of the response
  • Return Type: int

history()

Retrieve the batch job runs stemming from this cron job.

  • Return: A list of job runs created from this cron job.
  • Return Type: JobRunList

class bodosdk.clients.cron_job.CronJobList(workspace_client: IBodoWorkspaceClient, page: int | None = 0, page_size: int | None = 10, total: int | None = None, order: Dict | None = {})

Bases: SDKBaseModel, ICronJobList

page: int | None

The current page number in the list of Cron Jobs. Defaults to 0.

  • Type: Optional[int]

page_size: int | None

The number of Cron Jobs to display per page. Defaults to 10.

  • Type: Optional[int]

total: int | None

The total number of Cron Jobs available.

  • Type: Optional[int]

order: Dict | None

A dictionary specifying the order in which Cron Jobs are sorted.

  • Type: Optional[Dict]

refresh()

Refreshes the list of cron jobs by clearing the current elements and reloading the next page of cron job data.

This method resets the internal state of the list, including the pagination index and the elements themselves. It then loads the next page of data from the underlying data source to repopulate the list. The list is temporarily set to mutable during this process to allow updates, and then set back to immutable once the refresh is complete.

  • Returns: The refreshed cron job list instance, with elements reloaded from the next available page.
  • Return Type: CronJobList

class bodosdk.models.job.RetryStrategy(*, numRetries: int = 0, delayBetweenRetries: int = 1, retryOnTimeout: bool = False)

Bases: SDKBaseModel

A Pydantic model that defines the retry strategy for a job or a task. It specifies how many retries should be attempted, the delay between retries, and whether to retry on timeout.

num_retries

The number of times to retry the job after a failure. Defaults to 0, meaning no retries by default.

  • Type: int

delay_between_retries

The delay between consecutive retries, expressed in minutes. Defaults to 1 minute.

  • Type: int

retry_on_timeout

A flag to determine if the job should be retried upon timing out. Defaults to False, indicating no retry on timeout.

  • Type: bool

delay_between_retries : int

num_retries : int

retry_on_timeout : bool

class bodosdk.models.job.S3Source(*, type: str = 'S3', bucketPath: str, bucketRegion: str)

Bases: SDKBaseModel

S3 source definition.

bucket_path

S3 bucket path.

  • Type: str

bucket_region

S3 bucket region.

  • Type: str

bucket_path : str

bucket_region : str

type : str

class bodosdk.models.job.TextSource(*, type: str = 'SQL')

Bases: SDKBaseModel

Represents a specific type of source where the job configuration can originate from a text source.

type

Specifies the type of source.

  • Type: str

type : str

class bodosdk.models.job.WorkspaceSource(*, type: str = 'WORKSPACE', path: str)

Bases: SDKBaseModel

Workspace source definition.

path

Workspace path.

  • Type: str

path : str

type : str

class bodosdk.models.common.AWSNetworkData(*, region: str | None = None, storageEndpoint: bool | None = None, vpcId: str | None = None, publicSubnetsIds: List[str] | None = None, privateSubnetsIds: List[str] | None = None, policyArns: List[str] | None = None)

Bases: NetworkData

Extends the NetworkData class to include specific properties for AWS networking.

vpc_id

The ID of the AWS Virtual Private Cloud (VPC) where workspace should be created.

  • Type: Optional[str]

public_subnets_ids

List of IDs for the public subnets within the AWS VPC.

  • Type: Optional[List[str]]

private_subnets_ids

List of IDs for the private subnets within the AWS VPC.

  • Type: Optional[List[str]]

policies_arn

List of AWS Resource Names (ARNs) for the policies applied to the network.

  • Type: Optional[List[str]]

policies_arn : List[str] | None

private_subnets_ids : List[str] | None

public_subnets_ids : List[str] | None

vpc_id : str | None

class bodosdk.models.common.NetworkData(*, region: str | None = None, storageEndpoint: bool | None = None)

Bases: SDKBaseModel

A base model for network-related data within an SDK context.

region

The geographic region where the network is located.

  • Type: Optional[str]

storage_endpoint

Indicates whether a storage endpoint is enabled.

  • Type: Optional[bool]

region : str | None

storage_endpoint : bool | None

class bodosdk.models.instance_role.InstanceRole(workspace_client: IBodoWorkspaceClient = None, *, uuid: str | None = None, name: str | None = None, description: str | None = None, roleArn: str | None = None, status: str | None = None)

Bases: SDKBaseModel, IInstanceRole

delete()

Removes instance role from workspace

  • Returns: None

description : str | None

property id

name : str | None

role_arn : str | None

status : str | None

uuid : str | None

class bodosdk.models.instance_role.InstanceRoleFilter(*, uuids: List[str] | None = None, names: List[str] | None = None, roleArns: List[str] | None = None)

Bases: SDKBaseModel, IInstanceRoleFilter

Class representing filters for InstanceRoleList

ids

returns list matching given ids

  • Type: List[str] | None

names

returns list matching given names

  • Type: List[str] | None

role_arns

returns list matching giver arns

  • Type: List[str] | None

class Config

Bases: object

Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/

allow_population_by_field_name = True

extra = 'forbid'

ids : List[str] | None

names : List[str] | None

role_arns : List[str] | None

class bodosdk.models.instance_role.InstanceRoleList(workspace_client: IBodoWorkspaceClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: Dict | None = None, filters: InstanceRoleFilter | None = None)

Bases: IInstanceRoleList, SDKBaseModel

Represents a list of instance roles within an SDK context, providing pagination and filtering capabilities.

page

The current page number in the list of instance roles. Defaults to 0.

  • Type: Optional[int]

page_size

The number of instance roles to display per page. Defaults to 10.

  • Type: Optional[int]

total

The total number of instance roles available.

  • Type: Optional[int]

order

A dictionary specifying the order in which instance roles are sorted.

  • Type: Optional[Dict]

filters

A filter object used to filter the instance roles listed.

class Config

Bases: object

Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/

allow_population_by_field_name = True

extra = 'forbid'

delete()

filters : InstanceRoleFilter | None

order : Dict | None

page : int | None

page_size : int | None

total : int | None

class bodosdk.models.workspace.Workspace(org_client: IBodoOrganizationClient = None, *, name: str | None = None, uuid: str | UUID | None = None, status: str | None = None, organizationUUID: str | UUID | None = None, networkData: NetworkData | AWSNetworkData | None = None, createdBy: str | None = None, notebookAutoDeployEnabled: bool | None = None, assignedAt: datetime | None = None, customTags: Dict[str, Any] | None = None, jupyterLastActivity: datetime | None = None, jupyterIsActive: bool | None = False, cloudConfig: CloudConfig | None = None)

Bases: SDKBaseModel, IWorkspace

assigned_at : datetime | None

cloud_config : CloudConfig | None

created_by : str | None

custom_tags : Dict[str, Any] | None

delete()

Deletes the workspace from the API based on its UUID and updates the instance’s properties with the response from the deletion API call.

  • Returns: The Workspace instance after deletion, updated with the API’s response data.

property id

jupyter_is_active : bool | None

jupyter_last_activity : datetime | None

name : str | None

network_data : NetworkData | AWSNetworkData | None

notebook_auto_deploy_enabled : bool | None

organization_uuid : str | UUID | None

status : str | None

update_infra()

uuid : str | UUID | None

wait_for_status(statuses, timeout=600, tick=30)

Waits for the workspace to reach one of the specified states within a given timeout.

  • Parameters:
    • statuses – A list of states to wait for.
    • timeout – The maximum time to wait before raising a TimeoutException.
    • tick – The interval between checks.
  • Returns: The workspace instance, once it has reached one of the desired states.
  • Raises: TimeoutException – If the workspace does not reach a desired state within the timeout.

class bodosdk.models.workspace.WorkspaceFilter(*, uuids: List[str] | None = None, names: List[str] | None = None, statuses: List[str] | None = None, organizationUUIDs: List[str] | None = None)

Bases: SDKBaseModel, IWorkspaceFilter

class Config

Bases: object

allow_population_by_field_name = True

extra = 'forbid'

ids : List[str] | None

names : List[str] | None

organization_uuids : List[str] | None

statuses : List[str] | None

class bodosdk.models.workspace.WorkspaceList(org_client: IBodoOrganizationClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: Dict | None = None, filters: WorkspaceFilter | None = None)

Bases: IWorkspaceList, SDKBaseModel

class Config

Bases: object

allow_population_by_field_name = True

extra = 'forbid'

delete()

Deletes the workspaces present on the list

  • Returns: WorkspaceListAPIModel

filters : WorkspaceFilter | None

order : Dict | None

page : int | None

page_size : int | None

total : int | None

update_infra()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bodosdk-2.2.0.tar.gz (125.4 kB view details)

Uploaded Source

Built Distribution

bodosdk-2.2.0-py3-none-any.whl (106.3 kB view details)

Uploaded Python 3

File details

Details for the file bodosdk-2.2.0.tar.gz.

File metadata

  • Download URL: bodosdk-2.2.0.tar.gz
  • Upload date:
  • Size: 125.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for bodosdk-2.2.0.tar.gz
Algorithm Hash digest
SHA256 6ec829ead30a7718d6ef33a2cbc1156379a6e81c791b44b9a9644209d53cea7b
MD5 44529c4b6620ba2126c6c64febb2ab80
BLAKE2b-256 db672c26742e89ae4385e472282dc8877c7a69cda70c803cfbba87b6f96231c8

See more details on using hashes here.

File details

Details for the file bodosdk-2.2.0-py3-none-any.whl.

File metadata

  • Download URL: bodosdk-2.2.0-py3-none-any.whl
  • Upload date:
  • Size: 106.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for bodosdk-2.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5ae6e8ff41a39af72e57becff53ef445779802d6d52b7ba967f0d075970246c8
MD5 4f290878aec6169977bb8c8f7a296e7a
BLAKE2b-256 9cd184c723f7eac77c2bf1f8e10ef773a3d4ce0832d933e1f5445a409a30bd8d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page