Bodo Platform SDK 2.2.0
Project description
Bodo Platform SDK
Bodo Platform SDK is a Python library that provides a simple way to interact with the Bodo Platform API. It allows you to create, manage, and monitor resources such as clusters, jobs, and workspaces.
Updates:
- NEW:
- Implementation of cursor.describe
- FIXES:
- Update of cursor interface, added missing properties/methods:
- rownumber
- query_id
- close
- Update of cursor interface, added missing properties/methods:
Getting Started
Installation
pip install bodosdk
Creating workspace client
First you need to access your workspace in https://platform.bodo.ai/
and create an API Token in the Bodo Platform
for
Bodo SDK authentication.
Navigate to API Tokens in the Admin Console to generate a token.
Copy and save the token's Client ID and Secret Key and use them to define a client (BodoClient
) that can interact
with the Bodo Platform.
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient(
client_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
secret_key="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
)
Alternatively, set BODO_CLIENT_ID
and BODO_SECRET_KEY
environment variables
to avoid requiring keys:
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
To get workspace data, you can access the workspace_data
attribute of the client:
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
print(my_workspace.workspace_data)
Additional Configuration Options for BodoClient
print_logs
: defaults to False. All API requests and responses are printed to the console if set to True.
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient(print_logs=True)
Create first cluster
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
name='My first cluster',
instance_type='c5.large',
workers_quantity=1
)
Above example creates a simple one node cluster, with latest bodo version available and returns cluster object. Platform will create cluster in your workspace.
Waiting for status
To wait till cluster will be ready for interaction you can call
method wait_for_status
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
name='My first cluster',
instance_type='c5.large',
workers_quantity=1
)
my_cluster.wait_for_status(['RUNNING'])
This method will wait until any of provided statuses will occur or FAILED
status will be set in case of some failure.
Check your workspace on https://platform.bodo.ai/
and you will see your cluster, you can use Notebook to connect with
it.
Updating Cluster
Now let's update our cluster, on RUNNING
cluster you can update name
, description
, auto_pause
, auto_stop
and workers_quantity
(this will trigger scaling) only:
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
name='My first cluster',
instance_type='c5.large',
workers_quantity=1
)
my_cluster.wait_for_status(['RUNNING'])
my_cluster.update(
description='My description',
name="My updated cluster",
auto_pause=15,
auto_stop=30
)
All other modifcations like instance_type
, bodo_version
etc need STOPPED
cluster
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.get("cluster_id")
if my_cluster.status != 'STOPPED':
my_cluster.stop(wait=True)
my_cluster.update(instance_type='c5.2xlarge', workers_quantity=2)
Create First Job
On running cluster you can schedule a job in very simple way:
First on https://platform.bodo.ai
navigate to notebook in your workspace and
create following test.py
file in your main directory:
import pandas as pd
import numpy as np
import bodo
import time
NUM_GROUPS = 30
NUM_ROWS = 20_000_000
df = pd.DataFrame({
"A": np.arange(NUM_ROWS) % NUM_GROUPS,
"B": np.arange(NUM_ROWS)
})
df.to_parquet("my_data.pq")
time.sleep(1) # wait till file will be available on all nodes
@bodo.jit(cache=True)
def computation():
t1 = time.time()
df = pd.read_parquet("my_data.pq")
df1 = df[df.B > 4].A.sum()
print("Execution time:", time.time() - t1)
return df1
result = computation()
print(result)
then define job on cluster through SDK, wait till SUCCEEDED
and check logs
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.get("cluster_id")
my_job = my_cluster.run_job(
code_type='PYTHON',
source={'type': 'WORKSPACE', 'path': '/'},
exec_file='test.py'
)
print(my_job.wait_for_status(['SUCCEEDED']).get_stdout())
You can use almost same confiuration to run SQL file, all you need is to define your test.sql
file and Catalog https://platform.bodo.ai
:
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.get("cluster_id")
my_job = my_cluster.run_job(
code_type='SQL',
source={'type': 'WORKSPACE', 'path': '/'},
exec_file='test.sql',
catalog="MyCatalog"
)
print(my_job.wait_for_status(['SUCCEEDED']).get_stdout())
Cluster List and executing jobs on it's elements
Now let's try to run same job on different clusters:
from bodosdk import BodoWorkspaceClient
import random
my_workspace = BodoWorkspaceClient()
random_val = random.random() # just to avoid conflicts on name
clusters_conf = [('c5.large', 8), ('c5.xlarge',4), ('c5.2xlarge',2)]
for i, conf in enumerate(clusters_conf):
my_workspace.ClusterClient.create(
name=f'Test {i}',
instance_type=conf[0],
workers_quantity=conf[1],
custom_tags={'test_tag': f'perf_test{random_val}'} # let's add tag to easy filter our clusters
)
# get list by tag
clusters = my_workspace.ClusterClient.list(filters={
'tags': {'test_tag': f'perf_test{random_val}'}
})
# run same job 3 times, once per each cluster
jobs = clusters.run_job(
code_type='PYTHON',
source={'type': 'WORKSPACE', 'path': '/'},
exec_file='test.py'
)
#wait for jobs to finish and print results
for job in jobs.wait_for_status(['SUCCEEDED']):
print(job.name, job.cluster.name)
print(job.get_stdout())
#remove our clusters
jobs.clusters.delete() # or clusters.delete()
Execute SQL query
You can also execute SQL queries by passing just query text like following:
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_sql_job = my_workspace.JobClient.run_sql_query(sql_query="SELECT 1", catalog="MyCatalog", cluster={
"name": 'Temporary cluster',
"instance_type": 'c5.large',
"workers_quantity": 1
})
print(my_sql_job.wait_for_status(['SUCCEEDED']).get_stdout())
In above case, when you provide cluster configuration but not existing cluster it will be terminated as soon as SQL job will finish.
If you want to execute sql job on existing cluster just use run_sql_query
on cluster:
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
name='My cluster',
instance_type='c5.large',
workers_quantity=1
)
my_sql_job = my_cluster.run_sql_query(sql_query="SELECT 1", catalog="MyCatalog")
print(my_sql_job.wait_for_status(['SUCCEEDED']).get_stdout())
Connector
You can also execute SQL queries using connector for cluster:
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
name='My cluster',
instance_type='c5.large',
workers_quantity=1
)
connection = my_cluster.connect('MyCatalog') # or connection = my_workspace.ClusterClient.connect('MyCatalog', 'cluster_id')
print(connection.cursor().execute("SELECT 1").fetchone())
my_cluster.delete()
Job Templates
Against defining jobs from scratch you can create a template for your jobs, and then easily run them, e.g.:
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
tpl = my_workspace.JobTemplateClient.create(
name='My template',
cluster={
'instance_type': 'c5.xlarge',
'workers_quantity': 1
},
code_type="SQL",
catalog="MyCatalog",
exec_text="SELECT 1"
)
job1 = tpl.run() # you can simply run it
job2 = tpl.run(exec_text="SELECT 2") # or run it with overriding template values
job3 = tpl.run(cluster={'instance_type': 'c5.large'}) # you can override even part of cluster configuration
jobs = my_workspace.JobClient.list(filters={'template_ids':[tpl.id]}) # you can filter jobs by it's template_id
for job in jobs.wait_for_status(['SUCCEEDED']):
print(job.name, job.cluster.instance_type, job.get_stdout())
You can also run your template on specific cluster e.g:
from bodosdk import BodoWorkspaceClient
from bodosdk.models import JobTemplateFilter
my_workspace = BodoWorkspaceClient()
tpls = my_workspace.JobTemplateClient.list(filters=JobTemplateFilter(names=['My template']))
my_cluster = my_workspace.ClusterClient.create(
name='My cluster',
instance_type='c5.large',
workers_quantity=1
)
print(my_cluster.run_job(template_id=tpls[0].id).wait_for_status(['SUCCEEDED']).get_stdout())
my_cluster.delete()
Scheduled Jobs
Rather than having to setup your own infrastructure or scheduler, you can create a scheduled job using the Bodo Platform. You will have to create a Job Template first.
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
tpl = my_workspace.JobTemplateClient.create(
name='My template',
cluster={
'instance_type': 'c5.xlarge',
'workers_quantity': 1
},
code_type="SQL",
catalog="MyCatalog",
exec_text="SELECT 1"
)
scheduled_job = my_workspace.CronJobClient.create(
name="Cron job",
schedule="0 * * * *",
timezone="Etc/GMT",
max_concurrent_runs=1,
job_template=tpl,
)
Job Runs using the job template that you provided will be automatically created and execute based on the provided schedule.
Statuses
Each resource, Cluster, Job or Workspace has own set of statuses which are following:
Cluster:
- NEW
- INPROGRESS
- PAUSING
- PAUSED
- STOPPING
- STOPPED
- INITIALIZING
- RUNNING
- FAILED
- TERMINATED
Job:
- PENDING
- RUNNING
- SUCCEEDED
- FAILED
- CANCELLED
- CANCELLING
- TIMEOUT
Workspace:
- NEW
- INPROGRESS
- READY
- FAILED
- TERMINATING
- TERMINATED
Organization client and workspaces:
To manage workspaces you need different keys (generated for organization) and differenct SDK client, for start let's list all our workspaces:
from bodosdk import BodoOrganizationClient
my_org = BodoOrganizationClient(
client_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
secret_key="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
) # or BodoOrganizationClient() if `BODO_ORG_CLIENT_ID` and `BODO_ORG_SECRET_KEY` are exported
for w in my_org.list_workspaces():
print(w.name)
You can filter workspaces providing valid filters:
from bodosdk import BodoOrganizationClient
from bodosdk.models import WorkspaceFilter
my_org = BodoOrganizationClient()
for w in my_org.list_workspaces(filters=WorkspaceFilter(statuses=['READY'])):
print(w.name)
You can provide filters as 'WorkspaceFilter' imported from bodosdk.models
or as a dictionary:
from bodosdk import BodoOrganizationClient
my_org = BodoOrganizationClient()
for w in my_org.list_workspaces(filters={"statuses": ['READY']}):
print(w.name)
Create new Workspace
from bodosdk import BodoOrganizationClient
my_org = BodoOrganizationClient()
my_workspace = my_org.create_workspace(
name="SDK test",
region='us-east-2',
cloud_config_id="a0d1242c-3091-42de-94d9-548e2ae33b73",
storage_endpoint_enabled=True
).wait_for_status(['READY'])
assert my_workspace.id == my_org.list_workspaces(filters={"names": ['SDK test'], "statuses": ['READY']})[0].id
my_workspace.delete() # remove workspace at the end
Upgrade workspace infra
In some cases when you have workspace existing for a long time you may want to re-run terraform to apply fresh changes to workspace infrastructure. You can do it following way:
from bodosdk import BodoOrganizationClient
my_org = BodoOrganizationClient()
my_org.list_workspaces(filters={'ids': ['workspace_to_update1_id', 'workspace_to_update2_id']}).update_infra()
Advanced
In this section we will present more examples of bodosdk usages.
Workspace created in existing VPC
There is possibility to create workspace on existing infrastructure. The only requirement is that VPC need access to Internet, either NAT or IGW. It's needed to allow clusters to authorize in external auth service.
from bodosdk import BodoOrganizationClient
my_org = BodoOrganizationClient()
my_workspace = my_org.create_workspace(
cloud_config_id="cloudConfigId",
name="My workspace",
region="us-east-1",
storage_endpoint_enabled=True,
vpc_id="existing-vpc-id",
private_subnets_ids=['subnet1', 'subnet2'],
public_subnets_ids=['subnet3']
)
my_workspace.wait_for_status(['READY'])
Spot instances, auto AZ
You can create Cluster using spot instances, to reduce cost of usage, downside is that you cannot PAUSE this kind of cluster, and from time to time cluster may be unavailable (when spot instance is released).
Auto AZ is mechanism which retries cluster creation in another AZ, when in current AZ there is no enough instances of desired type.
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.create(
name='Spot cluster',
instance_type='c5.large',
workers_quantity=1,
use_spot_instance=True,
auto_az=True,
)
Accelerated networking
Accelerated networking is enabled by for instances that supporting.
You can get list of all supported instances using ClusterClient, it returns list of
InstanceType objects. Field accelerated_networking
informs about network acceleration.
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
accelerated_networking_instances = [x for x in my_workspace.ClusterClient.get_instances() if x.accelerated_networking]
my_cluster = my_workspace.ClusterClient.create(
name='Spot cluster',
instance_type=accelerated_networking_instances[0].name,
workers_quantity=1,
)
Preparing clusters for future use:
In Bodo, Cluster may be in two states responsible for suspended status: PAUSED
and STOPPED
.
Spot clusters cannot be PAUSED
. There are 3 differences between those states: cost, start up time, error rate.
Costs
PAUSED
> STOPPED
- In PAUSED
state we are paying for disks while in STOPPED
we don't.
Start up time
STOPPED
> PAUSED
- Bringing back machines in PAUSED
state is much faster, as those machines are already
defined in cloud
Error rate
PAUSED
> STOPPED
- By error rate we mean situation when number of available instances of descired types is lower
than number of requested workers. As in PAUSED
state, instance entities are already defined,
and we request for reesources at once it's more likely to happen than in STOPPED
state,
where asg is maintaining instance creation and it waits for available resources.
Prepare clusters for further use and make them PAUSED
from bodosdk import BodoWorkspaceClient
from bodosdk.models import ClusterFilter
my_workspace = BodoWorkspaceClient()
clusters_conf = {
'Team A': {
'instance_type': 'c5.2xlarge',
'workers': 4,
},
'Team b': {
'instance_type': 'c5.xlarge',
'workers': 2,
},
'Team C': {
'instance_type': 'c5.16xlarge',
'workers': 2,
}
}
for owner, conf in clusters_conf.items():
my_workspace.ClusterClient.create(
name = f"{owner} Cluster",
instance_type=conf['instance_type'],
workers_quantity=conf['workers'],
custom_tags={'owner': owner, 'purpose': 'test'}
)
my_workspace.ClusterClient.list(
filters=ClusterFilter(tags={'purpose': 'test'})
).wait_for_status(
['RUNNING', 'INITIALIZING']
).pause().wait_for_status(['PAUSED'])
Use another cluster as a template for cluster definition in job
Let's imagine that you have a cluster (in any state) and you wan't to run job on the same specification, but you don't want to use previously defined cluster. You can do following
from bodosdk import BodoWorkspaceClient
my_workspace = BodoWorkspaceClient()
my_cluster = my_workspace.ClusterClient.get('existing_cluster')
cluster_conf = my_cluster.dict()
del cluster_conf['uuid']
my_sql_job = my_workspace.JobClient.run_sql_query(sql_query="SELECT 1", catalog="MyCatalog", cluster=cluster_conf)
In that case job will create a new cluster with provided configuration, executes and after job is finished removes cluster.
#INDEX
bodosdk package contents
class bodosdk.clients.cluster.ClusterClient(workspace_client: IBodoWorkspaceClient)
Bases: IClusterClient
A client for managing cluster operations in a Bodo workspace.
_deprecated_methods
A dictionary of deprecated methods.
- Type: Dict
_images
A list of available Bodo images.
-
Type: List[IBodoImage]
-
Parameters: workspace_client (IBodoWorkspaceClient) – The workspace client used for operations.
property Cluster : Cluster
Provides access to cluster operations.
- Returns: An instance of Cluster for cluster operations.
- Return type: Cluster
property ClusterList : ClusterList
Provides access to listing clusters.
- Returns: An instance of ClusterListAPIModel for listing clusters.
- Return type: ClusterList
connect(catalog: str, cluster_id: str)
Connect to a specific catalog and cluster.
- Parameters:
- catalog – The name the catalog to connect to.
- cluster_id – The UUID of the cluster to connect to.
- Returns: An instance of Connection representing the connection to the catalog and cluster.
create(name: str, instance_type: str = None, workers_quantity: int = None, description: str | None = None, bodo_version: str | None = None, auto_stop: int | None = None, auto_pause: int | None = None, auto_upgrade: bool | None = None, auto_az: bool | None = None, use_spot_instance: bool | None = None, aws_deployment_subnet_id: str | None = None, availability_zone: str | None = None, instance_role: InstanceRole | Dict | None = None, custom_tags: Dict | None = None)
Creates a new cluster with the specified configuration.
- Parameters:
- name (str) – The name of the cluster.
- instance_type (str , optional) – The type of instance to use for the cluster nodes.
- workers_quantity (int , optional) – The number of worker nodes in the cluster.
- description (str , optional) – A description of the cluster.
- bodo_version (str , optional) – The Bodo version to use for the cluster. If not provided, the latest version is used.
- auto_stop (int , optional) – The auto-stop time in minutes for the cluster.
- auto_pause (int , optional) – The auto-pause time in minutes for the cluster.
- auto_upgrade (bool , optional) – Should the cluster be automatically upgraded to the latest Bodo version on restart.
- auto_az (bool , optional) – Whether to automatically select the availability zone.
- use_spot_instance (bool , optional) – Whether to use spot instances for the cluster.
- aws_deployment_subnet_id (str , optional) – The AWS deployment subnet ID.
- availability_zone (str , optional) – The availability zone for the cluster.
- instance_role (InstanceRole | Dict , optional) – The instance role or a custom role configuration.
- custom_tags (Dict , optional) – Custom tags to assign to the cluster resources.
- Returns: The created Cluster object.
- Return type: Cluster
get(id: str)
Retrieves a cluster by its ID.
- Parameters: id (str) – The ID of the cluster to retrieve.
- Returns: The retrieved Cluster object.
- Return type: Cluster
get_bodo_versions()
Retrieves a list of available Bodo versions.
- Returns: A list of available Bodo versions.
- Return type: List[str]
get_images()
Retrieves a list of available images.
- Returns: A list of image IDs available for clusters.
- Return type: List[str]
get_instance_types()
Retrieves list of all supported instance types
- Returns: List[InstanceType]
property latest_bodo_version : str
Retrieves the latest Bodo version available.
- Returns: The latest Bodo version.
- Return type: str
list(filters: Dict | ClusterFilter | None = None, order: Dict | None = None)
Lists clusters based on the provided filters and order.
- Parameters:
- filters (Dict | ClusterFilter , optional) – The filters to apply to the cluster listing.
- order (Dict , optional) – The order in which to list the clusters.
- Returns: A list of clusters matching the criteria.
- Return type: ClusterList
pause(id: str, wait=False)
Pauses the specified cluster.
- Parameters: id (str) – The ID of the cluster to pause.
- Returns: The paused Cluster object.
- Return type: Cluster
remove(id: str, wait=False)
Removes the specified cluster.
- Parameters: id (str) – The ID of the cluster to remove.
- Returns: None
resume(id: str, wait=False)
Resumes the specified paused cluster.
- Parameters: id (str) – The ID of the cluster to resume.
- Returns: The resumed Cluster object.
- Return type: Cluster
scale(id: str, new_size: int)
Scales the specified cluster to the new size.
- Parameters:
- id (str) – The ID of the cluster to scale.
- new_size (int) – The new size for the cluster in terms of the number of worker nodes.
- Returns: The scaled Cluster object.
- Return type: Cluster
start(id: str, wait=False)
Starts the specified stopped cluster.
- Parameters: id (str) – The ID of the cluster to start.
- Returns: The started Cluster object.
- Return type: Cluster
stop(id: str, wait=False)
Stops the specified cluster.
- Parameters: id (str) – The ID of the cluster to stop.
- Returns: The stopped Cluster object.
- Return type: Cluster
update(id: str, name: str | None = None, description: str | None = None, auto_stop: int | None = None, auto_pause: int | None = None, auto_upgrade: bool | None = True, workers_quantity: int | None = None, instance_role: InstanceRole | Dict | None = None, instance_type: str | None = None, bodo_version: str | None = None, auto_az: bool | None = None, availability_zone: str | None = None, custom_tags: Dict | None = None)
Updates the specified cluster with the given configuration.
- Parameters:
- id (str) – The ID of the cluster to update.
- name (str , optional) – The new name for the cluster.
- description (str , optional) – A new description for the cluster.
- auto_stop (int , optional) – The new auto-stop time in minutes.
- auto_pause (int , optional) – The new auto-pause time in minutes.
- auto_upgrade (bool , optional) – if cluster should be updated after each restart.
- workers_quantity (int , optional) – The new number of worker nodes.
- instance_role (InstanceRole | Dict , optional) – The new instance role or custom role configuration.
- instance_type (str , optional) – The new instance type for the cluster nodes.
- bodo_version (str , optional) – The new Bodo version for the cluster.
- auto_az (bool , optional) – Whether to automatically select the availability zone.
- availability_zone (str , optional) – The new availability zone for the cluster.
- custom_tags (Dict , optional) – New custom tags for the cluster resources.
- Returns: The updated Cluster object.
- Return type: Cluster
wait_for_status(id: str, statuses: List, timeout: int | None = 300, tick: int | None = 30)
Waits for the specified cluster to reach any of the given statuses within the timeout period.
- Parameters:
- id (str) – The ID of the cluster to monitor.
- statuses (List) – The list of statuses to wait for.
- timeout (int , optional) – The timeout period in seconds.
- tick (int , optional) – The interval in seconds between status checks.
- Returns: The Cluster object if it reaches the desired status within the timeout period.
- Return type: Cluster
class bodosdk.clients.instance_role.InstanceRoleClient(workspace_client: IBodoWorkspaceClient)
Bases: IInstanceRoleClient
property InstanceRole : InstanceRole
Get the InstanceRole object.
- Returns: An instance of InstanceRole.
- Return type: InstanceRole
property InstanceRoleList : InstanceRoleList
Get the InstanceRoleList object.
- Returns: An instance of InstanceRoleList.
- Return type: InstanceRoleList
create(role_arn: str, description: str, name: str = None)
Create a new instance role.
- Parameters:
- role_arn (str) – The ARN of the role.
- description (str) – A description of the role.
- name (str , optional) – The name of the role. Defaults to None.
- Returns: The created instance role after saving.
- Return type: InstanceRole
delete(id: str)
Delete an instance role by its ID.
- Parameters: id (str) – The UUID of the instance role to delete.
get(id: str)
Get an instance role by its ID.
- Parameters: id (str) – The UUID of the instance role.
- Returns: An instance of InstanceRole.
- Return type: InstanceRole
list(filters: Dict | InstanceRoleFilter | None = None, order: Dict | None = None)
List all instance roles with optional filters and order.
- Parameters:
- filters (Optional *[*Union *[*Dict , InstanceRoleFilter ] ]) – A dictionary or InstanceRoleFilter
- filters. (object to apply) –
- order (Optional *[*Dict ]) – A dictionary to specify the order of the results.
- Returns: An instance of InstanceRoleList containing the filtered and ordered instance roles.
- Return type: InstanceRoleList
class bodosdk.clients.job.JobClient(workspace_client: IBodoWorkspaceClient)
Bases: IJobClient
property JobRun : JobRun
Get the JobRun object.
- Returns: An instance of JobRun.
- Return type: JobRun
property JobRunList : JobRunList
Get the JobRunList object.
- Returns: An instance of JobRunList.
- Return type: JobRunList
cancel_job(id: str)
Cancel job by id.
- Parameters: id (str) – Job id.
- Returns: Job object.
- Return type: JobRun
cancel_jobs(filters: Dict | JobFilter | None = None)
Cancel jobs with the given filters.
- Parameters: filters (Optional *[*Union *[*Dict , JobFilter ] ]) – Filters to apply on the list.
- Returns: JobRunList object.
- Return type: JobRunList
get(id: str)
Get job by id.
- Parameters: id (str) – Job id.
- Returns: Job object.
- Return type: JobRun
list(filters: Dict | JobFilter | None = None, order: Dict | None = None)
List jobs with the given filters.
- Parameters:
- filters (Optional *[*Union *[*Dict , JobFilter ] ]) – Filters to apply on the list.
- order (Optional *[*Dict ]) – Order to apply on the list.
- Returns: JobRunList object.
- Return type: JobRunList
run(template_id: str = None, cluster: dict | ICluster = None, code_type: str = None, source: dict | IS3Source | IGitRepoSource | IWorkspaceSource | ITextSource = None, exec_file: str = None, exec_text: str = None, args: Sequence[Any] | Dict | str = None, env_vars: dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, name: str = None, catalog: str = None, store_result: bool = None)
Run a job with the given parameters.
- Parameters:
- template_id (str) – Job template id.
- cluster (Union *[*dict , ICluster ]) – Cluster object or cluster config.
- code_type (str) – Code type.
- source (Union *[*dict , IS3Source , IGitRepoSource , IWorkspaceSource , ITextSource ]) – Source object.
- exec_file (str) – Exec file path.
- exec_text (str) – Exec text.
- args (Union *[*Sequence *[*Any ] , Dict , str ]) – Arguments.
- env_vars (dict) – Environment variables.
- timeout (int) – Timeout.
- num_retries (int) – Number of retries.
- delay_between_retries (int) – Delay between retries.
- retry_on_timeout (bool) – Retry on timeout.
- name (str) – Job name.
- catalog (str) – Catalog, applicable only for SQL jobs.
- store_result (bool) – Whether to store the result.
- Returns: Job object.
- Return type: JobRun
run_sql_query(template_id: str = None, catalog: str = None, sql_query: str = None, cluster: dict | ICluster = None, name: str = None, args: Sequence[Any] | Dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, store_result: bool = True)
Run a SQL job with the given parameters.
- Parameters:
- template_id (str) – Job template id.
- catalog (str) – Catalog.
- sql_query (str) – SQL query.
- cluster (Union *[*dict , ICluster ]) – Cluster object or cluster config.
- name (str) – Job name.
- args (Union *[*Sequence *[*Any ] , Dict ]) – Arguments.
- timeout (int) – Timeout.
- num_retries (int) – Number of retries.
- delay_between_retries (int) – Delay between retries.
- retry_on_timeout (bool) – Retry on timeout.
- store_result (bool) – Whether to store the result.
- Returns: Job object.
- Return type: JobRun
wait_for_status(id: str, statuses: List[str], timeout: int = 3600, tick: int = 30)
Wait for job to reach one of the given statuses.
- Parameters:
- id (str) – Job id.
- statuses (List *[*str ]) – List of statuses to wait for.
- timeout (int) – Timeout.
- tick (int) – Tick.
- Returns: Job object.
- Return type: JobRun
class bodosdk.clients.job_tpl.JobTemplateClient(workspace_client: IBodoWorkspaceClient)
Bases: IJobTemplateClient
property JobTemplate : JobTemplate
property JobTemplateList : JobTemplateList
create(name: str = None, description: str = None, cluster: dict | ICluster = None, code_type: str = None, source: dict | IS3Source | IGitRepoSource | IWorkspaceSource | ITextSource = None, exec_file: str = None, exec_text: str = None, args: dict | str = None, env_vars: dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, catalog: str = None, store_result: bool = False)
Create a new job template with the given parameters.
- Parameters:
- name (str) – Name of the job template.
- description (str) – Description of the job template.
- cluster (Union *[*dict , ICluster ]) – Cluster object or cluster config.
- code_type (str) – Code type.
- source (Union *[*dict , IS3Source , IGitRepoSource , IWorkspaceSource , ITextSource ]) – Source object.
- exec_file (str) – Exec file path.
- exec_text (str) – Exec text.
- args (Union *[*dict , str ]) – Arguments.
- env_vars (dict) – Environment variables.
- timeout (int) – Timeout.
- num_retries (int) – Number of retries.
- delay_between_retries (int) – Delay between retries.
- retry_on_timeout (bool) – Retry on timeout.
- catalog (str) – Catalog, applicable only for SQL code type.
- store_result (bool) – Whether to store the result.
get(id: str)
Get job template by id.
- Parameters: id (str) – Job template id.
- Returns: Job template object.
- Return type: JobTemplate
list(filters: Dict = None)
List job templates with the given filters.
- Parameters: filters (Dict) – Filters to apply on the list.
- Returns: JobTemplateList object.
- Return type: JobTemplateList
remove(id: str)
Delete job template by id.
- Parameters: id (str) – Job template id.
class bodosdk.clients.cron_job.CronJobClient(workspace_client: IBodoWorkspaceClient)
_deprecated_methods: Dict = {}
A dictionary of deprecated methods.
Type: Dict
property CronJob: CronJob
Provides access to CronJob operations.
- Return: A CronJob instance for cron job operations.
- Return Type: CronJob
property CronJobList: CronJobList
Provides access to a CronJob list.
- Return: A CronJobList instance for listing cron jobs.
- Return Type: CronJobList
list(order: Dict | None = None)
List cron jobs based on the provided order.
- Parameters:
- order ( Dict , optional ) – The order in which to list the cron jobs.
- Return: A list of cron jobs in the provided ordering.
- Return Type: CronJobList
get(id: str)
Get cron job by id.
- Parameters:
- id ( str ) – The ID of the cron job to retrieve.
- Return: The retrieved CronJob object.
- Return Type: CronJob
create(name: str = None, description: str = None, schedule: str = None, timezone: str = "Etc/GMT", max_concurrent_runs: int = 1, job_template: IJobTemplate = None, cluster: Union[dict, ICluster] = None, pause_cluster_when_finished: bool = None)
Create a new cron job with the given parameters.
- Parameters:
- name (str) : Name of the cron job.
- description (str) : Description of the cron job.
- schedule (str) : Schedule the cron job should follow in UNIX cron syntax.
- timezone (str) : Timezone the cron job schedule should be based off of.
- max_concurrent_runs (int) : Maximum number of cron job runs that can happen concurrently (max is 10).
- job_template (IJobTemplate) : Job Template the cron job is derived from.
- cluster (Union[dict, ICluster]) : Cluster object or cluster config.
- pause_cluster_when_finished (bool) : Whether to pause the cluster when the cron job run has finished. Note this is only used when passing a non-job-dedicated cluster.
- Return: The created CronJob object.
- Return Type: CronJob
remove(id: str)
Delete cron job by id.
- Parameters:
- id ( str ) – The ID of the cron job to remove.
- Return: The status code of the response
- Return Type: int
class bodosdk.clients.organization.BodoOrganizationClient(client_id=None, secret_key=None, api_url='https://api.bodo.ai/api', auth_url='https://auth.bodo.ai', print_logs=False)
Bases: IBodoOrganizationClient
property CloudConfig : CloudConfig
property CloudConfigList : CloudConfigList
property Workspace : Workspace
property WorkspaceList : WorkspaceList
create_aws_cloud_config(name: str, tf_backend_region: str, role_arn: str | None = None, tf_bucket_name: str | None = None, tf_dynamo_db_table_name: str | None = None, account_id: str | None = None, access_key_id: str | None = None, secret_access_key: str | None = None, custom_tags: dict | None = None)
Create a new AWS cloud config in the organization with the given parameters.
- Parameters:
- name (str) – Name of the cloud config.
- tf_backend_region (str) – Terraform backend region.
- role_arn (Optional *[*str ]) – Role ARN.
- tf_bucket_name (Optional *[*str ]) – Terraform bucket name.
- tf_dynamo_db_table_name (Optional *[*str ]) – Terraform dynamo db table name.
- account_id (Optional *[*str ]) – Account id.
- access_key_id (Optional *[*str ]) – Access key id.
- secret_access_key (Optional *[*str ]) – Secret access key.
- custom_tags (Optional *[*dict ]) – Custom tags for the cloud config.
- Returns: CloudConfig object.
- Return type: CloudConfig
create_azure_cloud_config(name: str, tf_backend_region: str, tenant_id: str, subscription_id: str, resource_group: str, custom_tags: dict | None = None)
Create a new Azure cloud config in the organization with the given parameters.
- Parameters:
- name (str) – Name of the cloud config.
- tf_backend_region (str) – Terraform backend region.
- tenant_id (str) – Tenant id.
- subscription_id (str) – Subscription id.
- resource_group (str) – Resource group.
- custom_tags (Optional *[*dict ]) – Custom tags for the cloud config.
- Returns: CloudConfig object.
- Return type: CloudConfig
create_workspace(name: str, region: str, storage_endpoint_enabled: bool, cloud_config_id: str = None, vpc_id: str | None = None, public_subnets_ids: List[str] | None = None, private_subnets_ids: List[str] | None = None, custom_tags: dict | None = None)
Create a new workspace in the organization with the given parameters.
- Parameters:
- name (str) – Name of the workspace.
- region (str) – Region of the workspace.
- storage_endpoint_enabled (bool) – Enable storage endpoint for the workspace.
- cloud_config_id (str) – Cloud config id for the workspace.
- vpc_id (Optional *[*str ]) – VPC id for the workspace.
- public_subnets_ids (Optional *(*List *[*str ] )) – List of public subnet ids.
- private_subnets_ids (Optional *(*List *[*str ] )) – List of private subnet ids.
- custom_tags (Optional *(*dict )) – Custom tags for the workspace.
- Returns: Workspace object.
- Return type: Workspace
delete_workspace(id)
Delete workspace by id.
- Parameters: id (str) – Workspace id.
- Returns: Workspace object.
- Return type: Workspace
get_cloud_config(id)
Get cloud config by id.
- Parameters: id (str) – Cloud config id.
- Returns: CloudConfig object.
- Return type: CloudConfig
get_workspace(id)
Get workspace by id.
- Parameters: id (str) – Workspace id.
- Returns: Workspace object.
- Return type: Workspace
list_cloud_configs(filters: dict | None = None)
List cloud configs in the organization.
- Parameters: filters (Optional *[*Union *[*dict ] ]) – Filters to apply on the list.
- Returns: CloudConfigList object.
- Return type: CloudConfigList
list_workspaces(filters: WorkspaceFilter | dict | None = None)
List workspaces in the organization.
- Parameters: filters (Optional *[*Union [WorkspaceFilter , dict ] ]) – Filters to apply on the list.
- Returns: WorkspaceList object.
- Return type: WorkspaceList
class bodosdk.clients.workspace.BodoWorkspaceClient(client_id=None, secret_key=None, api_url='https://api.bodo.ai/api', auth_url='https://auth.bodo.ai', print_logs=False)
Bases: IBodoWorkspaceClient
ClusterClient : IClusterClient
JobClient : IJobClient
JobTemplateClient : IJobTemplateClient
property workspace_data : IWorkspace
Get workspace data.
- Returns: Workspace object.
- Return type: Workspace
property workspace_id : str
Get workspace id.
- Returns: Workspace id.
- Return type: str
class bodosdk.models.catalog.Catalog(workspace_client: IBodoWorkspaceClient = None, *, uuid: str | None = None, name: str | None = None, description: str | None = None, catalogType: str | None = None, details: SnowflakeDetails | dict | None = None)
Bases: SDKBaseModel
, ICatalog
delete()
class bodosdk.models.catalog.CatalogFilter(*, names: List[str] | None = None, ids: List[str] | None = None)
Bases: SDKBaseModel
, ICatalogFilter
ids : List[str] | None
names : List[str] | None
class bodosdk.models.catalog.CatalogList(workspace_client: IBodoWorkspaceClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: dict | None = None, filters: dict | CatalogFilter | None = None)
Bases: ICatalogList
, SDKBaseModel
class Config
Bases: object
Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/
allow_population_by_field_name = True
extra = 'forbid'
delete()
filters : dict | CatalogFilter | None
order : dict | None
page : int | None
page_size : int | None
total : int | None
class bodosdk.models.catalog.SnowflakeDetails(*, port: int | None = None, db_schema: str | None = None, database: str | None = None, userRole: str | None = None, username: str | None = None, warehouse: str | None = None, accountName: str | None = None, password: str | None = None)
Bases: SDKBaseModel
account_name : str | None
database : str | None
db_schema : str | None
password : str | None
port : int | None
user_role : str | None
username : str | None
warehouse : str | None
class bodosdk.models.cloud_config.AwsProviderData(*, provider: str = 'AWS', roleArn: str | None = None, tfBucketName: str | None = None, tfDynamoDbTableName: str | None = None, tfBackendRegion: str | None = None, externalId: str | None = None, accountId: str | None = None, accessKeyId: str | None = None, secretAccessKey: str | None = None)
Bases: SDKBaseModel
, IAwsProviderData
access_key_id : str | None
account_id : str | None
external_id : str | None
provider : str
role_arn : str | None
secret_access_key : str | None
tf_backend_region : str | None
tf_bucket_name : str | None
tf_dynamo_db_table_name : str | None
class bodosdk.models.cloud_config.AzureProviderData(*, provider: str = 'AZURE', tfBackendRegion: str | None = None, resourceGroup: str | None = None, subscriptionId: str | None = None, tenantId: str | None = None, tfStorageAccountName: str | None = None, applicationId: str | None = None)
Bases: SDKBaseModel
, IAzureProviderData
application_id : str | None
provider : str
resource_group : str | None
subscription_id : str | None
tenant_id : str | None
tf_backend_region : str | None
tf_storage_account_name : str | None
class bodosdk.models.cloud_config.CloudConfig(org_client: IBodoOrganizationClient = None, *, name: str | None = None, status: str | None = None, organizationUUID: str | None = None, customTags: dict | None = None, uuid: str | UUID | None = None, provider_data: AwsProviderData | AzureProviderData | None = None)
Bases: SDKBaseModel
, ICloudConfig
custom_tags : dict | None
delete()
property id
name : str | None
organization_uuid : str | None
provider_data : AwsProviderData | AzureProviderData | None
status : str | None
uuid : str | UUID | None
class bodosdk.models.cloud_config.CloudConfigFilter(*, ids: List[str] | None = None, providers: List[str] | None = None, statuses: List[str] | None = None)
Bases: SDKBaseModel
ids : List[str] | None
providers : List[str] | None
statuses : List[str] | None
class bodosdk.models.cloud_config.CloudConfigList(org_client: IBodoOrganizationClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: dict | None = None, filters: CloudConfigFilter | None = None)
Bases: ICloudConfigList
, SDKBaseModel
class Config
Bases: object
Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/
allow_population_by_field_name = True
extra = 'forbid'
delete()
filters : CloudConfigFilter | None
order : dict | None
page : int | None
page_size : int | None
total : int | None
class bodosdk.models.cluster.BodoImage(*, image_id: str, bodo_version: str)
Bases: SDKBaseModel
Represents an image configuration for Bodo, encapsulating the image ID and the specific Bodo version.
This class is a data model that holds information about a Bodo environment image.
image_id
The unique identifier for the Bodo image.
- Type: str
bodo_version
The version of Bodo used in the image.
- Type: str
bodo_version : str
image_id : str
class bodosdk.models.cluster.Cluster(workspace_client: IBodoWorkspaceClient = None, *, name: str | None = None, uuid: str | None = None, status: str | None = None, description: str | None = None, instanceType: str | None = None, workersQuantity: int | None = None, autoStop: int | None = None, autoPause: int | None = None, autoUpgrade: bool | None = None, bodoVersion: str | None = None, coresPerWorker: int | None = None, acceleratedNetworking: bool | None = None, createdAt: datetime | None = None, updatedAt: datetime | None = None, isJobDedicated: bool | None = None, autoAZ: bool | None = None, useSpotInstance: bool | None = None, lastKnownActivity: datetime | None = None, inStateSince: datetime | None = None, clusterAgentVersion: str | None = None, clusterInitStatus: str | None = None, clusterHealthStatus: str | None = None, primaryAgentIP: str | None = None, awsDeploymentSubnetId: str | None = None, nodeMetadata: List[NodeMetadata] | None = None, availabilityZone: str | None = None, instanceRole: InstanceRole | None = None, workspace: dict | None = None, autoscalingIdentifier: str | None = None, lastAsgActivityId: str | None = None, nodesIp: List[str] | None = None, customTags: Dict | None = None)
Bases: SDKBaseModel
, ICluster
Represents a cluster in the SDK model, encapsulating various properties and operations related to a compute cluster.
name
The name of the cluster.
- Type: Optional[str]
uuid
The unique identifier of the cluster.
- Type: Optional[str]
status
The current status of the cluster (e.g., ‘RUNNING’, ‘STOPPED’).
- Type: Optional[str]
description
A description of the cluster.
- Type: Optional[str]
instance_type
The type of instances used in the cluster (e.g., ‘c5.large’).
- Type: Optional[str]
workers_quantity
The number of worker nodes in the cluster.
- Type: Optional[int]
auto_stop
The auto-stop configuration in minutes. The cluster automatically stops when idle for this duration.
- Type: Optional[int]
auto_pause
The auto-pause configuration in minutes. The cluster automatically pauses when idle for this duration.
- Type: Optional[int]
auto_upgrade
Should the cluster be upgraded on restart. The cluster is automatically upgraded to the latest Bodo version on restart when True.
- Type: Optional[bool]
bodo_version
The version of Bodo being used in the cluster.
- Type: Optional[str]
cores_per_worker
The number of CPU cores per worker node.
- Type: Optional[int]
accelerated_networking
Whether accelerated networking is enabled.
- Type: Optional[bool]
created_at
The creation timestamp of the cluster.
- Type: Optional[str]
updated_at
The last update timestamp of the cluster.
- Type: Optional[str]
is_job_dedicated
Whether the cluster is dedicated to a specific job.
- Type: Optional[bool]
auto_az
Whether automatic availability zone selection is enabled.
- Type: Optional[bool]
use_spot_instance
Whether spot instances are used for the cluster.
- Type: Optional[bool]
last_known_activity
The last known activity timestamp of the cluster.
- Type: Optional[str]
in_state_since
The timestamp since the cluster has been in its current state.
- Type: Optional[str]
cluster_agent_version
The version of the cluster agent.
- Type: Optional[str]
cluster_init_status
The initialization status of the cluster.
- Type: Optional[str]
cluster_health_status
The health status of the cluster.
- Type: Optional[str]
primary_agent_ip
The IP address of the primary agent in the cluster.
- Type: Optional[str]
aws_deployment_subnet_id
The subnet ID used for deploying AWS resources.
- Type: Optional[str]
node_metadata
Metadata information for each node in the cluster.
- Type: Optional[List[NodeMetadata]]
availability_zone
The AWS availability zone in which the cluster is located.
- Type: Optional[str]
instance_role
The IAM role used by instances in the cluster.
- Type: Optional[InstanceRole]
workspace
A dictionary containing workspace-related information for the cluster.
- Type: Optional[dict]
autoscaling_identifier
The identifier for autoscaling configuration.
- Type: Optional[str]
last_asg_activity_id
The identifier of the last activity in the autoscaling group.
- Type: Optional[str]
nodes_ip
A list of IP addresses for the nodes in the cluster.
- Type: Optional[List[str]]
accelerated_networking : bool | None
auto_az : bool | None
auto_pause : int | None
auto_stop : int | None
auto_upgrade : bool | None
autoscaling_identifier : str | None
availability_zone : str | None
aws_deployment_subnet_id : str | None
bodo_version : str | None
cancel_jobs()
Cancels all jobs associated with the cluster.
- Returns: The Cluster instance.
cluster_agent_version : str | None
cluster_health_status : str | None
cluster_init_status : str | None
connect(catalog: str)
Establishes a connection to the specified catalog from Cluster.
This method is responsible for creating and returning a new Connection instance based on the provided catalog.
Parameters: catalog (str): The name of the catalog to which the connection should be established.
Returns: Connection: An instance of Connection initialized with the specified catalog and the current class instance.
cores_per_worker : int | None
created_at : datetime | None
custom_tags : Dict | None
delete(force: bool = False, mark_as_terminated: bool = False, wait: bool = False)
Deletes the cluster, optionally forcing removal or marking as terminated.
- Parameters:
- force – If True, forces the deletion of the cluster.
- mark_as_terminated – If True, marks the cluster as terminated instead of deleting.
- wait – If True, waits till cluster will be TERMINATED.
- Returns: The Cluster instance, potentially updated to reflect its new state.
Handles: : ResourceNotFound: Silently if the cluster is already deleted or not found.
description : str | None
property id : str
The UUID of the cluster.
- Returns: The UUID string of the cluster.
in_state_since : datetime | None
instance_role : InstanceRole | None
instance_type : str | None
is_job_dedicated : bool | None
last_asg_activity_id : str | None
last_known_activity : datetime | None
name : str | None
node_metadata : List[NodeMetadata] | None
nodes_ip : List[str] | None
pause(wait: bool = False)
Pauses the cluster if it is running.
- Parameters: wait – If True, waits till cluster will be PAUSED.
- Returns: The Cluster instance with updated status.
- Raises: ConflictException – If the cluster cannot be paused due to its current status.
primary_agent_ip : str | None
resume(wait: bool = False)
Resumes the cluster if it was paused or stopped.
- Parameters: wait – If True, waits till cluster will be RUNNING.
- Returns: The Cluster instance with updated status.
- Raises: ConflictException – If the cluster cannot be resumed due to its current status.
run_job(template_id: str = None, code_type: str = None, source: dict | IS3Source | IGitRepoSource | IWorkspaceSource | ITextSource = None, exec_file: str = None, args: dict | str = None, env_vars: dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, name: str = None, catalog: str = None, store_result: bool = False)
Submits a batch job for execution using specified configurations and resources.
This method creates and dispatches a job within a computing cluster, allowing for extensive customization of execution parameters including source data, runtime environment, and failure handling strategies.
- Parameters:
- template_id (str , optional) – Identifier for the job template to use.
- code_type (str , optional) – Type of code to execute (e.g., Python, Java).
- source (Union *[*dict , IS3Source , IGitRepoSource , IWorkspaceSource , ITextSource ] , optional) – Source of the code to be executed. Can be specified as a dictionary or an instance of one of the predefined source interfaces.
- exec_file (str , optional) – Path to the executable file within the source.
- args (Union *[*dict , str ] , optional) – Arguments to pass to the executable.
- parameters. (Can be a string or a dictionary of) –
- env_vars (dict , optional) – Environment variables to set for the job.
- timeout (int , optional) – Maximum runtime (in seconds) before the job is terminated.
- num_retries (int , optional) – Number of times to retry the job on failure.
- delay_between_retries (int , optional) – Time to wait between retries.
- retry_on_timeout (bool , optional) – Whether to retry the job if it times out.
- name (str , optional) – Name of the job.
- catalog (str , optional) – Catalog to log the job under.
- store_result (bool , optional) – Whether to store on S3 job results or not.
- Returns: An object representing the submitted job, capable of providing status and results.
- Return type: IJobRun
run_sql_query(template_id: str = None, catalog: str = None, sql_query: str = None, name: str = None, args: Sequence[Any] | Dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, store_result: bool = False)
Submits an SQL query for execution on the cluster, returning a job run object.
This method handles the execution of an SQL query within a defined cluster environment. It supports customization of execution parameters such as query arguments, job name, execution timeouts, and retry strategies.
Parameters: : template_id (str, optional): Identifier for the job template to use. catalog (str, optional): Catalog under which to log the SQL job. sql_query (str, optional): The SQL query string to be executed. name (str, optional): Descriptive name for the SQL job. args (dict, optional): Dictionary of arguments that are passed to the SQL query. timeout (int, optional): Maximum allowable runtime in seconds before the job is terminated. num_retries (int, optional): Number of times the job will be retried on failure. delay_between_retries (int, optional): Interval in seconds between job retries. retry_on_timeout (bool, optional): Whether to retry the job if it times out. store_result (bool, optional): Whether to store on S3 job results or not.
Returns: : IJobRun: An object representing the status and result of the executed SQL job.
`
start(wait: bool = False)
Starts the cluster.
- Parameters: wait – If True, waits till cluster will be RUNNING.
- Returns: The Cluster instance with updated status.
status : str | None
stop(wait: bool = False)
Stops the cluster.
- Parameters: wait – If True, waits till cluster will be STOPPED.
- Returns: The Cluster instance with updated status.
update(auto_stop: int | None = None, auto_pause: int | None = None, auto_upgrade: bool | None = None, description: str | None = None, name: str | None = None, workers_quantity: int | None = None, instance_role: InstanceRole | None = None, instance_type: str | None = None, bodo_version: str | None = None, auto_az: bool | None = None, availability_zone: str | None = None, custom_tags: Dict | None = None)
Updates the cluster’s configuration with the provided values.
- Parameters:
- auto_stop – Optional; configures auto-stop feature.
- auto_pause – Optional; configures auto-pause feature.
- auto_upgrade – Optional; enables/disables auto-upgrade on restart.
- description – Optional; updates the cluster’s description.
- name – Optional; updates the cluster’s name.
- workers_quantity – Optional; updates the number of workers.
- instance_role – Optional; updates the instance role.
- instance_type – Optional; updates the instance type.
- bodo_version – Optional; updates the Bodo version.
- auto_az – Optional; enables/disables automatic availability zone selection.
- availability_zone – Optional; sets a specific availability zone.
- Returns: The updated Cluster instance.
updated_at : datetime | None
use_spot_instance : bool | None
uuid : str | None
wait_for_status(statuses: List[str], timeout: int = 600, tick: int = 30)
Waits for the cluster to reach one of the specified states within a given timeout.
- Parameters:
- statuses – A list of states to wait for.
- timeout – The maximum time to wait before raising a TimeoutException.
- tick – The interval between checks.
- Returns: The Cluster instance, once it has reached one of the desired states.
- Raises: TimeoutException – If the cluster does not reach a desired state within the timeout.
workers_quantity : int | None
workspace : dict | None
class bodosdk.models.cluster.ClusterFilter(*, uuids: List[str] | None = None, clusterNames: List[str] | None = None, statues: List[str] | None = None, tags: Dict | None = None)
Bases: SDKBaseModel
, IClusterFilter
Represents a filter used to select clusters based on specific criteria.
This class is used to construct filter criteria for querying clusters by their identifiers, names, statuses, or tags. It inherits from SDKBaseModel and implements the IClusterFilter interface.
ids
Optional list of cluster UUIDs. Default is an empty list.
- Type: Optional[List[str]]
cluster_names
Optional list of cluster names to filter by. Default is an empty list.
- Type: Optional[List[str]]
statuses
Optional list of cluster statuses to filter by. Default is an empty list.
- Type: Optional[List[str]]
tags
Optional dictionary of tags for more fine-grained filtering.
- Type: Optional[Dict]
Default is an empty dictionary.
Each attribute supports being set via their field name or by the specified alias in the Field definition.
class Config
Bases: object
Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/
allow_population_by_field_name = True
extra = 'forbid'
cluster_names : List[str] | None
ids : List[str] | None
statuses : List[str] | None
tags : Dict | None
class bodosdk.models.cluster.ClusterList(workspace_client: IBodoWorkspaceClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: Dict | None = None, filters: ClusterFilter | dict | None = None)
Bases: IClusterList
, SDKBaseModel
A model representing a list of clusters, providing pagination, filtering, and operations on clusters such as start, stop, delete, resume, and pause.
page
The current page number for pagination, starting from 0.
- Type: Optional[int]
page_size
The number of items to be displayed per page.
- Type: Optional[int]
total
The total number of items available across all pages.
- Type: Optional[int]
order
Ordering information for listing clusters. Defaults to an empty dict.
- Type: Optional[Dict]
filters
Filtering criteria to apply when fetching the cluster list.
- Type: Optional[ClusterFilter]
_clusters
Internal list of cluster objects.
- Type: List[ICluster]
_index
Internal index to track the current position when iterating through clusters.
- Type: int
class Config
Bases: object
Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/
allow_population_by_field_name = True
extra = 'forbid'
cancel_jobs()
Cancels all jobs associated with the clusters.
- Returns: The ClusterList instance.
delete(wait=False)
Deletes each cluster in the list, updating the internal list with the result of each delete operation. This method effectively attempts to delete all clusters and returns the updated list.
- Returns: The current instance of ClusterList after attempting to delete : all clusters.
- Return type: ClusterList
filters : ClusterFilter | dict | None
order : Dict | None
page : int | None
page_size : int | None
pause(wait=False)
Attempts to pause each running cluster in the list. It handles exceptions gracefully, updating the list with the status of each cluster following the operation.
- Returns: The current instance of ClusterList after attempting to pause : all clusters.
- Return type: ClusterList
refresh()
Refreshes the list of clusters by resetting the pagination and filter settings, then reloading the first page of clusters. This method effectively resets the ClusterList instance to its initial state, based on current filters and ordering.
- Returns: The current instance of ClusterList after reloading the first page : of clusters.
- Return type: ClusterList
resume(wait=False)
Attempts to resume each paused or stopped cluster in the list. It handles exceptions gracefully, ensuring the list is updated with the status of each cluster after the operation.
- Returns: The current instance of ClusterList after attempting to resume : all clusters.
- Return type: ClusterList
run_job(template_id: str = None, code_type: str = None, source: dict | IS3Source | IGitRepoSource | IWorkspaceSource | ITextSource = None, exec_file: str = None, args: dict | str = None, env_vars: dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, name: str = None, catalog: str = None, store_result: bool = None)
Executes a job across all clusters managed by the instance.
This method supports multiple source types and configurations for executing jobs, including retries and custom environment variables.
- Parameters:
- template_id (str , optional) – Identifier for the job template to be used.
- code_type (str , optional) – The type of code to execute (e.g., Python, Java).
- source (Union *[*dict , IS3Source , IGitRepoSource , IWorkspaceSource , ITextSource ] , optional) –
- retrieved. (The source from where the job's code will be) –
- exec_file (str , optional) – Path to the main executable file within the source.
- args (Union *[*dict , str ] , optional) – Arguments to pass to the job. Can be a dictionary or a
- job. (string formatted as required by the) –
- env_vars (dict , optional) – Environment variables to set for the job execution.
- timeout (int , optional) – Maximum time in seconds for the job to run before it is terminated.
- num_retries (int , optional) – Number of times to retry the job on failure.
- delay_between_retries (int , optional) – Time in seconds to wait between retries.
- retry_on_timeout (bool , optional) – Whether to retry the job if it times out.
- name (str , optional) – A name for the job run.
- catalog (str , optional) – Catalog identifier to specify a data catalog for the job.
- store_result (bool , optional) – Whether to store on S3 job results or not.
- Returns: An object listing the UUIDs of jobs that were successfully initiated.
- Return type: IJobRunList
Decorators: : @check_deprecation: Checks if the method or its parameters are deprecated.
run_sql_query(template_id: str = None, catalog: str = None, sql_query: str = None, name: str = None, args: Sequence[Any] | Dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, store_result: bool = None)
Executes an SQL job across all clusters managed by the instance.
This method submits an SQL query for execution, allowing for additional configurations such as retries and setting execution timeouts.
- Parameters:
- template_id (str , optional) – Identifier for the job template to be used.
- catalog (str , optional) – Catalog identifier to specify a data catalog for the SQL job.
- sql_query (str , optional) – The SQL query to execute.
- name (str , optional) – A name for the job run.
- args (dict , optional) – Additional arguments specific to the SQL job.
- timeout (int , optional) – Maximum time in seconds for the job to run before it is terminated.
- num_retries (int , optional) – Number of times to retry the job on failure.
- delay_between_retries (int , optional) – Time in seconds to wait between retries.
- retry_on_timeout (bool , optional) – Whether to retry the job if it times out.
- store_result (bool , optional) – Whether to store on S3 job results or not.
- Returns: An object listing the UUIDs of SQL jobs that were successfully initiated.
- Return type: IJobRunList
Decorators: : @check_deprecation: Checks if the method or its parameters are deprecated.
start(wait=False)
Attempts to start each stopped or paused cluster in the list. It handles exceptions gracefully, ensuring the list reflects the status of each cluster after the operation.
- Returns: The current instance of ClusterList after attempting to start : all clusters.
- Return type: ClusterList
stop(wait=False)
Attempts to stop each running or starting cluster in the list. It handles exceptions gracefully, updating the list with the status of each cluster after the operation.
- Returns: The current instance of ClusterList after attempting to stop : all clusters.
- Return type: ClusterList
total : int | None
wait_for_status(statuses: List[str] = None, timeout: int = 600, tick: int = 60)
Waits for each cluster in the list to reach one of the specified statuses, updating the list with the results. This method polls each cluster’s status until it matches one of the desired statuses or until the timeout is reached.
- Parameters:
- statuses (List *[*str ] , optional) – A list of status strings to wait for.
- timeout (int , optional) – The maximum time to wait for each cluster to reach the desired status, in seconds.
- tick (int , optional) – The interval between status checks, in seconds.
- Returns: The current instance of ClusterList after waiting for all clusters : to reach one of the specified statuses.
- Return type: ClusterList
class bodosdk.models.cluster.InstanceType(*, name: str, vcpus: int, cores: int, memory: int, acceleratedNetworking: bool | None = None)
Bases: SDKBaseModel
, IInstanceType
Represents the specifications for a type of computing instance.
This class defines a specific configuration of a computing instance, including its processing power and memory capabilities, as well as optional features related to networking.
name
The name or identifier of the instance type.
- Type: str
vcpus
The number of virtual CPUs available in this instance type.
- Type: int
cores
The number of physical cores available in this instance type.
- Type: int
memory
The amount of RAM (in megabytes) available in this instance type.
- Type: int
accelerated_networking
Specifies if accelerated networking is enabled.
- Type: Optional[bool]
This is mapped to the JSON key 'acceleratedNetworking'. Defaults to None.
accelerated_networking : bool | None
cores : int
memory : int
name : str
vcpus : int
class bodosdk.models.cluster.NodeMetadata(*, privateIP: str | None = None, instanceId: str | None = None)
Bases: SDKBaseModel
instance_id : str | None
private_ip : str | None
class bodosdk.db.connection.Connection(catalog: str, cluster: ICluster, timeout=3600)
Bases: object
A connection to a catalog and cluster for executing queries.
close()
Close the connection and all associated cursors.
commit()
No-op because Bodo does not support transactions.
cursor(query_id=None)
Create a new cursor for executing queries.
- Parameters: query_id – Optional query ID for resuming a previous query.
- Returns: A new cursor instance.
rollback()
Rollback is not supported as transactions are not supported on Bodo.
- Raises: NotSupportedError – Always raised as rollback is not supported.
class bodosdk.db.connection.Cursor(catalog: str, cluster: ICluster, timeout: int = 3600, query_id=None)
Bases: ICursor
A cursor for executing and fetching results from SQL queries on a cluster.
close()
Close the cursor and clean up resources.
property description
Get the description of the result set columns.
- Returns: A list of column descriptions.
execute(query: str, args: Sequence[Any] | Dict = None)
Execute a SQL query synchronously.
- Parameters:
- query – The SQL query to execute.
- args – Optional arguments for the query.
- Returns: The cursor instance.
execute_async(query: str, args: Sequence[Any] | Dict = None)
Execute a SQL query asynchronously.
- Parameters:
- query – The SQL query to execute.
- args – Optional arguments for the query.
- Returns: The cursor instance.
fetchall()
Fetch all remaining rows of the result set.
- Returns: A list of all remaining rows.
fetchmany(size)
Fetch the next set of rows of the result set.
- Parameters: size – The number of rows to fetch.
- Returns: A list of rows.
fetchone()
Fetch the next row of the result set.
- Returns: The next row, or None if no more rows are available.
property query_id
Get the query ID.
- Returns: The query ID.
property rowcount
Get the number of rows in the result set.
- Returns: The number of rows in the result set.
property rownumber
Get the current row number.
- Returns: The current row number.
setinputsizes(sizes)
Set the sizes of input parameters (not supported).
- Parameters: sizes – Input sizes.
setoutputsize(size, column=None)
Set the size of the output (not supported).
- Parameters:
- size – The output size.
- column – The column to set the size for.
class bodosdk.models.job.GitRepoSource(*, type: str = 'GIT', repoUrl: str, reference: str | None = '', username: str, token: str)
Bases: SDKBaseModel
Git repository source definition.
repo_url
Git repository URL.
- Type: str
reference
Git reference (branch, tag, commit hash). (Default: “”)
- Type: Optional[str]
username
Git username.
- Type: str
token
Git token.
- Type: str
reference : str | None
repo_url : str
token : str
type : str
username : str
class bodosdk.models.job.JobConfig(*, type: str | None = None, source: GitRepoSource | WorkspaceSource | S3Source | TextSource | None = None, sourceLocation: str | None = None, sqlQueryText: str | None = None, sqlQueryParameters: dict | None = None, args: dict | Sequence | str | None = None, retryStrategy: RetryStrategy | None = None, timeout: int | None = None, envVars: dict | None = None, catalog: str | None = None, storeResult: bool | None = False)
Bases: SDKBaseModel
, IJobConfig
Configures details for executing a job, including the execution source, file location, and execution parameters.
type
The type of job: PYTHON, SQL, IPYNB.
- Type: Optional[str]
source
The source from which the job is configured or executed.
- Type: Optional[Union[GitRepoSource, WorkspaceSource, S3Source, TextSource]]
exec_file
The location of the file to execute.
- Type: Optional[str]
exec_text
The text containing script/code/sql to execute, valid for TextSource.
- Type: Optional[str]
sql_query_parameters
Parameters to substitute within an SQL query.
- Type: Optional[dict]
args
Additional arguments required for job execution. Can be a dictionary or string.
- Type: Optional[Union[dict, str]]
retry_strategy
Configuration for the job retry strategy.
- Type: Optional[RetryStrategy]
timeout
The maximum time (in seconds) allowed for the job to run before it times out.
- Type: Optional[int]
env_vars
Environment variables to be set for the job run.
- Type: Optional[dict]
catalog
The catalog associated with the job, if applicable.
- Type: Optional[str]
args : dict | Sequence | str | None
catalog : str | None
env_vars : dict | None
exec_file : str | None
exec_text : str | None
retry_strategy : RetryStrategy | None
source : GitRepoSource | WorkspaceSource | S3Source | TextSource | None
sql_query_parameters : dict | None
store_result : bool | None
timeout : int | None
type : str | None
class bodosdk.models.job.JobFilter(*, ids: List[str | UUID] | None = None, template_ids: List[str | UUID] | None = None, cluster_ids: List[str | UUID] | None = None, types: List[str] | None = None, statuses: List[str] | None = None, started_at: datetime | None = None, finished_at: datetime | None = None)
Bases: SDKBaseModel
Provides filtering options for querying job-related data.
ids
A list of job IDs to filter by.
- Type: Optional[List[Union[str, UUID]]]
template_ids
A list of job template IDs to filter by.
- Type: Optional[List[Union[str, UUID]]]
cluster_ids
A list of cluster IDs to filter jobs by.
- Type: Optional[List[Union[str, UUID]]]
types
A list of job types to filter by.
- Type: Optional[List[str]]
statuses
A list of job statuses to filter by.
- Type: Optional[List[str]]
started_at
A datetime value to filter jobs that started after it.
- Type: Optional[datetime]
finished_at
A datetime value to filter jobs that finished before it.
- Type: Optional[datetime]
cluster_ids : List[str | UUID] | None
finished_at : datetime | None
ids : List[str | UUID] | None
started_at : datetime | None
statuses : List[str] | None
template_ids : List[str | UUID] | None
types : List[str] | None
class bodosdk.models.job.JobRun(workspace_client: IBodoWorkspaceClient = None, *, uuid: str | None = None, name: str | None = None, type: str | None = 'BATCH', submittedAt: datetime | None = None, finishedAt: datetime | None = None, lastHealthCheck: datetime | None = None, lastKnownActivity: datetime | None = None, status: str | None = None, reason: str | None = None, numRetriesUsed: int | None = 0, tag: List | None = None, clusterUUID: str | None = None, cluster: 'Cluster' | None, clusterConfig: 'Cluster' | None = None, config: JobConfig | None = None, jobTemplateUUID: str | None = None, submitter: str | None = None, stats: dict | None = None)
Bases: SDKBaseModel
, IJobRun
Details a specific instance of a job run, including status and configuration details.
uuid
Unique identifier for the job run.
- Type: Optional[str]
name
Name of the job run.
- Type: Optional[str]
type
Type of job run, defaults to ‘BATCH’ if not specified.
- Type: Optional[str]
submitted_at
Timestamp when the job was submitted.
- Type: Optional[datetime]
finished_at
Timestamp when the job finished running.
- Type: Optional[datetime]
last_health_check
Timestamp of the last health check performed.
- Type: Optional[datetime]
last_known_activity
Timestamp of the last known activity of this job.
- Type: Optional[datetime]
status
Current status of the job run.
- Type: Optional[str]
reason
Reason for the job’s current status, particularly if there’s an error or failure.
- Type: Optional[str]
num_retries_used
The number of retries that have been used for this job run. Defaults to 0.
- Type: Optional[int]
tags
Tags associated with the job run. Default is an empty list.
- Type: Optional[List]
cluster_uuid
UUID of the cluster on which the job is running.
- Type: Optional[str]
cluster
The cluster object associated with the job run.
- Type: Optional[Cluster]
cluster_config
Configuration of the cluster used for this job run.
- Type: Optional[Cluster]
config
Job configuration details used for this run.
- Type: Optional[JobConfig]
job_template_id
UUID of the template from which this job was created.
- Type: Optional[str]
submitter
The identifier of the user who submitted the job.
- Type: Optional[str]
stats
Statistical data about the job run.
- Type: Optional[dict]
cancel()
Cancels the job associated with this instance and updates its state based on the response from the job API.
This method sends a cancellation request for the job identified by its UUID to the job API, handles the response to update the job’s attributes, and resets its modified state.
- Returns: The job instance, updated with the latest state after the cancellation attempt.
- Return type: JobRun
- Raises: APIException – If the cancellation request fails or returns an error response.
cluster : 'Cluster' | None
cluster_config : 'Cluster' | None
cluster_id : str | None
config : JobConfig | None
finished_at : datetime | None
get_logs_urls()
get_result_urls()
get_stderr()
get_stdout()
property id : str
job_template_id : str | None
last_health_check : datetime | None
last_known_activity : datetime | None
name : str | None
num_retries_used : int | None
reason : str | None
stats : dict | None
status : str | None
submitted_at : datetime | None
submitter : str | None
tags : List | None
type : str | None
uuid : str | None
wait_for_status(statuses: List[str], timeout: int = 3600, tick: int = 30)
Waits for the job to reach one of the specified statuses, polling at regular intervals.
This method checks the current status of the job at regular intervals defined by the tick parameter. If the job reaches one of the specified statuses before the timeout, it returns the job instance. If the timeout is exceeded, it raises a TimeoutException.
- Parameters:
- statuses (list or tuple) – A list or tuple of statuses that the job is expected to reach.
- timeout (int , optional) – The maximum time in seconds to wait for the job to reach one of the specified statuses. Defaults to 600 seconds.
- tick (int , optional) – The time interval in seconds between status checks. Defaults to 30 seconds.
- Returns: The job instance if one of the specified statuses is reached within the timeout period.
- Return type: JobRun
- Raises: TimeoutException – If the job does not reach one of the specified statuses within the specified timeout period.
class bodosdk.models.job.JobRunList(workspace_client: IBodoWorkspaceClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: Dict | None = None, filters: JobFilter | None = None)
Bases: IJobRunList
, SDKBaseModel
Represents a paginated list of job runs, including metadata and filtering capabilities.
page
The current page number in the paginated list. Defaults to 0.
- Type: Optional[int]
page_size
The number of job runs to display per page. Defaults to 10.
- Type: Optional[int]
total
The total number of job runs available.
- Type: Optional[int]
order
A dictionary specifying the order in which job runs are sorted.
- Type: Optional[Dict]
filters
A JobFilter object used to apply filtering on the list of job runs.
- Type: Optional[JobFilter]
cancel()
Cancels all jobs in the list.
This method iterates through each job in the current list (represented by instances of this class), and calls the cancel method on each job to attempt their cancellation. After attempting to cancel all jobs, it returns the updated list of jobs.
- Returns: The list instance, potentially with updated states of the individual jobs : if the cancellation requests were successful.
- Return type: JobRunList
property clusters
filters : JobFilter | None
order : Dict | None
page : int | None
page_size : int | None
refresh()
Refreshes the list of jobs by clearing the current elements and reloading the next page of job data.
This method resets the internal state of the list, including the pagination index and the elements themselves. It then loads the next page of data from the underlying data source to repopulate the list. The list is temporarily set to mutable during this process to allow updates, and then set back to immutable once the refresh is complete.
- Returns: The refreshed job run list instance, with elements reloaded from the next available page.
- Return type: JobRunList
total : int | None
wait_for_status(statuses: List[str], timeout: int = 3600, tick: int = 30)
Waits for each job in the list to reach one of the specified statuses, polling each at regular intervals.
This method iterates over each job within the list, checking at intervals (defined by tick) to see if the job has reached one of the desired statuses specified in statuses. If a job reaches the desired status within the timeout period, the method continues to the next job. If the timeout is reached without the job reaching the desired status, a TimeoutException is raised.
- Parameters:
- statuses (list or tuple) – A list or tuple of statuses that each job is expected to reach.
- timeout (int , optional) – The maximum time in seconds to wait for each job to reach one of the specified statuses. Defaults to 600 seconds.
- tick (int , optional) – The time interval in seconds between status checks for each job. Defaults to 30 seconds.
- Returns: The job run list instance, after attempting to wait for all jobs to reach the desired statuses.
- Return type: JobRunList
- Raises: TimeoutException – If any job does not reach one of the specified statuses within the specified timeout period, including details of the job’s UUID and its current status.
class bodosdk.models.job.JobRunLogsResponse(*, stderrUrl: str = None, stdoutUrl: str = None, expirationDate: str = None)
Bases: SDKBaseModel
, IJobRunLogsResponse
Represents the response object containing URLs for accessing logs related to a job run.
stderr_location_url
The URL to access the standard error logs of the job run.
- Type: str
stdout_location_url
The URL to access the standard output logs of the job run.
- Type: str
expiration_date
The date when the log URLs will expire and no longer be accessible.
- Type: str
expiration_date : str
stderr_location_url : str
stdout_location_url : str
class bodosdk.models.job.JobTemplate(workspace_client: IBodoWorkspaceClient = None, *, uuid: str | None = None, name: str | None = None, jobRuns: List[JobRun] = None, description: str | None = None, createdBy: str | None = None, config: JobConfig | None = None, clusterConfig: 'Cluster' | None = None)
Bases: SDKBaseModel
, IJobTemplate
Represents a template for creating and managing jobs within an SDK environment, encapsulating common job configurations.
uuid
The unique identifier for the job template.
- Type: Optional[str]
name
The name of the job template.
- Type: Optional[str]
job_runs
A list of job runs associated with this template. Default is an empty list if none are specified.
- Type: List[JobRun]
description
A brief description of the job template.
- Type: Optional[str]
created_by
The identifier of the user who created the job template.
- Type: Optional[str]
config
The job configuration specifics for this template.
- Type: Optional[JobConfig]
cluster_config
Configuration details of the cluster on which the jobs will run. Default is None.
- Type: Optional[Cluster]
cluster_config : Cluster | None
config : JobConfig | None
created_by : str | None
delete()
Deletes the job template.
description : str | None
property id
job_runs : List[JobRun]
name : str | None
run(name: str = None, cluster: dict | ICluster = None, code_type: str = None, source: dict | IS3Source | IGitRepoSource | IWorkspaceSource | ITextSource = None, exec_file: str = None, exec_text: str = None, args: dict | str = None, env_vars: dict = None, timeout: int = None, num_retries: int = None, delay_between_retries: int = None, retry_on_timeout: bool = None, catalog: str = None, store_result: bool = False)
Runs a job using the template configuration.
- Parameters:
- name (str , optional) – The name of the job run.
- cluster (Union *[*dict , ICluster ] , optional) – The cluster configuration or instance to run the job on.
- code_type (str , optional) – The type of code to execute.
- source (Union *[*dict , IS3Source , IGitRepoSource , IWorkspaceSource , ITextSource ] , optional) – The source configuration for the job.
- exec_file (str , optional) – The file to execute.
- exec_text (str , optional) – The text containing the script/code/sql to execute.
- args (Union *[*dict , str ] , optional) – Arguments required for job execution.
- env_vars (dict , optional) – Environment variables for the job run.
- timeout (int , optional) – The maximum time (in seconds) allowed for the job to run.
- num_retries (int , optional) – The number of retries allowed for the job.
- delay_between_retries (int , optional) – The delay between retries in minutes.
- retry_on_timeout (bool , optional) – Whether to retry the job on timeout.
- catalog (str , optional) – The catalog associated with the job.
- store_result (bool , optional) – Whether to store the result of the job run.
- Returns: The job run instance.
- Return type: IJobRun
uuid : str | None
class bodosdk.models.job.JobTemplateFilter(*, ids: List[str] | None = None, names: List[str] | None = None, tags: dict | None = None)
Bases: SDKBaseModel
, IJobTemplateFilter
Class representing filters for JobTemplateList.
ids
Returns list matching given ids.
- Type: List[str] | None
names
Returns list matching given names.
- Type: List[str] | None
tags
Returns list matching given tags.
- Type: dict | None
ids : List[str] | None
names : List[str] | None
tags : dict | None
class bodosdk.models.job.JobTemplateList(workspace_client: IBodoWorkspaceClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: Dict | None = None, filters: JobTemplateFilter | None = None)
Bases: IJobTemplateList
, SDKBaseModel
Represents a list of JobTemplates, providing pagination and filtering capabilities.
page
The current page number in the list of Job Templates. Defaults to 0.
- Type: Optional[int]
page_size
The number of Job Templates to display per page. Defaults to 10.
- Type: Optional[int]
total
The total number of Job Templates available.
- Type: Optional[int]
order
A dictionary specifying the order in which Job Templates are sorted.
- Type: Optional[Dict]
filters
A filter object used to filter the Job Templates listed.
- Type: Optional[JobTemplateFilter]
delete()
Deletes all job definitions in the list.
This method iterates over each job definition contained within the list and calls its individual delete method. This action attempts to remove each job definition from the underlying storage or management system.
filters : JobTemplateFilter | None
order : Dict | None
page : int | None
page_size : int | None
refresh()
Refreshes the list of job templates by clearing the current elements and reloading the next page of job data.
This method resets the internal state of the list, including the pagination index and the elements themselves. It then loads the next page of data from the underlying data source to repopulate the list. The list is temporarily set to mutable during this process to allow updates, and then set back to immutable once the refresh is complete.
- Returns: The refreshed job template list instance, with elements reloaded from : the next available page.
- Return type: JobTemplateList
run(instance_type: str = None, workers_quantity: int = None, bodo_version: str = None)
Executes all job definitions in the list with specified configurations and aggregates their run IDs.
This method iterates over each job definition in the list and invokes the run method on each, passing the specified configuration parameters such as instance type, the quantity of workers, and the Bodo version. It collects the IDs of the resulting job runs and returns a consolidated list of these job runs.
- Parameters:
- instance_type (str , optional) – The type of instance to use for the jobs. This may specify the hardware specifications.
- workers_quantity (int , optional) – The number of worker instances to deploy for the jobs.
- bodo_version (str , optional) – The specific version of Bodo to use for executing the jobs.
- Returns: An interface to the list of job runs created by executing the job definitions, : allowing further operations like monitoring and management.
- Return type: IJobRunList
total : int | None
class bodosdk.models.CronJob(workspace_client: IBodoWorkspaceClient, uuid: str | None = None, name: str | None = None, description: str | None = None, created_by: str | None = None, schedule: str | None = None, timezone: str | None = None, last_run_date: date | None = None, next_run_date: date | None = None, max_concurrent_runs: int | None = None, config: JobConfig | None = None, cluster_config: Cluster | None = None, cluster_id: str | None = None, job_template_id: str | None = None, job_runs: List[JobRun] = None)
Bases: SDKBaseModel
, ICronJob
uuid: str | None
The unique identifier of the cron job.
- Type: Optional[str]
name: str | None
The name of the cron job.
- Type: Optional[str]
description: str | None
The description of the cron job.
- Type: Optional[str]
created_by: str | None
The email of the person who created the cron job.
- Type: Optional[str]
schedule: str | None
When this cron job should schedule batch jobs.
- Type: Optional[str]
timezone: str | None
The timezone the cron job schedule should follow.
- Type: Optional[str]
last_run_date: date | None
The time and date of the most recently scheduled batch job run from this cron job.
- Type: Optional[date]
next_run_date: date | None
The time and date of when this cron jobs’s next batch job run should be scheduled.
- Type: Optional[date]
max_concurrent_runs: int | None
The limit on the number of concurrently running batch job runs this cron job can have simultaneously.
- Type: Optional[int]
config: JobConfig | None
Additional job configurations that this cron job’s batch job should be created with.
- Type: Optional[JobConfig]
cluster_config: ClusterConfig | None
The job-dedicated cluster config this cron job should create batch job runs with.
- Type: Optional[Cluster]
cluster_id: str | None
The existing cluster this cron job should execute batch jobs on.
- Type: Optional[str]
job_template_id: str | None
The job template this cron job should create batch jobs from.
- Type: Optional[str]
job_runs: List[JobRun] | None
The list of batch job runs stemming from this cron job
- Type: List[JobRun]
property id: str
Provides the unique identifier for this cron job.
- Return: The cron job’s unique identifier.
- Return Type: str
run()
Manually create and execute a batch job run from this cron job.
- Return: The created job run.
- Return Type: JobRun
delete()
Delete this cron job.
- Return: The status code of the response
- Return Type: int
deactivate()
Deactivate this cron job.
- Return: The status code of the response
- Return Type: int
reactivate()
Reactivate this cron job.
- Return: The status code of the response
- Return Type: int
history()
Retrieve the batch job runs stemming from this cron job.
- Return: A list of job runs created from this cron job.
- Return Type: JobRunList
class bodosdk.clients.cron_job.CronJobList(workspace_client: IBodoWorkspaceClient, page: int | None = 0, page_size: int | None = 10, total: int | None = None, order: Dict | None = {})
Bases: SDKBaseModel
, ICronJobList
page: int | None
The current page number in the list of Cron Jobs. Defaults to 0.
- Type: Optional[int]
page_size: int | None
The number of Cron Jobs to display per page. Defaults to 10.
- Type: Optional[int]
total: int | None
The total number of Cron Jobs available.
- Type: Optional[int]
order: Dict | None
A dictionary specifying the order in which Cron Jobs are sorted.
- Type: Optional[Dict]
refresh()
Refreshes the list of cron jobs by clearing the current elements and reloading the next page of cron job data.
This method resets the internal state of the list, including the pagination index and the elements themselves. It then loads the next page of data from the underlying data source to repopulate the list. The list is temporarily set to mutable during this process to allow updates, and then set back to immutable once the refresh is complete.
- Returns: The refreshed cron job list instance, with elements reloaded from the next available page.
- Return Type: CronJobList
class bodosdk.models.job.RetryStrategy(*, numRetries: int = 0, delayBetweenRetries: int = 1, retryOnTimeout: bool = False)
Bases: SDKBaseModel
A Pydantic model that defines the retry strategy for a job or a task. It specifies how many retries should be attempted, the delay between retries, and whether to retry on timeout.
num_retries
The number of times to retry the job after a failure. Defaults to 0, meaning no retries by default.
- Type: int
delay_between_retries
The delay between consecutive retries, expressed in minutes. Defaults to 1 minute.
- Type: int
retry_on_timeout
A flag to determine if the job should be retried upon timing out. Defaults to False, indicating no retry on timeout.
- Type: bool
delay_between_retries : int
num_retries : int
retry_on_timeout : bool
class bodosdk.models.job.S3Source(*, type: str = 'S3', bucketPath: str, bucketRegion: str)
Bases: SDKBaseModel
S3 source definition.
bucket_path
S3 bucket path.
- Type: str
bucket_region
S3 bucket region.
- Type: str
bucket_path : str
bucket_region : str
type : str
class bodosdk.models.job.TextSource(*, type: str = 'SQL')
Bases: SDKBaseModel
Represents a specific type of source where the job configuration can originate from a text source.
type
Specifies the type of source.
- Type: str
type : str
class bodosdk.models.job.WorkspaceSource(*, type: str = 'WORKSPACE', path: str)
Bases: SDKBaseModel
Workspace source definition.
path
Workspace path.
- Type: str
path : str
type : str
class bodosdk.models.common.AWSNetworkData(*, region: str | None = None, storageEndpoint: bool | None = None, vpcId: str | None = None, publicSubnetsIds: List[str] | None = None, privateSubnetsIds: List[str] | None = None, policyArns: List[str] | None = None)
Bases: NetworkData
Extends the NetworkData class to include specific properties for AWS networking.
vpc_id
The ID of the AWS Virtual Private Cloud (VPC) where workspace should be created.
- Type: Optional[str]
public_subnets_ids
List of IDs for the public subnets within the AWS VPC.
- Type: Optional[List[str]]
private_subnets_ids
List of IDs for the private subnets within the AWS VPC.
- Type: Optional[List[str]]
policies_arn
List of AWS Resource Names (ARNs) for the policies applied to the network.
- Type: Optional[List[str]]
policies_arn : List[str] | None
private_subnets_ids : List[str] | None
public_subnets_ids : List[str] | None
vpc_id : str | None
class bodosdk.models.common.NetworkData(*, region: str | None = None, storageEndpoint: bool | None = None)
Bases: SDKBaseModel
A base model for network-related data within an SDK context.
region
The geographic region where the network is located.
- Type: Optional[str]
storage_endpoint
Indicates whether a storage endpoint is enabled.
- Type: Optional[bool]
region : str | None
storage_endpoint : bool | None
class bodosdk.models.instance_role.InstanceRole(workspace_client: IBodoWorkspaceClient = None, *, uuid: str | None = None, name: str | None = None, description: str | None = None, roleArn: str | None = None, status: str | None = None)
Bases: SDKBaseModel
, IInstanceRole
delete()
Removes instance role from workspace
- Returns: None
description : str | None
property id
name : str | None
role_arn : str | None
status : str | None
uuid : str | None
class bodosdk.models.instance_role.InstanceRoleFilter(*, uuids: List[str] | None = None, names: List[str] | None = None, roleArns: List[str] | None = None)
Bases: SDKBaseModel
, IInstanceRoleFilter
Class representing filters for InstanceRoleList
ids
returns list matching given ids
- Type: List[str] | None
names
returns list matching given names
- Type: List[str] | None
role_arns
returns list matching giver arns
- Type: List[str] | None
class Config
Bases: object
Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/
allow_population_by_field_name = True
extra = 'forbid'
ids : List[str] | None
names : List[str] | None
role_arns : List[str] | None
class bodosdk.models.instance_role.InstanceRoleList(workspace_client: IBodoWorkspaceClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: Dict | None = None, filters: InstanceRoleFilter | None = None)
Bases: IInstanceRoleList
, SDKBaseModel
Represents a list of instance roles within an SDK context, providing pagination and filtering capabilities.
page
The current page number in the list of instance roles. Defaults to 0.
- Type: Optional[int]
page_size
The number of instance roles to display per page. Defaults to 10.
- Type: Optional[int]
total
The total number of instance roles available.
- Type: Optional[int]
order
A dictionary specifying the order in which instance roles are sorted.
- Type: Optional[Dict]
filters
A filter object used to filter the instance roles listed.
- Type: Optional[InstanceRoleFilter]
class Config
Bases: object
Configuration for Pydantic models. https://docs.pydantic.dev/latest/api/config/
allow_population_by_field_name = True
extra = 'forbid'
delete()
filters : InstanceRoleFilter | None
order : Dict | None
page : int | None
page_size : int | None
total : int | None
class bodosdk.models.workspace.Workspace(org_client: IBodoOrganizationClient = None, *, name: str | None = None, uuid: str | UUID | None = None, status: str | None = None, organizationUUID: str | UUID | None = None, networkData: NetworkData | AWSNetworkData | None = None, createdBy: str | None = None, notebookAutoDeployEnabled: bool | None = None, assignedAt: datetime | None = None, customTags: Dict[str, Any] | None = None, jupyterLastActivity: datetime | None = None, jupyterIsActive: bool | None = False, cloudConfig: CloudConfig | None = None)
Bases: SDKBaseModel
, IWorkspace
assigned_at : datetime | None
cloud_config : CloudConfig | None
created_by : str | None
custom_tags : Dict[str, Any] | None
delete()
Deletes the workspace from the API based on its UUID and updates the instance’s properties with the response from the deletion API call.
- Returns: The Workspace instance after deletion, updated with the API’s response data.
property id
jupyter_is_active : bool | None
jupyter_last_activity : datetime | None
name : str | None
network_data : NetworkData | AWSNetworkData | None
notebook_auto_deploy_enabled : bool | None
organization_uuid : str | UUID | None
status : str | None
update_infra()
uuid : str | UUID | None
wait_for_status(statuses, timeout=600, tick=30)
Waits for the workspace to reach one of the specified states within a given timeout.
- Parameters:
- statuses – A list of states to wait for.
- timeout – The maximum time to wait before raising a TimeoutException.
- tick – The interval between checks.
- Returns: The workspace instance, once it has reached one of the desired states.
- Raises: TimeoutException – If the workspace does not reach a desired state within the timeout.
class bodosdk.models.workspace.WorkspaceFilter(*, uuids: List[str] | None = None, names: List[str] | None = None, statuses: List[str] | None = None, organizationUUIDs: List[str] | None = None)
Bases: SDKBaseModel
, IWorkspaceFilter
class Config
Bases: object
allow_population_by_field_name = True
extra = 'forbid'
ids : List[str] | None
names : List[str] | None
organization_uuids : List[str] | None
statuses : List[str] | None
class bodosdk.models.workspace.WorkspaceList(org_client: IBodoOrganizationClient = None, *, page: int | None = 0, pageSize: int | None = 10, total: int | None = None, order: Dict | None = None, filters: WorkspaceFilter | None = None)
Bases: IWorkspaceList
, SDKBaseModel
class Config
Bases: object
allow_population_by_field_name = True
extra = 'forbid'
delete()
Deletes the workspaces present on the list
- Returns: WorkspaceListAPIModel
filters : WorkspaceFilter | None
order : Dict | None
page : int | None
page_size : int | None
total : int | None
update_infra()
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file bodosdk-2.2.0.tar.gz
.
File metadata
- Download URL: bodosdk-2.2.0.tar.gz
- Upload date:
- Size: 125.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6ec829ead30a7718d6ef33a2cbc1156379a6e81c791b44b9a9644209d53cea7b |
|
MD5 | 44529c4b6620ba2126c6c64febb2ab80 |
|
BLAKE2b-256 | db672c26742e89ae4385e472282dc8877c7a69cda70c803cfbba87b6f96231c8 |
File details
Details for the file bodosdk-2.2.0-py3-none-any.whl
.
File metadata
- Download URL: bodosdk-2.2.0-py3-none-any.whl
- Upload date:
- Size: 106.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5ae6e8ff41a39af72e57becff53ef445779802d6d52b7ba967f0d075970246c8 |
|
MD5 | 4f290878aec6169977bb8c8f7a296e7a |
|
BLAKE2b-256 | 9cd184c723f7eac77c2bf1f8e10ef773a3d4ce0832d933e1f5445a409a30bd8d |