Sama Python Client and Databricks Connector
Sama Python SDK and Databricks Connector
This package provides the Python client for the Sama API endpoints and the Databricks Connector.
See our Python SDK tutorial and Databricks Connector tutorial.
Usage
Python Client
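from sama import Client
# Authenticate with your Sama API key
client = Client("your_api_key")
# Create a batch of tasks; each dictionary holds the inputs for one task
client.create_task_batch("project_id", [{"url": "https://yoururl.com/img.jpg", "input2": "value2"}])
# Get the tasks delivered since the given RFC 3339 timestamp
client.get_delivered_tasks("project_id", from_timestamp="2023-09-02T10:23:36.536167366Z")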
Databricks Connector
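from sama.databricks import Client
# Authenticate with your Sama API key
client = Client("your_api_key")
# Create a batch of tasks; spark_df contains the inputs to the tasks
client.create_task_batch_from_table("project_id", spark_df)
# Load the tasks delivered since the given RFC 3339 timestamp into a Spark DataFrame
spark_df = client.get_delivered_tasks_to_table(spark, "project_id", from_timestamp="2023-09-02T10:23:36.536167366Z")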
sama Client
This class provides methods to interact with SamaHub API endpoints.
__init__
method
This method is the constructor to initialize the SamaHub API client.
Parameters
- api_key (str): The API key to use for authentication.
- silent (bool, optional): Whether to suppress all print/log statements. Defaults to True.
- logger (Logger, optional): The logger to use for logging. Defaults to None.
- log_level (int, optional): The log level to use for logging. Defaults to logging.INFO.
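The constructor options above can be combined to route the client's messages through an application logger. The following is a minimal sketch that assumes the keyword names listed above; `make_logger` is a hypothetical helper and not part of the package, and the `Client` call is left commented out so the snippet runs without an API key.

```python
import logging


def make_logger(name: str = "sama_client", level: int = logging.INFO) -> logging.Logger:
    """Return a stream logger (hypothetical helper, not part of the sama package)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = make_logger(level=logging.DEBUG)

# Assumed usage, based on the parameters documented above:
# from sama import Client
# client = Client("your_api_key", silent=False, logger=logger, log_level=logging.DEBUG)
```

Since `silent` defaults to True, it presumably has to be set to False for any log output to appear.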
create_task_batch
This method creates a batch of tasks on SamaHub using the asynchronous batch task creation API endpoints, specifically the tasks file upload approach.
Parameters
- proj_id (str): The project ID on SamaHub where tasks will be created.
- task_data_records (List[Dict[str, Any]]): A list of task "data" dictionaries which can contain inputs and pre-annotations.
- batch_priority (int, default=0): The priority of the batch. A negative number indicates a higher priority.
- notification_email (Union[str, None], default=None): An email address where SamaHub will send notifications about the batch creation status.
- submit (bool, default=False): A flag determining whether to create the tasks in a submitted state.
Returns
- A JSON response from the last "create a batch of tasks from an uploaded file" endpoint call.
Description
The method first constructs a tasks list with data, priority, and submission status. It then calls the 'create a batch of tasks' endpoint without providing the actual tasks list. This initial call returns a batch ID and a tasks_put_url (a pre-signed AWS S3 URL), which allows the tasks to be uploaded directly to S3. Uploading the tasks as a file bypasses the 1000 tasks limit. After the tasks are uploaded to S3, a subsequent API call notifies the system that the upload is complete and starts the task creation process.
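The first step of the description, building a tasks list that carries data, priority, and submission status, can be sketched as below. This is only an illustration: `build_tasks_payload` is a hypothetical helper, and the key names `data`, `priority`, and `submit` are assumptions inferred from the description, not confirmed field names of the SamaHub API.

```python
import json
from typing import Any, Dict, List


def build_tasks_payload(
    task_data_records: List[Dict[str, Any]],
    batch_priority: int = 0,
    submit: bool = False,
) -> List[Dict[str, Any]]:
    """Wrap each record with batch-level settings (key names are assumed)."""
    return [
        {"data": record, "priority": batch_priority, "submit": submit}
        for record in task_data_records
    ]


records = [{"url": "https://yoururl.com/img.jpg", "input2": "value2"}]
payload = build_tasks_payload(records, batch_priority=-1)

# The serialized list is what would be uploaded to the pre-signed URL.
body = json.dumps(payload)
```

In practice `client.create_task_batch(proj_id, task_data_records, ...)` performs this step together with the upload and the final notification call, so a helper like this is only useful for understanding what the method sends.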
cancel_batch_creation_job
Cancels an ongoing batch creation job.
Parameters:
- proj_id (str): The project ID on SamaHub where the task exists.
- batch_id (str): The ID of the batch to cancel.
Returns:
- Response from the API endpoint.
reject_task
Rejects a task on SamaHub to send it back for rework.
Parameters:
- proj_id (str): The project ID on SamaHub where the task exists.
- task_id (str): The ID of the task to reject.
- reasons (List[str]): List of reasons for rejecting the task.
Returns:
- Response from the API endpoint.
Updates priority of tasks
Parameters:
- project_id (str): The project ID on SamaHub where the tasks exist.
- task_ids (List[str]): The IDs of the tasks whose priority will be updated.
- priority (int): The new priority to assign to the tasks.
Returns:
- Response from the API endpoint.
Delete tasks
Parameters:
- project_id (str): The project ID on SamaHub where the tasks exist.
- task_ids (List[str]): The IDs of the tasks to delete.
Returns:
- Response from the API endpoint.
get_task_status
Fetches the status and details of a single task. More details can be found in the Sama documentation.
Parameters:
- proj_id: The project ID on SamaHub.
- task_id: The ID of the task to fetch.
- same_as_delivery (default=True): Whether to fetch the task as it would be delivered.
Returns:
- Task details from the API.
get_multi_task_status
Fetches status and details for multiple tasks. Returns a generator object. More details can be found in the Sama documentation.
Parameters:
- proj_id (str): The unique identifier of the project on SamaHub. This parameter specifies the project under which the tasks reside.
- batch_id (str, optional): The identifier for a batch within the project. If provided, it filters tasks that belong to this specific batch.
- client_batch_id (str, optional): The client-specific identifier for a batch. This is useful for filtering tasks based on client-defined batches.
- client_batch_id_match_type (str, optional): This parameter specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
- date_type (str, optional): Determines which date to use for the timestamp filters. Examples might include "creation_date" or "completion_date".
- from_timestamp (str, optional): Filters tasks that have a date (specified by date_type) after this timestamp.
- to_timestamp (str, optional): Filters tasks that have a date (specified by date_type) before this timestamp.
- state (TaskStates, optional): An enum value that specifies the desired status of the tasks to filter. For example, "delivered" or "acknowledged".
- omit_answers (bool, optional): Flag to determine if answers related to tasks should be omitted from the response. Defaults to True.
Returns:
- An iterable generator object with task details.
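Because the method returns a generator, results are produced lazily and can be iterated only once. The sketch below summarizes such a stream by state. It assumes each yielded task is a dictionary with a `state` key, which is an assumption about the response shape; `count_by_state` and `fake_tasks` are hypothetical names, with `fake_tasks` standing in for the generator returned by `client.get_multi_task_status(proj_id)`.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator


def count_by_state(tasks: Iterable[Dict[str, Any]]) -> Counter:
    """Tally tasks by their (assumed) 'state' field without storing them all."""
    return Counter(task.get("state", "unknown") for task in tasks)


def fake_tasks() -> Iterator[Dict[str, Any]]:
    """Stand-in for the generator returned by the client."""
    yield {"id": "a", "state": "delivered"}
    yield {"id": "b", "state": "acknowledged"}
    yield {"id": "c", "state": "delivered"}


summary = count_by_state(fake_tasks())
```

If the results are needed more than once, wrapping the generator in `list(...)` keeps them in memory at the cost of loading every task up front.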
get_delivered_tasks
Get all task deliveries since a given timestamp (RFC3339 format).
Parameters:
- proj_id (str): The unique identifier of the project on SamaHub. It specifies the project under which the deliveries reside.
- batch_id (str, optional): The identifier for a batch within the project. If provided, it filters deliveries that belong to this specific batch.
- client_batch_id (str, optional): The client-specific identifier for a batch. This is useful for filtering deliveries based on client-defined batches.
- client_batch_id_match_type (str, optional): Specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
- from_timestamp (str, optional): Filters deliveries that have a date after this timestamp.
- task_id (str, optional): The unique identifier for a specific task. If provided, it fetches deliveries related to this specific task.
Returns:
- An iterable generator object with task deliveries.
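The `from_timestamp` value is expected in RFC 3339 format, as in the usage example ("2023-09-02T10:23:36.536167366Z"). A small sketch of producing such a string from a Python `datetime` follows; `to_rfc3339` is a hypothetical helper name, and Python's `datetime` carries microsecond rather than nanosecond precision.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Normalize to UTC and use the 'Z' suffix instead of '+00:00'.
    return moment.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


example = to_rfc3339(datetime(2023, 9, 2, 10, 23, 36, 536167, tzinfo=timezone.utc))

# A typical filter value: everything delivered in the last 24 hours.
one_day_ago = to_rfc3339(datetime.now(timezone.utc) - timedelta(days=1))
```

The resulting string can then be passed as `from_timestamp=one_day_ago`.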
get_deliveried_tasks_since_last_call
Fetches all task deliveries since the last call based on a consumer token.
Parameters:
- proj_id (str): The unique identifier of the project on SamaHub. It specifies the project under which the deliveries reside.
- batch_id (str, optional): The identifier for a batch within the project. If provided, it filters deliveries that belong to this specific batch.
- client_batch_id (str, optional): The client-specific identifier for a batch. This is useful for filtering deliveries based on client-defined batches.
- client_batch_id_match_type (str, optional): Specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
- consumer (str, optional): Token that identifies the caller, so different consumers can be in different places of the delivered tasks list.
Returns:
- An iterable generator object with task deliveries.
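The idea that different consumers can be at different places in the delivered tasks list can be pictured with a toy in-memory model. This is not how SamaHub implements it: `DeliveryFeed` is a hypothetical class, and the behavior for a consumer's first call (returning everything) is an assumption made for the illustration.

```python
from typing import Dict, List


class DeliveryFeed:
    """Toy stand-in for a delivered-tasks list with one cursor per consumer."""

    def __init__(self) -> None:
        self._deliveries: List[str] = []
        self._cursors: Dict[str, int] = {}

    def deliver(self, task_id: str) -> None:
        self._deliveries.append(task_id)

    def since_last_call(self, consumer: str) -> List[str]:
        # Return what this consumer has not seen yet, then advance its cursor.
        start = self._cursors.get(consumer, 0)
        self._cursors[consumer] = len(self._deliveries)
        return self._deliveries[start:]


feed = DeliveryFeed()
feed.deliver("task-1")
feed.deliver("task-2")
first_for_a = feed.since_last_call("pipeline-a")
feed.deliver("task-3")
second_for_a = feed.since_last_call("pipeline-a")
first_for_b = feed.since_last_call("pipeline-b")
```

Here "pipeline-a" sees only the new delivery on its second call, while "pipeline-b" starts from its own position, which is why each downstream job would use its own `consumer` token.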
get_status_batch_creation_job
Fetches information about a batch creation job.
Parameters:
- proj_id: The project ID on SamaHub.
- batch_id: The ID of the batch to fetch details for.
- omit_failed_task_data (default=False): Whether to omit data about failed tasks.
Returns:
- Batch creation job details.
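Since batch creation is asynchronous, a caller may want to poll this method until the job finishes. The sketch below shows one way to do that. The shape of the returned job details is not described here, so the caller supplies an `is_finished` predicate; `wait_for_batch` is a hypothetical helper, and `client` is any object exposing `get_status_batch_creation_job(proj_id, batch_id)` as documented above.

```python
import time
from typing import Any, Callable


def wait_for_batch(
    client: Any,
    proj_id: str,
    batch_id: str,
    is_finished: Callable[[Any], bool],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
) -> Any:
    """Poll the batch creation job until is_finished(status) is true."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_status_batch_creation_job(proj_id, batch_id)
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch {batch_id} did not finish in time")
        time.sleep(poll_seconds)
```

The predicate keeps the helper independent of the response format; it would typically inspect whichever status field the job details actually contain.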
get_creation_task_schema
Fetches the JSON schema for task creation on SamaHub.
Parameters:
- project_id (str): The project ID on SamaHub.
Returns:
- A dictionary containing the JSON schema for task creation.
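One use for the creation schema is checking task records before submitting a batch. The following sketch assumes the returned dictionary is a standard JSON Schema object with a top-level `required` list, which may not match the actual response; `missing_required_fields` is a hypothetical helper and `example_schema` is made up for the illustration.

```python
from typing import Any, Dict, List


def missing_required_fields(schema: Dict[str, Any], record: Dict[str, Any]) -> List[str]:
    """List the schema's required property names that the record lacks."""
    return [name for name in schema.get("required", []) if name not in record]


# Made-up schema in standard JSON Schema form, standing in for the
# dictionary returned by client.get_creation_task_schema(project_id).
example_schema = {
    "type": "object",
    "required": ["url"],
    "properties": {"url": {"type": "string"}, "name": {"type": "string"}},
}

problems = missing_required_fields(example_schema, {"name": "img-1"})
```

A full validation would also check value types, for which a dedicated JSON Schema validator is a better fit than this presence check.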
get_delivery_task_schema
Fetches the JSON schema for task deliveries on SamaHub.
Parameters:
- project_id (str): The project ID on SamaHub.
Returns:
- A dictionary containing the JSON schema for task deliveries.
get_project_information
Fetches high-level information about a project from SamaHub.
Parameters:
- project_id (str): The project ID on SamaHub.
Returns:
- A dictionary containing information about the project.
get_project_stats
Fetches high-level statistics about a project's tasks within a specified time frame from SamaHub.
Parameters:
- project_id (str): The project ID on SamaHub.
- from_timestamp (str, optional): Filters tasks that have a date after this timestamp.
- to_timestamp (str, optional): Filters tasks that have a date before this timestamp.
Returns:
- A dictionary containing project statistics.
sama.databricks Client
create_task_batch_from_table
method
Creates a batch of tasks using data from a DataFrame. Each DataFrame column is used as an input to task creation, e.g. url='https://wiki.com/img.jpg', client_batch_id='batch1'. Prepend 'output_' to a column name to specify pre-annotations. Returns JSON containing the batch_id if successful.
Parameters:
- spark_dataframe (DataFrame): The Spark DataFrame to be converted to task data records.
- project_id (str): The project ID on SamaHub where tasks are to be created.
- batch_priority (int): The priority of the batch. Defaults to 0. Negative numbers indicate higher priority.
- notification_email (Union[str, None]): The email address where SamaHub should send notifications about the batch creation status. Defaults to None.
- submit (bool): Whether to create the tasks in submitted state. Defaults to False.
Returns:
- JSON containing the batch_id if successful.
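The column naming convention, where plain columns are task inputs and columns prefixed with 'output_' are pre-annotations, can be illustrated on a single row represented as a plain dictionary. This sketch only classifies the columns; how the connector transforms them internally is not described here. `split_row` is a hypothetical helper, and the `output_label` column is a made-up example.

```python
from typing import Any, Dict, Tuple


def split_row(
    row: Dict[str, Any], prefix: str = "output_"
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate a row's input columns from its pre-annotation columns."""
    inputs = {name: value for name, value in row.items() if not name.startswith(prefix)}
    pre_annotations = {name: value for name, value in row.items() if name.startswith(prefix)}
    return inputs, pre_annotations


row = {
    "url": "https://wiki.com/img.jpg",
    "client_batch_id": "batch1",
    "output_label": "cat",  # made-up pre-annotation column
}
inputs, pre_annotations = split_row(row)
```

Checking a few rows this way before calling `create_task_batch_from_table` can catch a misnamed column early.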
get_delivered_tasks_to_table
method
Fetches all deliveries since a given timestamp (RFC3339 format).
Parameters:
- spark (SparkSession): A Spark session.
- project_id (str): The unique identifier of the project on SamaHub. Specifies the project under which the deliveries reside.
- batch_id (str, optional): The identifier for a batch within the project. If provided, filters deliveries that belong to this batch.
- client_batch_id (str, optional): The client-specific identifier for a batch. Useful for filtering deliveries based on client-defined batches.
- client_batch_id_match_type (str, optional): Specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
- from_timestamp (str, optional): Filters deliveries that have a date after this timestamp.
- task_id (str, optional): The unique identifier for a specific task. If provided, fetches deliveries related to this specific task.
Returns:
- A DataFrame containing the deliveries.
get_delivered_tasks_since_last_call_to_table
method
Fetches all deliveries since the last call based on a consumer token.
Parameters:
- spark (SparkSession): A Spark session.
- project_id (str): The unique identifier of the project on SamaHub. Specifies the project under which the deliveries reside.
- batch_id (str, optional): The identifier for a batch within the project. If provided, filters deliveries that belong to this batch.
- client_batch_id (str, optional): The client-specific identifier for a batch. Useful for filtering deliveries based on client-defined batches.
- client_batch_id_match_type (str, optional): Specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
- consumer (str, optional): Token that identifies the caller, so different consumers can be in different places of the delivered tasks list.
Returns:
- A DataFrame containing the deliveries.
get_task_status_to_table
Fetches the status and details of a single task. More details can be found in the Sama documentation.
Parameters:
- spark (SparkSession): A Spark session.
- project_id: The project ID on SamaHub.
- task_id: The ID of the task to fetch.
- same_as_delivery (default=True): Whether to fetch the task as it would be delivered.
Returns:
- A DataFrame containing the task status.
get_multi_task_status_to_table
Fetches status and details for multiple tasks. More details can be found in the Sama documentation.
Parameters:
- spark (SparkSession): A Spark session.
- project_id (str): The unique identifier of the project on SamaHub. This parameter specifies the project under which the tasks reside.
- batch_id (str, optional): The identifier for a batch within the project. If provided, it filters tasks that belong to this specific batch.
- client_batch_id (str, optional): The client-specific identifier for a batch. This is useful for filtering tasks based on client-defined batches.
- client_batch_id_match_type (str, optional): This parameter specifies how the client_batch_id should be matched. Common options might include "exact" or "contains".
- date_type (str, optional): Determines which date to use for the timestamp filters. Examples might include "creation_date" or "completion_date".
- from_timestamp (str, optional): Filters tasks that have a date (specified by date_type) after this timestamp.
- to_timestamp (str, optional): Filters tasks that have a date (specified by date_type) before this timestamp.
- state (TaskStates, optional): An enum value that specifies the desired status of the tasks to filter. For example, "delivered" or "acknowledged".
- omit_answers (bool, optional): Flag to determine if answers related to tasks should be omitted from the response. Defaults to True.
Returns:
- A DataFrame with tasks and their status.
Source Distribution
File details for sama-0.5.11.tar.gz
- Size: 17.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | ddc4bf7a36d7f021193faf80518d533de5e58937ae45963b2e7980f4e995c718
MD5 | f82ecde9f7ea700f849cc383d2f92af1
BLAKE2b-256 | 2ff95e340e1d89b7181c622a4b6019a3234e34c4236af10f4231692fe81e1a11