Carol Python API and Tools
PyCarol
PyCarol is a Python SDK designed to support data ingestion and data access workflows on Carol. It provides abstractions for authentication, connector and staging management, data ingestion, and querying, enabling reliable integration with Carol services using Python. The SDK encapsulates low-level API communication and authentication logic, making data pipelines easier to build, maintain, and operate.
Getting Started
Run pip install pycarol to install the latest stable version from PyPI. Documentation is hosted on Read the Docs.
Recommended authentication method
Never write passwords or API tokens in plain text. Use environment variables whenever possible.
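A minimal sketch of that advice, using only the standard library: settings are read from environment variables and a missing one fails fast instead of falling back to a hard-coded value. The variable names and the helper `load_carol_settings` are assumptions made for illustration; check the PyCarol documentation for the names the SDK reads on its own.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed variable names, for illustration only.
REQUIRED_VARS = ("CAROLORGANIZATION", "CAROLTENANT", "CAROLAPPNAME")


def load_carol_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings from `env` (defaults to os.environ).

    Raises KeyError naming every missing variable, so no secret or
    setting needs a hard-coded fallback in source code.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to the explicit constructors shown further down, while passwords and tokens stay outside the source tree.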
Carol URL format:
www.ORGANIZATION.carol.ai/TENANT_NAME
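To show how the two parts of that URL map to the `organization` and `domain` arguments used below, here is a small sketch that splits a URL of this shape. The helper `split_carol_url` is hypothetical and not part of PyCarol.

```python
from typing import Tuple


def split_carol_url(url: str) -> Tuple[str, str]:
    """Split 'www.ORGANIZATION.carol.ai/TENANT_NAME' into (organization, tenant)."""
    # Drop an optional scheme and any trailing slash.
    rest = url.split("://", 1)[-1].rstrip("/")
    host, _, path = rest.partition("/")
    # Only the first path segment is the tenant name.
    tenant = path.split("/")[0]
    labels = host.split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    # Expect exactly ORGANIZATION.carol.ai once 'www' is removed.
    if len(labels) != 3 or labels[1:] != ["carol", "ai"] or not tenant:
        raise ValueError(f"not a Carol URL: {url!r}")
    return labels[0], tenant


organization, tenant = split_carol_url("www.myorg.carol.ai/mytenant")
```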
Explicit authentication methods
The Carol class is the main entry point to PyCarol and the Carol APIs.
Using user/password
from pycarol import PwdAuth, Carol
carol = Carol(
    domain=TENANT_NAME,
    app_name=APP_NAME,
    auth=PwdAuth(USERNAME, PASSWORD),
    organization=ORGANIZATION,
)
Using Tokens
from pycarol import PwdKeyAuth, Carol
carol = Carol(
    domain=TENANT_NAME,
    app_name=APP_NAME,
    auth=PwdKeyAuth(pwd_auth_token),
    organization=ORGANIZATION,
)
Using API Key
from pycarol import ApiKeyAuth, Carol
carol = Carol(
    domain=TENANT_NAME,
    app_name=APP_NAME,
    auth=ApiKeyAuth(api_key=X_AUTH_KEY),
    connector_id=CONNECTORID,
    organization=ORGANIZATION,
)
Setting up Carol entities
from pycarol import Connectors
connector_id = Connectors(carol).create(
    name="my_connector",
    label="connector_label",
)
Sending Data
from pycarol import Staging
Staging(carol).send_data(
    staging_name="my_stag",
    data=[{"name": "Rafael"}],
    connector_id=CONNECTORID,
)
Staging batch API: Batch ingestion
To group multiple send_data() calls under one batch (e.g. for Carol to process as a unit), use start_batch() and end_batch(). Each request is tagged with batchId and batchIdSequence. If you do not start a batch explicitly, a batch is auto-started and auto-ended around a single send_data() call.
``Staging.start_batch()``: Starts a batch, generates a batchId, returns it.
``Staging.end_batch()``: Sends the batch summary to Carol and clears the current batch.
``Staging.send_data()``: When a batch is active, appends batchId and batchIdSequence to the intake URL.
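The interplay of the three methods can be pictured with a small stand-alone model. This is an illustrative sketch, not PyCarol's implementation: the class `BatchSketch`, the use of a UUID as batch ID, and a sequence that starts at 1 are all assumptions, and no request is actually sent.

```python
import uuid
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Request = Tuple[Optional[str], int, List[Record]]


class BatchSketch:
    """Illustrative stand-in for the batching behaviour; not PyCarol code."""

    def __init__(self) -> None:
        self.batch_id: Optional[str] = None
        self.sequence = 0

    def start_batch(self) -> str:
        # Assumption: the batch ID is a random identifier.
        self.batch_id = uuid.uuid4().hex
        self.sequence = 0
        return self.batch_id

    def end_batch(self) -> None:
        # The real SDK sends a batch summary here; this sketch only resets.
        self.batch_id = None

    def send_data(self, data: List[Record], step_size: int) -> List[Request]:
        # Auto-start and auto-end a batch when none is active.
        auto = self.batch_id is None
        if auto:
            self.start_batch()
        requests: List[Request] = []
        for start in range(0, len(data), step_size):
            self.sequence += 1  # assumed to count requests within the batch
            requests.append((self.batch_id, self.sequence, data[start:start + step_size]))
        if auto:
            self.end_batch()
        return requests
```

In this model, two `send_data()` calls made between `start_batch()` and `end_batch()` share one batch ID and a continuous sequence, while a lone call gets a batch of its own.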
from pycarol import Carol, Staging
from dotenv import load_dotenv
# Load credentials from a .env file into the environment; Carol() reads them from there.
load_dotenv()
carol = Carol()
# CONNECTORID must hold the ID of an existing connector.
json_ex = [
    {"name": "Rafael", "email": {"type": "email", "email": "rafael@totvs.com.br"}},
    {"name": "Leandro", "email": {"type": "email", "email": "Leandro@totvs.com.br"}},
]
staging = Staging(carol)
# Single send_data: batch is generated internally
staging.send_data(staging_name="test_batch", data=json_ex, step_size=1,
                  connector_id=CONNECTORID, print_stats=True)
# User-managed batch for multiple intake calls
staging.start_batch()
staging.send_data(staging_name="test_batch", data=json_ex, step_size=1,
                  connector_id=CONNECTORID, print_stats=True)
staging.send_data(staging_name="test_batch", data=json_ex, step_size=4,
                  connector_id=CONNECTORID, print_stats=True)
staging.end_batch()
Reading data
from pycarol import BQ, Carol
BQ(Carol()).query("SELECT * FROM stg_connectorname_tablename")
Carol In Memory
PyCarol provides an easy way to work with in-memory data using the Memory class, built on top of DuckDB. Queries are executed locally over in-memory data, without triggering BigQuery jobs or consuming BigQuery slots, and results are returned as pandas DataFrames. The recommended usage is with BQStorage objects.
from pycarol import Carol, Memory, BQStorage
from dotenv import load_dotenv
load_dotenv()
carol = Carol()
storage = BQStorage(carol)
memory = Memory()
t = storage.query(
    "ingestion_stg_connectorname_tablename",
    column_names=["tenantid", "processing", "_ingestionDatetime"],
    max_stream_count=50,
)
memory.add("my_table", t)
table = memory.query("SELECT * FROM my_table")
print(table)
Carol In Memory queries follow DuckDB SQL syntax.
Logging
Prerequisites
Set the LONGTASKID environment variable when running locally.
Logging messages to Carol
import logging
from pycarol import CarolHandler, Carol
logger = logging.getLogger(__name__)
logger.addHandler(CarolHandler(Carol()))
logger.info("Hello Carol")
Notes
- Logs are linked to long tasks.
- If the task ID is missing, messages fall back to the console.
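The fallback described in these notes can be sketched with the standard `logging` module. This is not how `CarolHandler` is implemented; the class `TaskAwareHandler` and its `send` callback are hypothetical, and the sketch assumes the task ID is read from the LONGTASKID environment variable.

```python
import logging
import os
import sys
from typing import Callable, Mapping, Optional


class TaskAwareHandler(logging.Handler):
    """Hypothetical handler: forward records when a task ID is set, else print."""

    def __init__(
        self,
        send: Callable[[str, str, str], None],
        env: Optional[Mapping[str, str]] = None,
    ) -> None:
        super().__init__()
        self._send = send
        self._env = os.environ if env is None else env

    def emit(self, record: logging.LogRecord) -> None:
        message = self.format(record)
        task_id = self._env.get("LONGTASKID")
        if task_id:
            # Attach the message to the long task through the callback.
            self._send(task_id, record.levelname, message)
        else:
            # No task to attach to: fall back to the console.
            print(message, file=sys.stderr)
```

Passing the environment and the sender as arguments keeps the sketch easy to exercise locally, for example with a plain dict and a function that appends to a list.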
Calling Carol APIs
Beyond its high-level abstractions, PyCarol can also call Carol APIs directly. This is useful for endpoints that no SDK method covers yet.
# Replace {carol_app_id} with the ID of the Carol app before calling.
carol.call_api(
    "v1/tenantApps/subscribe/carolApps/{carol_app_id}",
    method="POST",
)
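The path above contains a `{carol_app_id}` placeholder that has to be filled in before the call. One way to do that is sketched here; the helper `build_path` is hypothetical and not part of PyCarol, and it URL-encodes each value so that an ID containing reserved characters cannot change the path structure.

```python
from urllib.parse import quote


def build_path(template: str, **params: str) -> str:
    """Fill `{name}` placeholders in `template`, URL-encoding each value."""
    encoded = {name: quote(str(value), safe="") for name, value in params.items()}
    # str.format raises KeyError if a placeholder has no matching value.
    return template.format(**encoded)


path = build_path(
    "v1/tenantApps/subscribe/carolApps/{carol_app_id}",
    carol_app_id="abc123",
)
```

The resulting `path` is what would be passed as the first argument to `carol.call_api`.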
Settings
from pycarol.apps import Apps
Apps(carol).get_settings(app_name="my_app")
Useful Functions
from pycarol.functions import track_tasks
track_tasks(carol, ["task1", "task2"])
Release process
- Open a PR to main.
- Merge after approval.
- Update the README if needed.
Made with ❤ at TOTVS IDeIA