
Cerebral Python SDK

Python SDK for the Cerebral data versioning API.

Installation

pip install cerebral-sdk

Or with uv:

uv add cerebral-sdk

Quick Start

Using environment variables (simplest)

export CEREBRAL_API_KEY="your-api-key"

import cerebral

repo = cerebral.repository("my-org/my-repo")
print(repo.description)  # lazy-loaded on first access

with repo.session() as session:
    session.objects.put("data/example.csv", b"col1,col2\na,b\n")
    session.commit("added example csv")

Explicit configuration

import cerebral

cerebral.configure(api_key="your-api-key", endpoint_url="https://custom.endpoint")
repo = cerebral.repository("my-org/my-repo")

Explicit client (most flexible)

from cerebral import Client

with Client(api_key="your-api-key") as client:
    repo = client.repository("my-org/my-repo")
    with repo.session() as session:
        session.objects.put("data/file.csv", b"content")
        session.commit("update data")

Configuration

Option        Environment Variable    Default
api_key       CEREBRAL_API_KEY        (required)
endpoint_url  CEREBRAL_ENDPOINT_URL   https://cerebral.storage

Resolution order: explicit parameter > environment variable > default.
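The resolution order can be sketched as a small helper (resolve_option is a hypothetical name for illustration; the SDK's internal logic may differ):

```python
import os

def resolve_option(explicit, env_var, default=None):
    """Resolve a config value: explicit parameter > environment variable > default."""
    if explicit is not None:
        return explicit
    env_value = os.environ.get(env_var)
    if env_value is not None:
        return env_value
    return default

# An explicit parameter wins even when the environment variable is set
os.environ["CEREBRAL_ENDPOINT_URL"] = "https://from-env.example"
resolve_option("https://explicit.example", "CEREBRAL_ENDPOINT_URL", "https://cerebral.storage")
# -> "https://explicit.example"
```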

A missing API key is not an error at construction time; a ConfigurationError is raised when the first request is made.

Usage

Repositories

repo = cerebral.repository("my-org/my-repo")

# Lazy-loaded properties
print(repo.id, repo.description, repo.visibility)

# Update
repo.update(description="New description", visibility="public")

# Delete
repo.delete()

Sessions

Sessions are the primary way to read and write objects. They act like transactions: stage changes, then commit or rollback.

# Context manager (recommended) — rolls back on error
with repo.session() as session:
    objects = session.objects.list(prefix="data/", delimiter="/")
    session.objects.put("data/report.csv", b"content")
    session.commit("update CSV files")

# Explicit control
session = repo.session()
print(session.session_id)
session.objects.put("data/file.csv", b"content")
session.commit("modifying data in parallel")
# or: session.rollback()

Attaching to an Existing Session

Resume a session from another thread, process, or machine:

# In another thread/process/machine:
session = repo.attach(session_id)
session.objects.put("data/file2.csv", b"more content")
session.commit("finishing work")

Objects

with repo.session() as session:
    # Upload
    session.objects.put("data/file.csv", b"content")

    # Download (streaming)
    with session.objects.get("data/file.csv") as f:
        data = f.read()

    # Read a specific byte range (e.g., first 1 KB)
    with session.objects.get("data/file.parquet", byte_range=(0, 1023)) as f:
        header = f.read()

    # Stream large objects without caching
    with session.objects.get("data/large.bin", cache=False) as f:
        for chunk in f.iter_bytes(chunk_size=8192):
            process(chunk)

    # List (auto-paginating, with directory grouping)
    for entry in session.objects.list(prefix="data/", delimiter="/"):
        print(entry.path, entry.type)  # type is "object" or "prefix"

    # Check metadata
    meta = session.objects.head("data/file.csv")
    print(meta.etag, meta.content_type, meta.content_length)

    # Delete
    session.objects.delete("data/file.csv")

    session.commit("object operations")

Uncommitted Changes

session = repo.session()
session.objects.put("data/new.csv", b"content")

for entry in session.uncommitted():
    print(entry.path)

Timeline (Commit History)

for commit in repo.timeline():
    print(commit.id, commit.message)

    # View changes introduced by this commit
    for change in commit.diff():
        print(f"object {change.path} was {change.status} in this commit!")

Organizations

orgs = cerebral.Client(api_key="key").organizations

# Create
org = orgs.create("my-org", "My Organization")

# List
for org in orgs.list():
    print(org.name)

# Members
for member in orgs.members("my-org").list():
    print(member.username, member.role)

orgs.members("my-org").add(user_id="user-uuid", role="member")

Agents

Manage agents and their API keys using the fluent organization resource:

org = cerebral.organization("my-org")

# Create an agent
agent = org.agents.create("my-agent", description="CI bot", metadata={"env": "prod"})
print(agent.name, agent.id)

# List agents
for agent in org.agents.list():
    print(agent.name)

# Get a specific agent
agent = org.agents.get("my-agent")

# Update an agent
agent = org.agents.update("my-agent", description="Updated description")

# Delete an agent
org.agents.delete("my-agent")

Agent API Keys

agent = org.agents.get("my-agent")

# Create a key (token is only shown once)
created = agent.api_keys.create("dev-key")
print(created.token)  # cak-... full token

# List keys
for key in agent.api_keys.list():
    print(key.name, key.token_hint)

# Get by ID and revoke a key
key = agent.api_keys.get(key_id)
key.revoke()

Organization Sub-resources

The organization resource also provides access to repositories, members, groups, policies, and connectors:

org = cerebral.organization("my-org")

for repo in org.repositories.list():
    print(repo.name)

for member in org.members.list():
    print(member.username, member.role)

Groups

groups = client.organizations.groups("my-org")

group = groups.create("engineers", description="Engineering team")
detail = groups.get(group.id)  # includes members and attachments

groups.add_member(group.id, "user", "user-uuid")
groups.remove_member(group.id, "user", "user-uuid")

Policies

policies = client.organizations.policies("my-org")

# Create and validate
result = policies.validate("package cerebral.authz\ndefault allow = true")
policy = policies.create("allow-all", rego="...", description="Allow everything")

# Attach/detach
policies.attach(policy.id, "group", "group-uuid")
policies.detach(policy.id, "group", "group-uuid")

# Effective policies for a user
for ep in policies.effective_policies("user-uuid"):
    print(ep.policy_name, ep.source)

Connectors and Imports

# Org-level connectors
connectors = client.organizations.connectors("my-org")
conn = connectors.create("my-s3", "s3", {"bucket": "my-bucket", "region": "us-east-1"})

# Attach to repo
repo.connectors.attach(conn.id)

# Import
job_id = repo.imports.start(
    connector_id=conn.id,
    source_path="s3://my-bucket/data/",
    destination_path="imported/",
)
status = repo.imports.status(job_id)
print(status.status, status.objects_imported)

Error Handling

All SDK exceptions inherit from CerebralError:

CerebralError                        # base for all SDK errors
+-- ConfigurationError               # missing API key, bad endpoint
+-- TransportError                   # network failures, DNS, timeouts
+-- SerializationError               # invalid JSON in response
+-- APIError                         # base for HTTP API errors
    +-- BadRequestError              # 400
    +-- AuthenticationError          # 401
    +-- ForbiddenError               # 403
    +-- NotFoundError                # 404
    +-- ConflictError                # 409
    +-- GoneError                    # 410
    +-- PreconditionFailedError      # 412
    +-- LockedError                  # 423
    +-- ServerError                  # 5xx

APIError includes status_code, message, code, request_id, method, url, and response_text for debugging.

from cerebral import NotFoundError, CerebralError

try:
    with repo.session() as session:
        with session.objects.get("nonexistent") as f:
            f.read()
except NotFoundError as e:
    print(f"Not found: {e.message} (request_id={e.request_id})")
except CerebralError as e:
    print(f"SDK error: {e}")

Large Object Handling

By default, objects.get() caches the full object in memory on .read(). For large objects, disable caching and stream:

with repo.session() as session:
    with session.objects.get("large-file.bin", cache=False) as f:
        for chunk in f.iter_bytes(chunk_size=1024 * 1024):
            output.write(chunk)

Byte Range Requests

Use byte_range to read only a portion of an object without downloading the full content. This is useful for reading file headers, tailing logs, or formats like Parquet that support random access.

# Read the first 4 bytes (e.g., a magic number)
with repo.session() as session:
    with session.objects.get("data/file.parquet", byte_range=(0, 3)) as f:
        magic = f.read()

    # Read from offset 1024 to end of file
    with session.objects.get("data/file.parquet", byte_range=(1024, None)) as f:
        tail = f.read()

The reader exposes a content_range property with the server's Content-Range header value (e.g., "bytes 0-3/49152"):

with repo.session() as session:
    with session.objects.get("data/file.parquet", byte_range=(0, 1023)) as f:
        data = f.read()
        print(f.content_range)   # "bytes 0-1023/49152"
        print(f.content_length)  # 1024
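The Content-Range value also carries the object's total size, so you can discover it from a small ranged read without a separate head() call. A minimal parsing sketch (parse_content_range is a hypothetical helper, not part of the SDK):

```python
def parse_content_range(value):
    """Parse a Content-Range header such as "bytes 0-1023/49152".

    Returns (start, end, total); total is None for an unknown size ("bytes 0-1023/*").
    Note the range is inclusive, so the byte count is end - start + 1.
    """
    unit, _, rest = value.partition(" ")
    if unit != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    range_part, _, total_part = rest.partition("/")
    start_s, _, end_s = range_part.partition("-")
    total = None if total_part == "*" else int(total_part)
    return int(start_s), int(end_s), total

start, end, total = parse_content_range("bytes 0-1023/49152")
# start=0, end=1023, total=49152; length of this range is end - start + 1 == 1024
```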

MCP Server

The SDK includes an MCP server that exposes Cerebral operations as tools for AI agents. The API key must be an agent key (prefix cak-).

Running

# Via uvx
uvx --from cerebral-sdk cerebral-mcp

# Or as a Python module
CEREBRAL_API_KEY=cak-... python -m cerebral.mcp

Available Tools

Tool            Description
create_session  Create a new editing session on a repository.
list_objects    List objects and prefixes with metadata (size, content type, etc.).
head_object     Get an object's size and content type without downloading it.
get_object      Download an object's content (UTF-8 text or base64-encoded binary).
put_object      Upload an object (UTF-8 text or base64-encoded binary).
delete_object   Delete an object from a session.
commit_session  Commit a session (returns approval URL if review is required).
close_session   Roll back and close a session.
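Binary payloads for put_object and get_object travel as base64 text. A sketch of preparing a binary payload for a put_object call (the argument names "path", "content", and "encoding" are illustrative, not the exact MCP tool schema):

```python
import base64

payload = b"\x89PNG\r\n\x1a\n"  # arbitrary binary content (here, a PNG signature)
encoded = base64.b64encode(payload).decode("ascii")

arguments = {
    "path": "images/logo.png",
    "content": encoded,
    "encoding": "base64",  # illustrative field; marks the content as binary
}

# Decoding on the receiving side recovers the original bytes exactly
assert base64.b64decode(arguments["content"]) == payload
```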

Configuration

The server reads CEREBRAL_API_KEY from the environment on every tool call. Only agent keys (cak- prefix) are accepted.
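The accepted key shape can be checked with a simple predicate (a sketch of the prefix rule only; the server's actual validation does more than inspect the prefix):

```python
def is_agent_key(api_key: str) -> bool:
    """Agent keys carry the cak- prefix; the MCP server rejects any other key type."""
    return api_key.startswith("cak-")

is_agent_key("cak-abc123")  # -> True
is_agent_key("your-api-key")  # -> False
```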

Documentation

Full documentation is available at https://docs.cerebral.storage/python-sdk/.

Development

# Install dev dependencies
uv sync --all-extras

# Run tests
uv run pytest

# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type check
uv run mypy src/cerebral/

# Build
uv build

License

Apache 2.0
