A Python SDK for building high-performance, asynchronous batch processing operators

SandAI Operator SDK

Python framework for developing data operators under the Dataflow architecture. Part of the SandAI Data Project's three-layer separation design.

Overview

The Operator SDK provides the foundation for building data processing operators that form the Dataflow layer in the SandAI architecture. These operators are atomic, reusable components that can be composed into complex pipelines and workflows.

For the full documentation set, topic guides, and architecture notes, start with docs/README.md.

Features

Asynchronous Batch Processing: Concurrent processing with configurable batch size and concurrency
Smart File Monitoring: Real-time file change detection with vim editor compatibility
Task Working Directories: Isolated working directories for each task
Error Recovery: Automatic handling of file operations and network interruptions
Standardized Interface: Consistent operator lifecycle and API design
Celery Integration: Built-in support for distributed task execution

Installation

cd operator-sdk
pip install -e .

conda create -n sandai-operator python=3.8 -c conda-forge

Runtime requirement: the SDK supports Python 3.8+. If you enable free-threading/no-GIL on a newer Python version, you may get better CPU parallelism, but it is not required to run the SDK.

Quick Start

from sandai.operator import BatchProcessor, TaskInput, TaskOutput
from pydantic import BaseModel
from typing import List, Generator

class Options(BaseModel):
    param: str = "default"

class Results(BaseModel):
    output: str

processor = BatchProcessor(name="my-processor", version="1.0.0")

@processor.on_batch(
    max_concurrency=4,
    max_batch_size=8,
    prepare_concurrency=4,
    output_concurrency=4,
)
def process_batch(
    batch_inputs: List[TaskInput[Options]], 
    operator_config: dict,
    context
) -> Generator[TaskOutput[Results], None, None]:
    
    for task_input in batch_inputs:
        # Get task working directory
        workdir = context.get_task_workdir(task_input.task_id)
        
        # Your processing logic here
        result = Results(output=f"processed-{task_input.options.param}")
        
        yield TaskOutput[Results](
            task_id=task_input.task_id,
            results=result,
            status="success"
        )

if __name__ == "__main__":
    processor.run()

prepare_concurrency and output_concurrency default to max_concurrency, so behavior stays backward-compatible when they are not set. In the current implementation, three kinds of IO work run on separate executors: prepare (download and input conversion), output (upload and cleanup), and channel pull/push. You can therefore raise prepare_concurrency or output_concurrency independently instead of forcing all IO work to compete in one pool.
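The separation of stages described above can be pictured with a small standard-library sketch. This is not the SDK's implementation: run_pipeline and its prepare, process, and output callables are hypothetical names used only to show how per-stage pools with a shared fallback limit might be wired.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(items, prepare, process, output,
                 max_concurrency=4,
                 prepare_concurrency=None,
                 output_concurrency=None):
    """Run three stages on three separate thread pools (illustrative only)."""
    # Stage-specific limits fall back to max_concurrency when left unset.
    prepare_workers = max_concurrency if prepare_concurrency is None else prepare_concurrency
    output_workers = max_concurrency if output_concurrency is None else output_concurrency

    with ThreadPoolExecutor(prepare_workers) as prepare_pool, \
            ThreadPoolExecutor(max_concurrency) as process_pool, \
            ThreadPoolExecutor(output_workers) as output_pool:
        # All prepare jobs are queued first; each later stage is submitted
        # as soon as the previous stage's result for that item is available.
        prepared = [prepare_pool.submit(prepare, item) for item in items]
        processed = [process_pool.submit(process, f.result()) for f in prepared]
        emitted = [output_pool.submit(output, f.result()) for f in processed]
        # Results come back in input order.
        return [f.result() for f in emitted]
```

Because each stage owns its pool, a slow upload in the output stage cannot occupy the threads that the prepare stage needs for downloads.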

At the Celery protocol layer, the server still encodes with json by default, but it now accepts both json and msgpack content types. The newer operator-client defaults to msgpack, so the two components interoperate out of the box.
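In plain Celery terms, the defaults described above roughly correspond to the settings below. The key names (task_serializer, result_serializer, accept_content) are standard Celery configuration keys, but which of them the SDK actually sets, and how, is an assumption here. SERVER_CELERY_SETTINGS and accepts are hypothetical names for illustration.

```python
# Hypothetical view of the server-side defaults, expressed as Celery settings.
SERVER_CELERY_SETTINGS = {
    # Assumption: "json encoding by default" applies to both serializers.
    "task_serializer": "json",
    "result_serializer": "json",
    # Both content types are accepted on the way in.
    "accept_content": ["json", "msgpack"],
}


def accepts(settings, content_type):
    """Return True if a message with this content type would be accepted."""
    return content_type in settings["accept_content"]
```

With these values a msgpack-encoded message from the client is accepted, while any content type outside the list is refused.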

If a Celery task is redelivered to another worker after visibility_timeout expires, the server can cap the number of delivery attempts through max_delivery_attempts. Like other runtime parameters, it can be set through BatchProcessor(...), @processor.on_batch(...), or the SANDAI_OPERATOR_CELERY_MAX_DELIVERY_ATTEMPTS environment variable. Once the limit is exceeded, the server marks the task as failed with TaskDeliveryLimitExceededError and stops further redelivery. The default value is 0, which disables this protection.
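The limit check can be sketched as follows. Only the environment variable name comes from the text above. The helper names, the 1-based attempt counter, and the way the variable is parsed are assumptions, and the sketch does not model precedence between the three configuration sources.

```python
import os

ENV_VAR = "SANDAI_OPERATOR_CELERY_MAX_DELIVERY_ATTEMPTS"


def read_max_delivery_attempts(environ=os.environ):
    """Read the limit from the environment; 0 (the default) disables it."""
    return int(environ.get(ENV_VAR, "0"))


def delivery_limit_exceeded(delivery_attempt, max_delivery_attempts):
    """Return True once the attempt count has gone past the limit.

    Assumption: delivery_attempt is 1-based and includes the current delivery.
    """
    if max_delivery_attempts <= 0:
        # A limit of 0 means the protection is switched off.
        return False
    return delivery_attempt > max_delivery_attempts
```

Under these assumptions, a limit of 3 lets the third delivery run and rejects the fourth.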

Core Components

BatchProcessor: Asynchronous batch processor with configurable concurrency
FileChannel: File monitoring with real-time change detection
ProcessingContext: Task-level working directory management
CeleryChannel: Distributed task execution via Celery

Architecture Integration

This SDK enables the Dataflow layer of the SandAI architecture:

Operators built with this SDK are deployed in the operators/ directory
Pipelines in the pipelines/ directory compose these operators
Workflows in the workflows/ directory orchestrate complete business processes

Example Operators

See the operators/ directory for complete implementations:

video-clipper/: Video processing operator
data-transformer/: Data format conversion operator

Testing

make test          # Run all tests
make test-sdk      # Run SDK core tests

Supervisor CLI

operator-sdk provides sdrun for launching multiple identical worker processes, aggregating logs, forwarding signals, and supervising worker lifecycle policies.

sdrun -w 4 --restart always -- python main.py -j --mode file

-w / --worker: number of worker processes to launch, default 1
--restart never: default; do not restart workers after a non-zero exit
--restart always: always restart a worker after a non-zero exit
--restart N: restart a worker at most N times after non-zero exits
--success-exit ignore: default; when a worker exits with code 0, do not affect other workers
--success-exit shutdown: when a worker exits with code 0, stop the remaining workers
--failure-exit ignore: default; when a worker exits non-zero and will not be restarted, do not affect other workers
--failure-exit shutdown: when a worker exits non-zero and will not be restarted, stop the remaining workers and return that worker's exit code
--startup-stagger SECONDS: sequential startup delay, default 0; for example 0.5 starts worker-1 after 0.5s and worker-2 after 1.0s
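
The --startup-stagger example above implies a delay that grows linearly with the worker index. A minimal sketch of that arithmetic, assuming workers are numbered from 0 and worker-0 starts immediately (startup_delays is a hypothetical helper, not part of sdrun):

```python
def startup_delays(workers, stagger_seconds=0.0):
    """Return the startup delay in seconds for each worker index."""
    # Worker i waits i * stagger_seconds, so worker-0 starts right away.
    return [index * stagger_seconds for index in range(workers)]
```

For three workers and a stagger of 0.5, this yields delays of 0.0, 0.5, and 1.0 seconds, matching the example in the option description.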

Policy model:

--restart only controls whether the exited worker itself should be restarted after a non-zero exit.
--success-exit controls whether a clean exit from one worker should stop the rest.
--failure-exit controls whether a non-zero exit from one worker, once no more restarts apply, should stop the rest.
If all workers eventually exit without supervisor-forced shutdown, sdrun exits with the sum of all final worker exit codes.
If --failure-exit shutdown is used, sdrun exits with the first non-restarted failing worker's exit code.
SIGTERM, SIGINT, SIGHUP, and SIGQUIT received by sdrun are forwarded to all workers.
Logs are prefixed with worker identity, for example [worker-2#1][stdout] ....
On POSIX, sdrun starts each worker in its own process group. On Linux it also installs a parent-death signal before exec so workers are terminated if the supervisor disappears unexpectedly.
Child processes receive SDRUN_MODE=true, SDRUN_WORLD_SIZE, SDRUN_RANK, and SDRUN_LOCAL_RANK.
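
The interaction between the three policies can be read as a small decision function. The sketch below follows the rules listed above but is not sdrun's code: next_action and supervisor_exit_code are hypothetical names, and the restart policy is passed as the same string the CLI accepts ("never", "always", or a number).

```python
def next_action(exit_code, restarts_used, restart="never",
                success_exit="ignore", failure_exit="ignore"):
    """Decide what happens after one worker exits.

    Returns "restart", "shutdown" (stop the remaining workers), or "ignore".
    """
    if exit_code == 0:
        # Clean exits are governed only by --success-exit.
        return "shutdown" if success_exit == "shutdown" else "ignore"
    # Non-zero exit: --restart is consulted first.
    if restart == "always":
        return "restart"
    if restart != "never" and restarts_used < int(restart):
        return "restart"
    # No restart applies, so --failure-exit decides.
    return "shutdown" if failure_exit == "shutdown" else "ignore"


def supervisor_exit_code(final_exit_codes):
    """Exit code when all workers finished without a forced shutdown."""
    return sum(final_exit_codes)
```

Note that with --restart always a failing worker never reaches the --failure-exit branch, which is why the two options can be combined without conflict.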

Common combinations:

Independent workers: --restart never --success-exit ignore --failure-exit ignore
Fail-fast workers: --restart never --success-exit ignore --failure-exit shutdown
Elastic recovery on failures: --restart always --success-exit ignore --failure-exit shutdown
First clean completion wins: --restart never --success-exit shutdown --failure-exit shutdown

If sdrun causes GPU memory usage to explode because multiple worker processes each hold their own copy of large tensors or model weights, consider using shared-tensor to share those tensors across processes: https://github.com/world-sim-dev/shared-tensor. This is especially useful for single-GPU, multi-process inference when the model runtime is not thread-safe and threads cannot be used safely.

FileChannel With SDRUN

When workers are launched by sdrun and the operator runs in file mode:

FileChannel shards input lines by line index using line_index % SDRUN_WORLD_SIZE == SDRUN_RANK.
Each worker processes only the JSONL rows assigned to its rank.
Output files are renamed by inserting the rank before the extension, for example output.jsonl becomes output.0.jsonl for rank 0 and output.1.jsonl for rank 1.
If the output file has no extension, the rank suffix is appended directly to the filename.
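
The sharding and renaming rules above can be sketched with the standard library. The function names are hypothetical, and three details are assumptions rather than documented behavior: line indexes start at 0, the fallbacks of rank 0 and world size 1 apply when the variables are absent, and a file without an extension gets the rank appended as ".N".

```python
import os
from pathlib import Path


def rank_and_world_size(environ=os.environ):
    """Read the rank and world size that sdrun passes to each worker."""
    # Assumption: outside sdrun, behave like a single worker.
    rank = int(environ.get("SDRUN_RANK", "0"))
    world_size = int(environ.get("SDRUN_WORLD_SIZE", "1"))
    return rank, world_size


def shard_lines(lines, rank, world_size):
    """Keep only the lines where line_index % world_size == rank."""
    return [line for index, line in enumerate(lines)
            if index % world_size == rank]


def ranked_output_path(path, rank):
    """Insert the rank before the extension: output.jsonl -> output.0.jsonl."""
    p = Path(path)
    # Assumption: with no extension the result is "<name>.<rank>".
    return str(p.with_name(f"{p.stem}.{rank}{p.suffix}"))
```

With four workers, rank 1 would take lines 1, 5, 9, and so on, and write them to output.1.jsonl.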

This means sdrun -w 4 -- python main.py --mode file ... produces four per-rank output files, which the caller must merge if a single combined result is needed.
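
One simple way to do that merge is to concatenate the per-rank files in rank order. The helper below is a hypothetical sketch, not something the SDK ships. It assumes the output.N.jsonl naming shown above and does not restore the original input order, since rows were distributed round-robin across ranks.

```python
from pathlib import Path


def merge_rank_outputs(output_path, world_size, merged_path):
    """Concatenate output.<rank>.jsonl files into one JSONL file."""
    base = Path(output_path)
    with open(merged_path, "w", encoding="utf-8") as merged:
        for rank in range(world_size):
            part = base.with_name(f"{base.stem}.{rank}{base.suffix}")
            if not part.exists():
                # A rank with no assigned rows may not have written a file.
                continue
            with open(part, encoding="utf-8") as handle:
                for line in handle:
                    # Make sure every record ends with exactly one newline.
                    merged.write(line if line.endswith("\n") else line + "\n")
```

A call such as merge_rank_outputs("output.jsonl", 4, "merged.jsonl") would then collect the four files produced by the command above.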

Development

Set Up Local MinIO

brew install minio/stable/minio
brew install minio/stable/mc
minio server var/minio

Set Up Local Redis

brew install redis
brew services start redis

Set Up Local Postgres

brew install postgresql
brew services start postgresql

List Services

brew services list

Creating New Operators

Create operator directory in ../operators/my-operator/
Implement using this SDK
Deploy as Celery service
Use in pipelines and workflows

Best Practices

Keep operators focused on single responsibilities
Use proper error handling and logging
Implement comprehensive tests
Document operator interfaces clearly

License

MIT License

Build and Upload

make build
ossutil cp dist/sandai_operator_sdk-0.2.7-py3-none-any.whl oss://python-artifacts/ -e oss-cn-shanghai.aliyuncs.com --acl public-read

Local Development Install

pip install -e /path/to/operator-sdk

sandai-operator-sdk 0.5.1
