Python SDK for the CHARM time-series foundation model — embeddings, forecasting, and a downstream-task toolkit.

c3-charm

A Python SDK for interacting with the CHARM time-series API. It provides a simple interface for embeddings (multivariate time series → vectors) and forecast/backcast (predict future or reconstruct past steps).

What is CHARM?

CHARM (CHannel Aware Representation Model) is a foundation model designed for multivariate time series data. It generates embeddings that capture the semantic content of time series segments, which makes them useful for downstream applications such as:

  • Anomaly detection: Identify unusual patterns in time series data
  • Clustering: Group similar time series together
  • Classification: Categorize time series into predefined classes
  • Forecasting: Improve time series predictions
  • Similarity search: Find similar patterns across large datasets

Data shapes and API reference

Use this section to shape your inputs and interpret outputs. The API has two endpoints; both expect the same time series format.

Shared input format (embeddings and forecast/backcast)

Every request uses:

  • descriptions: List of channel names per time series.

    • Type: list[list[str]].
    • Shape: (N, C) — N samples, each with C channel names.
    • Example: [["engine", "temperature"], ["fan", "speed"]] for N=2, C=2.
  • ts_array: List of time series values (one per sample).

    • Type: list[list[list[float]]].
    • Shape: (N, T, C) — N samples, each of T timesteps × C channels.
    • All samples in a single request must have the same T and the same C.
    • Example: one sample with 10 timesteps and 2 channels → a list of 10 rows, each row a list of 2 floats.
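If your data already lives in a NumPy array of shape (N, T, C), it can be converted into the nested-list format described above. This is a minimal sketch: the array contents and channel names are made up for illustration, and it assumes the SDK expects plain Python lists (whether it also accepts NumPy arrays directly is not stated here).

```python
import numpy as np

# Illustrative data: N=2 samples, T=10 timesteps, C=2 channels.
values = np.arange(40, dtype=np.float64).reshape(2, 10, 2) / 10.0
channel_names = ["engine", "temperature"]  # hypothetical channel names

# ts_array: nested Python lists with shape (N, T, C).
ts_array = values.tolist()

# descriptions: shape (N, C), one list of channel names per sample.
descriptions = [list(channel_names) for _ in range(values.shape[0])]

print(len(ts_array), len(ts_array[0]), len(ts_array[0][0]))  # 2 10 2
```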

Conventions:

  • N = batch size (number of time series in the call).
  • T = timesteps per series, identical for all series in a request. Must satisfy 1 ≤ T < 1500 (enforced by the SDK).
  • C = channels per series, identical for all series in a request. Must be < 1500.
  • N × C × T ≤ 500,000 per request (client may split into multiple requests via batching).
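To catch shape mistakes before a request is sent, the conventions above can be checked locally. This is a sketch of a hypothetical helper, check_request_shape, which is not part of the c3-charm SDK; it applies only the limits listed above to a single request.

```python
# Documented limits (per request).
MAX_T = 1500             # timesteps: 1 <= T < 1500
MAX_C = 1500             # channels: C < 1500
MAX_ELEMENTS = 500_000   # N * C * T per request


def check_request_shape(descriptions, ts_array):
    """Return (N, T, C) if one request satisfies the documented limits; raise ValueError otherwise.

    Hypothetical helper: not part of the c3-charm SDK.
    """
    n = len(ts_array)
    if n == 0 or len(descriptions) != n:
        raise ValueError("descriptions and ts_array must have the same non-zero length N")
    t = len(ts_array[0])
    c = len(descriptions[0])
    if not 1 <= t < MAX_T:
        raise ValueError(f"T={t} must satisfy 1 <= T < {MAX_T}")
    if c >= MAX_C:
        raise ValueError(f"C={c} must be < {MAX_C}")
    for i, (names, series) in enumerate(zip(descriptions, ts_array)):
        if len(names) != c:
            raise ValueError(f"sample {i}: expected {c} channel names, got {len(names)}")
        if len(series) != t:
            raise ValueError(f"sample {i}: expected T={t} timesteps, got {len(series)}")
        if any(len(row) != c for row in series):
            raise ValueError(f"sample {i}: every timestep must contain exactly {c} values")
    if n * c * t > MAX_ELEMENTS:
        raise ValueError(f"N*C*T={n * c * t} exceeds {MAX_ELEMENTS}; split the job into smaller requests")
    return n, t, c
```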

1. Embeddings — client.embeddings.create / client.embeddings.async_create

Endpoint: POST {base_url}/predict

Input: descriptions (N×C) and ts_array (N×T×C); see the shared format above.
Output: response.embeds, one vector per time series, with shape (N, D), where D is the embedding dimension (model-dependent).
Return type: EmbeddingsResponse, with fields .embeds, .model, .usage, .raw.

Use return_tensors="list", "np", or "torch" to get lists, a NumPy array, or a PyTorch tensor.
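The similarity-search use case listed earlier can be built directly on response.embeds, which has shape (N, D). This is a minimal sketch using cosine similarity; the helper top_k_similar is hypothetical, and it assumes the embeddings were requested with return_tensors="list" or "np".

```python
import numpy as np


def top_k_similar(embeds, query_index, k=3):
    """Return indices of the k series whose embeddings are most cosine-similar to embeds[query_index]."""
    embeds = np.asarray(embeds, dtype=np.float32)   # accepts lists or NumPy arrays
    norms = np.linalg.norm(embeds, axis=1, keepdims=True)
    unit = embeds / np.clip(norms, 1e-12, None)     # normalize rows, avoiding division by zero
    sims = unit @ unit[query_index]                 # (N,) cosine similarities to the query
    sims[query_index] = -np.inf                     # exclude the query itself
    order = np.argsort(-sims)                       # most similar first
    return order[:k].tolist()
```

For example, top_k_similar(response.embeds, 0, k=5) would return the five series closest to the first one in embedding space.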

2. Forecast — client.prediction.create / client.prediction.async_create

Endpoint: POST {base_url}/forecast

Input: Same descriptions and ts_array as above, plus:

  • target_len (int, required, non-zero):
    • Positive → forecast that many steps ahead (e.g. 10 = next 10 steps).
    • Negative → backcast that many steps in the past (e.g. -8 = last 8 steps).
Output: response.denormalized_predictions, the predictions in the original scale, with shape (N, abs(target_len), C, Q), where Q is the number of quantiles (e.g. 21).
The response also includes response.predictions (normalized scale) and response.data (an echo of the input), both with the same batch dimension N.
Return type: ForecastResponse, with fields .denormalized_predictions, .predictions, .data, .target_len, .mode ("forecast" or "backcast"), .raw.

Use return_tensors="list", "np", or "torch" for all tensor fields.
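Because the last axis of the forecast output holds Q quantiles, a point forecast and an uncertainty band can be sliced from it. This is a sketch under an explicit assumption that is not documented here: the quantiles are stored in ascending order with the median at the centre index (Q odd, e.g. 21). Verify the model's actual quantile levels before relying on it. The helper name summarize_quantiles is hypothetical.

```python
import numpy as np


def summarize_quantiles(denorm_preds):
    """Split (N, H, C, Q) quantile predictions into median, lower, upper arrays of shape (N, H, C).

    ASSUMPTION: quantiles are sorted ascending along the last axis and the
    median sits at index Q // 2.
    """
    preds = np.asarray(denorm_preds)
    if preds.ndim != 4:
        raise ValueError(f"expected shape (N, H, C, Q), got {preds.shape}")
    q = preds.shape[-1]
    median = preds[..., q // 2]  # assumed median
    lower = preds[..., 0]        # lowest quantile
    upper = preds[..., -1]       # highest quantile
    return median, lower, upper
```

Usage: median, lower, upper = summarize_quantiles(response.denormalized_predictions) yields three arrays of shape (N, abs(target_len), C).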

Quick reference

Functionality | Method | Input | Output shape (main)
Embeddings | embeddings.create / async_create | descriptions (N×C), ts_array (N×T×C) | (N, D)
Forecast | prediction.create / async_create | same as embeddings, plus target_len > 0 | (N, target_len, C, Q)
Backcast | prediction.create / async_create | same as embeddings, plus target_len < 0 | (N, abs(target_len), C, Q)

Installation

Install from PyPI:

pip install c3-charm

To include the downstream-task toolkit (models, trainers, datasets):

pip install "c3-charm[toolkit]"   # quotes keep shells such as zsh from expanding the brackets

Or install from source with Poetry:

git clone https://github.com/c3ai/c3-charm.git
cd c3-charm
poetry install                    # core SDK only
poetry install --with toolkit     # include toolkit dependencies

Dependencies

Core (installed by default):

  • requests — synchronous HTTP client
  • httpx[http2] — asynchronous HTTP/2 client
  • python-dotenv: .env file loading
  • tqdm — progress bars

Toolkit (optional, pip install c3-charm[toolkit]):

  • torch, tensordict — tensor operations
  • numpy, pandas — data manipulation
  • matplotlib, seaborn, scienceplots — visualization
  • scikit-learn — ML utilities
  • lightgbm, optuna — gradient boosting & hyperparameter tuning
  • gin-config — experiment configuration

Quick Start

from charm import CharmClient
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Get API key and base URL from environment variables
api_key = os.getenv("CHARM_API_KEY", "your-api-key")
base_url = os.getenv("CHARM_BASE_URL", "http://your-server-url:8080")

# Create a client
client = CharmClient(
    base_url=base_url,
    api_key=api_key,
    timeout=30,         # Increased timeout for potentially large requests
    max_retries=3,      # Automatically retry failed requests
    http2=True,         # Enable HTTP/2 for async requests (default)
)

# Generate embeddings for time series data (synchronous with progress bar)
response = client.embeddings.create(
    descriptions=[["engine", "temperature"], ["fan", "speed"]],
    ts_array=[
        # First time series (10 timesteps, 2 channels)
        [
            [0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0],
            [1.1, 1.2], [1.3, 1.4], [1.5, 1.6], [1.7, 1.8], [1.9, 2.0]
        ],
        # Second time series (10 timesteps, 2 channels)
        [
            [2.1, 2.2], [2.3, 2.4], [2.5, 2.6], [2.7, 2.8], [2.9, 3.0],
            [3.1, 3.2], [3.3, 3.4], [3.5, 3.6], [3.7, 3.8], [3.9, 4.0]
        ]
    ],
    batch_size=32,      # Process in batches of 32 (for large datasets)
    return_tensors="np", # Options: "list", "np", "torch"
    progress=True       # Show progress bar (default: True)
)

# Access the embeddings
embeddings = response.embeds
print(f"Model: {response.model}")
print(f"Embeddings shape: {embeddings.shape}")

# Asynchronous processing (typically much faster for large datasets, since batches are sent concurrently)
import asyncio

async def generate_embeddings_async():
    response = await client.embeddings.async_create(
        descriptions=[["engine", "temperature"], ["fan", "speed"]],
        ts_array=[
            # Same time series data as above
            [
                [0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0],
                [1.1, 1.2], [1.3, 1.4], [1.5, 1.6], [1.7, 1.8], [1.9, 2.0]
            ],
            [
                [2.1, 2.2], [2.3, 2.4], [2.5, 2.6], [2.7, 2.8], [2.9, 3.0],
                [3.1, 3.2], [3.3, 3.4], [3.5, 3.6], [3.7, 3.8], [3.9, 4.0]
            ]
        ],
        max_B_per_request=32,    # Process 32 time series per API call
        concurrency_per_call=8,  # Run up to 8 concurrent API calls
        return_tensors="np",     # Options: "list", "np", "torch"
        progress=True            # Show progress bar (default: True)
    )
    return response

# Run the async function
response_async = asyncio.run(generate_embeddings_async())

Time Series Forecasting

The CHARM SDK also supports time series forecasting through the /forecast endpoint:

# Forecasting (predict future values)
response = client.prediction.create(
    descriptions=[["sensor_A", "sensor_B"]],
    ts_array=[[
        [1.0, 2.0],
        [1.1, 2.1],
        [1.2, 2.2],
        [1.3, 2.3],
        [1.4, 2.4],
        [1.5, 2.5],
        [1.6, 2.6],
        [1.7, 2.7],
        [1.8, 2.8],
        [1.9, 2.9],
    ]],
    target_len=10,  # Forecast 10 steps ahead
    return_tensors="np"
)

# Access the denormalized predictions
forecast = response.denormalized_predictions
print(f"Forecast shape: {forecast.shape}")  # e.g., (1, 10, 2, Q) where Q is number of quantiles
print(f"Mode: {response.mode}")  # "forecast"

# Backcasting (reconstruct past values)
response = client.prediction.create(
    descriptions=[["sensor_A", "sensor_B"]],
    ts_array=[[
        [1.0, 2.0],
        [1.1, 2.1],
        [1.2, 2.2],
        [1.3, 2.3],
        [1.4, 2.4],
        [1.5, 2.5],
        [1.6, 2.6],
        [1.7, 2.7],
        [1.8, 2.8],
        [1.9, 2.9],
    ]],
    target_len=-8,  # Reconstruct last 8 steps
    return_tensors="np"
)

reconstructed = response.denormalized_predictions
print(f"Mode: {response.mode}")  # "backcast"
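A backcast can be compared with the values that were actually observed, which gives a per-channel reconstruction error (one possible signal for the anomaly-detection use case). This is a sketch with a hypothetical helper, backcast_error. It assumes that a backcast with target_len=-H reconstructs the last H input steps (as described for target_len=-8 above) and that the median quantile sits at index Q // 2.

```python
import numpy as np


def backcast_error(ts_array, denorm_preds):
    """Mean absolute error per sample and channel between the observed tail and the median backcast.

    ASSUMPTIONS: denorm_preds has shape (N, H, C, Q) from a backcast with
    target_len=-H, covers the last H input steps, and has its median at Q // 2.
    """
    actual = np.asarray(ts_array, dtype=np.float64)    # (N, T, C)
    preds = np.asarray(denorm_preds, dtype=np.float64)  # (N, H, C, Q)
    h, q = preds.shape[1], preds.shape[-1]
    median = preds[..., q // 2]                         # (N, H, C)
    tail = actual[:, -h:, :]                            # last H observed steps
    return np.abs(tail - median).mean(axis=1)           # (N, C)
```

With the example above, backcast_error(ts_array, reconstructed) would return an array of shape (1, 2), one error per channel.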

Note: target_len is required and must be non-zero:

  • Positive values: Forecast future timesteps (e.g., target_len=10)
  • Negative values: Reconstruct past timesteps (e.g., target_len=-8)

Using a .env file

You can create a .env file in your project directory with the following content:

CHARM_API_KEY=your-api-key
CHARM_BASE_URL=http://your-server-url:8080

This allows you to keep your credentials separate from your code and avoid hardcoding sensitive information.
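The Quick Start falls back to placeholder strings when the variables are unset, which can send an invalid key to the server. A stricter alternative is to fail fast. This is a sketch of a hypothetical helper, load_charm_settings (not part of the SDK); call load_dotenv() first, as in the Quick Start, so the .env values are present in os.environ.

```python
import os


def load_charm_settings(environ=os.environ):
    """Return (api_key, base_url) from the environment, raising if either variable is missing or empty."""
    required = ("CHARM_API_KEY", "CHARM_BASE_URL")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return environ["CHARM_API_KEY"], environ["CHARM_BASE_URL"]
```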

Features

  • OpenAI-style SDK for CHARM time-series embeddings and forecast/backcast
  • API key authentication
  • Automatic retries with exponential backoff
  • Configurable timeouts
  • Client-side batching for large datasets
  • Flexible return types (Python lists, NumPy arrays, or PyTorch tensors)
  • Both synchronous and asynchronous methods in a single client:
    • client.embeddings.create() - Synchronous method with progress tracking
    • await client.embeddings.async_create() - Asynchronous method with concurrent batch processing
    • client.prediction.create() - Synchronous forecast/backcast method
    • await client.prediction.async_create() - Asynchronous forecast/backcast method
  • Progress tracking with tqdm for both sync and async methods
  • HTTP/2 support for asynchronous requests
  • Comprehensive error handling with specific exception types
  • Binary protocol for efficient data transfer (handles raw fp16 bytes from server)
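The SDK decodes the server's fp16 bytes for you, so this is only an illustration of what that step involves and why returned values carry half precision (roughly three significant decimal digits). The byte layout here (little-endian, row-major, shape known in advance) and the helper name decode_fp16 are assumptions for illustration, not the documented wire format.

```python
import numpy as np


def decode_fp16(payload: bytes, n: int, d: int) -> np.ndarray:
    """Interpret raw bytes as an (n, d) little-endian float16 array and upcast to float32."""
    expected = n * d * 2  # 2 bytes per float16 value
    if len(payload) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(payload)}")
    arr = np.frombuffer(payload, dtype="<f2").reshape(n, d)
    return arr.astype(np.float32)  # astype copies; frombuffer over bytes is read-only
```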

Performance Considerations

  • Synchronous Method (client.embeddings.create): Suitable for smaller datasets or when simplicity is preferred. It processes batches sequentially, which can be slow for large datasets, and shows progress with tqdm. Avoid sending very large batches (>100 samples) in a single request, since they may time out.

  • Asynchronous Method (client.embeddings.async_create): Recommended for large datasets. It is usually significantly faster because batches are sent concurrently, and it provides:

    • Parallel batch processing
    • Bounded concurrency to avoid overwhelming the server
    • Progress tracking for long-running operations
    • HTTP/2 support for efficient connections
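Bounded concurrency (the idea behind the concurrency_per_call parameter) can be illustrated with asyncio.Semaphore. This is a minimal sketch, not the SDK's internal implementation: fake_request is a hypothetical stand-in for one API call, and the semaphore guarantees that no more than `concurrency` of them are in flight at once.

```python
import asyncio


async def fake_request(batch):
    """Hypothetical stand-in for one API call; returns each value doubled."""
    await asyncio.sleep(0.01)
    return [x * 2 for x in batch]


async def run_bounded(batches, concurrency=8):
    """Send one request per batch with at most `concurrency` in flight; results keep input order."""
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0  # highest number of simultaneous requests observed

    async def worker(batch):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_request(batch)
            finally:
                in_flight -= 1

    # gather returns results in the order the coroutines were passed in
    results = await asyncio.gather(*(worker(b) for b in batches))
    return results, peak
```

Usage: results, peak = asyncio.run(run_bounded([[i] for i in range(20)], concurrency=3)) processes 20 batches while never exceeding 3 concurrent calls.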

Payload limitations

The SDK and API enforce:

  • Timesteps per series: T ≥ 1 and T < 1500 (enforced by SDK).
  • Channels per series: C < 1500 (see usage guide).
  • Per-request size: N × C × T ≤ 500,000 (client-side batching can split larger jobs).
  • Batch consistency: All time series in a single request must have the same T and the same C.
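The per-request limit translates into a maximum number of series per request: floor(500,000 / (C × T)). For example, with T = 1000 and C = 100, at most 5 series fit in one request. The SDK's own batching (batch_size, max_B_per_request) handles this for you; the sketch below, using hypothetical helpers, only shows the arithmetic. Note that because T and C can each approach 1500, a single series can exceed the limit on its own, in which case no split helps.

```python
def max_samples_per_request(t, c, limit=500_000):
    """Largest N with N * C * T <= limit for series of T timesteps and C channels."""
    per_sample = t * c
    if per_sample > limit:
        raise ValueError(f"a single series has {per_sample} values, above the {limit} limit")
    return limit // per_sample


def split_batches(descriptions, ts_array, limit=500_000):
    """Yield (descriptions, ts_array) chunks that each respect N * C * T <= limit.

    Hypothetical helper: assumes all series share the same T and C.
    """
    t = len(ts_array[0])
    c = len(descriptions[0])
    step = max_samples_per_request(t, c, limit)
    for start in range(0, len(ts_array), step):
        yield descriptions[start:start + step], ts_array[start:start + step]
```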

See the Data shapes and API reference section above for input/output shapes.

Requirements

Testing

The CHARM SDK uses pytest for testing. To run the tests from a source checkout:

# Install pytest if not already installed
pip install pytest

# Run all tests
python -m pytest tests/

# Run specific test file
python -m pytest tests/test_utils.py

# Run with verbose output
python -m pytest -v tests/

Documentation

For detailed documentation, see the examples directory, the usage guide, the quickstart guide, and the docstrings in the code.

Example Applications

The CHARM SDK can be used for various time series applications:

  1. Anomaly Detection: Identify unusual patterns in sensor data, network traffic, or financial transactions
  2. Time Series Clustering: Group similar time series patterns for market segmentation or behavior analysis
  3. Classification: Categorize time series data for predictive maintenance or activity recognition
  4. Similarity Search: Find similar patterns across large datasets for pattern discovery
  5. Forecasting: Predict future values or reconstruct past values in time series data
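As a starting point for anomaly detection on embeddings, one simple technique is to score each series by its distance from the mean embedding and flag outliers with a z-score. This is a generic baseline swapped in for illustration, not the method used by the CHARM authors or the notebooks; the helper names are hypothetical.

```python
import numpy as np


def embedding_anomaly_scores(embeds):
    """Score each series by the Euclidean distance of its embedding from the mean embedding."""
    x = np.asarray(embeds, dtype=np.float64)  # (N, D)
    centroid = x.mean(axis=0)
    return np.linalg.norm(x - centroid, axis=1)


def flag_anomalies(embeds, z_threshold=3.0):
    """Return a boolean mask marking series whose score is more than z_threshold standard deviations above the mean."""
    scores = embedding_anomaly_scores(embeds)
    std = scores.std()
    if std == 0:
        return np.zeros(len(scores), dtype=bool)  # all series equally distant: nothing to flag
    z = (scores - scores.mean()) / std
    return z > z_threshold
```

Usage: flag_anomalies(response.embeds) returns one boolean per input series.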

Check out the notebooks in the docs/notebooks directory for detailed examples of these applications.

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

c3_charm-0.1.0.tar.gz (43.0 kB)

Uploaded Source

Built Distribution

c3_charm-0.1.0-py3-none-any.whl (45.6 kB)

Uploaded Python 3

File details

Details for the file c3_charm-0.1.0.tar.gz.

File metadata

  • Download URL: c3_charm-0.1.0.tar.gz
  • Size: 43.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.12 Darwin/24.6.0

File hashes

Hashes for c3_charm-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d6b478328f8a11fa91169883994f4c848ac8adf0717428a1616ba17aa8344d35
MD5 8a3e6db6b6bf7913a2bf7eff69fcd001
BLAKE2b-256 303c6f1f7cc348fc191c62e4c17231403dd878898ba47992d24ec1b8a64f9eea

File details

Details for the file c3_charm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: c3_charm-0.1.0-py3-none-any.whl
  • Size: 45.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.12 Darwin/24.6.0

File hashes

Hashes for c3_charm-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8a78333f195a2978a832df4e0816b6863c2abbc9c39280dff5064cfa4c5b53ed
MD5 6a684a7db1aa2c9a4fb6c415da1b9053
BLAKE2b-256 520fe4f6a261c8fcd5f06d8685fa865ebc9a3319b2732b60aa5d8d3cd540f358
