
Python SDK for the CHARM time-series foundation model — embeddings, forecasting, and a downstream-task toolkit.


c3-charm


A Python SDK for the CHARM time-series foundation model. Provides embeddings (multivariate time series → vectors), forecast/backcast (quantile predictions), and a toolkit for downstream tasks (anomaly detection, retrieval, classification, reconstruction, forecasting).

What is CHARM?

CHARM (CHannel Aware Representation Model) is a foundation model for multivariate time series. It ingests windows of (T, C) data — T timesteps, C channels — and produces dense embeddings that capture temporal patterns and cross-channel relationships. Channel names (descriptions) are part of the input, making the model channel-aware.

No scaling required — the model handles normalization internally. Send raw data directly.


Installation

pip install c3-charm            # core SDK only (embeddings + forecast)
pip install "c3-charm[toolkit]"   # includes PyTorch models, datasets, trainers

Or from source:

git clone https://github.com/c3ai/c3-charm.git
cd c3-charm
poetry install                    # core SDK only
poetry install --with toolkit     # include toolkit dependencies

Core SDK

Client initialization

from charm import CharmClient

client = CharmClient(
    base_url="http://your-server:8080",
    api_key="your-api-key",      # or set CHARM_API_KEY env var
    timeout=300,
    max_retries=3,
)

Embeddings — client.embeddings.create()

Converts time series windows into dense vectors.

response = client.embeddings.create(
    descriptions=[["sensor_A", "sensor_B"]],  # (N, C) channel names
    ts_array=[[[1.0, 2.0], [1.1, 2.1], ...]],  # (N, T, C) values
    batch_size=32,
    return_tensors="np",       # "list", "np", or "torch"
    aggregate=True,            # True → (N, D); False → (N, T_, C, D)
    progress=True,
)
embeddings = response.embeds  # shape (N, D) when aggregate=True

aggregate parameter:

  • True (default): Returns flattened embeddings (N, D) — one vector per series. Best for retrieval, classification, clustering.
  • False: Returns per-patch, per-channel embeddings (N, T_, C, D) where T_ = T / patch_size. Best for fine-grained tasks or custom heads.
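As a quick sanity check on the two layouts, the sketch below computes the shape to expect from response.embeds. It assumes a patch size of 32 (the value given under Input constraints) and takes the embedding width D as an argument, since this README does not state it. `expected_embed_shape` is a hypothetical helper, not part of the SDK. How the server treats a T that is not a multiple of the patch size is not documented here, so the helper simply refuses that case.

```python
def expected_embed_shape(n, t, c, d, aggregate=True, patch_size=32):
    """Shape of response.embeds for n series of t timesteps and c channels.

    d is the embedding width (not stated in this README, so pass your own).
    patch_size=32 is an assumption taken from the input-constraints table.
    """
    if aggregate:
        return (n, d)                      # one vector per series
    if t % patch_size != 0:
        raise ValueError("t must be a multiple of patch_size in this sketch")
    return (n, t // patch_size, c, d)      # per-patch, per-channel vectors


print(expected_embed_shape(4, 128, 3, d=256))                    # (4, 256)
print(expected_embed_shape(4, 128, 3, d=256, aggregate=False))   # (4, 4, 3, 256)
```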

Async (faster for large datasets):

response = await client.embeddings.async_create(
    descriptions=descriptions,
    ts_array=ts_array,
    max_B_per_request=32,
    concurrency_per_call=8,
    return_tensors="np",
    aggregate=True,
)
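Note that `await` only works inside a coroutine; from a plain script, wrap the call in an `async def` and run it with `asyncio.run`. The parameter names max_B_per_request and concurrency_per_call suggest a bounded-concurrency batching pattern, which the sketch below illustrates. It is not the SDK's implementation: `fake_embed` and `embed_all` are hypothetical stand-ins so the example runs without a server.

```python
import asyncio


async def fake_embed(batch):
    """Stand-in for one API request; returns one dummy vector per series."""
    await asyncio.sleep(0.01)              # simulate network latency
    return [[float(len(series))] for series in batch]


async def embed_all(series_list, max_b_per_request=32, concurrency=8):
    """Split into batches and run at most `concurrency` requests at once."""
    sem = asyncio.Semaphore(concurrency)
    batches = [series_list[i:i + max_b_per_request]
               for i in range(0, len(series_list), max_b_per_request)]

    async def run(batch):
        async with sem:                    # wait for a free slot
            return await fake_embed(batch)

    # gather returns results in the order the batches were submitted
    results = await asyncio.gather(*(run(b) for b in batches))
    return [vec for batch_result in results for vec in batch_result]


series = [[0.0] * 64 for _ in range(100)]   # 100 dummy series of length 64
vectors = asyncio.run(embed_all(series))
print(len(vectors))                          # 100
```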

Forecast / Backcast — client.prediction.create()

Zero-shot quantile predictions — no training required.

response = client.prediction.create(
    descriptions=[["sensor_A", "sensor_B"]],
    ts_array=[[[1.0, 2.0], [1.1, 2.1], ...]],
    target_len=10,       # positive = forecast, negative = backcast
    return_tensors="np",
)
forecast = response.denormalized_predictions  # (N, 10, C, Q) — Q quantiles
median = response.median                      # (N, 10, C) — point forecast
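The last axis of denormalized_predictions indexes quantiles, so a prediction band is just a pair of slices along that axis. This README does not list the quantile levels the model returns; the levels `[0.1, 0.5, 0.9]` below are hypothetical, as are the toy numbers and the `band` helper. The sketch works on a plain nested list for one series of shape (H, C, Q).

```python
# Hypothetical quantile levels; check what your deployment actually returns.
levels = [0.1, 0.5, 0.9]

# Toy forecast for ONE series: horizon H=2, channels C=1, Q=3 quantiles.
forecast = [
    [[9.0, 10.0, 11.5]],    # step 1, channel 0
    [[9.2, 10.4, 12.1]],    # step 2, channel 0
]


def band(forecast, levels, channel, lo=0.1, hi=0.9):
    """Return (lower, upper) lists over the horizon for one channel."""
    i_lo, i_hi = levels.index(lo), levels.index(hi)
    lower = [step[channel][i_lo] for step in forecast]
    upper = [step[channel][i_hi] for step in forecast]
    return lower, upper


lower, upper = band(forecast, levels, channel=0)
print(lower, upper)   # [9.0, 9.2] [11.5, 12.1]
```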

Backcast (reconstruct past values):

response = client.prediction.create(
    descriptions=descriptions,
    ts_array=ts_array,
    target_len=-8,  # reconstruct last 8 steps
    return_tensors="np",
)

Input constraints

| Constraint | Limit |
| --- | --- |
| Timesteps per series | 1 ≤ T < 1500 |
| Channels per series | C < 1500 |
| Per-request size | N × C × T ≤ 500,000 |
| Batch consistency | All series in a request must share the same T and C |
| Minimum for good embeddings | T ≥ 32 (model patch size) |

The SDK handles client-side batching automatically when you set batch_size (sync) or max_B_per_request (async).
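To pick a safe batch_size up front, the per-request limit can be turned into a maximum N for a given window shape. The sketch below does that arithmetic and validates a nested-list batch against the limits listed above. Both helpers are hypothetical (they are not SDK functions), and the lower bound of one channel is an assumption, since the constraints only give the upper bound.

```python
MAX_T = 1500            # exclusive upper bound on timesteps
MAX_C = 1500            # exclusive upper bound on channels
MAX_ELEMENTS = 500_000  # per-request cap on N * C * T


def max_series_per_request(t, c):
    """Largest N with N * C * T <= 500,000 for a given window shape."""
    if not (1 <= t < MAX_T):
        raise ValueError(f"T must satisfy 1 <= T < {MAX_T}, got {t}")
    if not (1 <= c < MAX_C):   # lower bound of 1 is an assumption
        raise ValueError(f"C must satisfy 1 <= C < {MAX_C}, got {c}")
    return MAX_ELEMENTS // (t * c)


def check_batch(ts_array):
    """Validate a nested-list batch of shape (N, T, C); return (N, T, C)."""
    if not ts_array:
        raise ValueError("empty batch")
    n, t, c = len(ts_array), len(ts_array[0]), len(ts_array[0][0])
    for series in ts_array:
        if len(series) != t or any(len(row) != c for row in series):
            raise ValueError("all series must share the same T and C")
    if n > max_series_per_request(t, c):
        raise ValueError("N * C * T exceeds 500,000; split the request")
    return n, t, c


print(max_series_per_request(128, 8))    # 488
print(max_series_per_request(1024, 64))  # 7
```

A result of 0 means a single series of that shape already exceeds the per-request cap.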

Output shapes

| Method | Output field | Shape |
| --- | --- | --- |
| embeddings.create(aggregate=True) | response.embeds | (N, D) |
| embeddings.create(aggregate=False) | response.embeds | (N, T_, C, D) |
| prediction.create(target_len > 0) | response.denormalized_predictions | (N, target_len, C, Q) |
| prediction.create(target_len < 0) | response.denormalized_predictions | (N, abs(target_len), C, Q) |
| prediction.create(...) | response.median | (N, abs(target_len), C) |

Channel descriptions

Descriptions are required and affect embedding quality. They tell the model what each channel represents.

Good descriptions — use meaningful, consistent names:

descriptions = [["engine_temperature", "oil_pressure", "rpm"]]

Acceptable — short but informative:

descriptions = [["temp", "pressure", "speed"]]

Avoid — generic or positional names reduce model effectiveness:

descriptions = [["col_0", "col_1", "col_2"]]  # works but suboptimal

When working with pandas DataFrames, use column names directly:

descriptions = [df.columns.tolist()] * N
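Because descriptions has shape (N, C) and ts_array has shape (N, T, C), a mismatch between the two is an easy mistake to make. The sketch below checks that they line up before a request is sent. `check_descriptions` is a hypothetical helper, not part of the SDK, and the data is made up for illustration. Note that `[names] * N` repeats a reference to the same inner list, which is fine as long as nothing mutates it; the example builds independent copies instead.

```python
def check_descriptions(descriptions, ts_array):
    """Raise if descriptions (N, C) do not line up with ts_array (N, T, C)."""
    if len(descriptions) != len(ts_array):
        raise ValueError("need one list of channel names per series")
    for names, series in zip(descriptions, ts_array):
        n_channels = len(series[0])
        if len(names) != n_channels:
            raise ValueError(
                f"{len(names)} names for a series with {n_channels} channels")


channel_names = ["engine_temperature", "oil_pressure", "rpm"]
ts_array = [[[90.0, 3.1, 1500.0]] * 32 for _ in range(4)]   # (4, 32, 3)
descriptions = [list(channel_names) for _ in ts_array]      # (4, 3)
check_descriptions(descriptions, ts_array)                  # passes silently
```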

Scaling

No pre-processing needed. CHARM normalizes internally. Send raw data as-is. Do not apply StandardScaler, MinMaxScaler, or log transforms before calling the API.

Error handling

from charm import CharmError, AuthenticationError, InvalidRequestError, RateLimitError

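try:
    response = client.embeddings.create(...)
except AuthenticationError:
    # bad API key: retrying will not help, so re-raise
    raise
except InvalidRequestError as e:
    # shape violations, empty input
    print(f"Invalid request: {e}")
except RateLimitError:
    # back off and retry
    print("Rate limited; wait before retrying")
except CharmError as e:
    # catch-all for other SDK errors
    print(f"CHARM SDK error: {e}")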
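One way to act on "back off and retry" is exponential backoff with a little jitter, sketched below. The `RateLimitError` class here is a local stand-in so the example runs without the SDK installed; in real code you would import it from charm. `with_backoff` and `flaky` are hypothetical names. The client's own max_retries setting may already cover part of this; this README does not say which errors it retries.

```python
import random
import time


class RateLimitError(Exception):
    """Local stand-in so the sketch runs without the SDK installed."""


def with_backoff(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call(); on RateLimitError wait base_delay * 2**attempt plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: surface the error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))


# Demo: a fake call that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("slow down")
    return "ok"


result = with_backoff(flaky, sleep=lambda s: None)   # skip real sleeping in the demo
print(result, attempts["n"])                         # ok 3
```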

Toolkit — Downstream Tasks

The toolkit (pip install "c3-charm[toolkit]") provides PyTorch models, dataset utilities, and training infrastructure for fine-tuning on top of CHARM embeddings.

Retrieval — charm_toolkit.retrieval

Find similar time series by embedding similarity.

from charm_toolkit.retrieval import (
    l2_normalize,
    cosine_similarity_matrix,
    knn_search,
    retrieval_metrics,
)

# Embed your data
response = client.embeddings.create(
    descriptions=descriptions,
    ts_array=windows_list,
    return_tensors="np",
)
embeddings = response.embeds  # (N, D)

# Similarity search
sim = cosine_similarity_matrix(embeddings, embeddings)

# kNN search
indices, scores = knn_search(query_emb, corpus_emb, k=5)

# Evaluation metrics
metrics = retrieval_metrics(
    query_emb=query_emb,
    corpus_emb=corpus_emb,
    query_labels=query_labels,
    corpus_labels=corpus_labels,
    k_values=[1, 3, 5, 10],
    exclude_self=True,
    query_ids=query_dataset_names,
    corpus_ids=corpus_dataset_names,
)
# Returns: precision@k, ndcg@k, hit_rate@k
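To make the metrics concrete, here is a dependency-free sketch of cosine kNN search and precision@k on toy 2-D vectors. It illustrates the idea only; it is not the toolkit's implementation, and the helper names (`cosine`, `knn`, `precision_at_k`) and data are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def knn(query, corpus, k):
    """Indices and scores of the k corpus vectors most similar to query."""
    scored = sorted(((cosine(query, v), i) for i, v in enumerate(corpus)),
                    reverse=True)[:k]
    return [i for _, i in scored], [s for s, _ in scored]


def precision_at_k(query_label, corpus_labels, indices):
    """Fraction of retrieved items that share the query's label."""
    hits = sum(corpus_labels[i] == query_label for i in indices)
    return hits / len(indices)


corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["a", "a", "b", "b"]
idx, scores = knn([1.0, 0.05], corpus, k=2)
print(idx)                                   # [0, 1]
print(precision_at_k("a", labels, idx))      # 1.0
```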

Anomaly Detection — charm_toolkit.anomaly_detection

Detect anomalies via kNN distance scoring on windowed CHARM embeddings.

from charm_toolkit.anomaly_detection import (
    sliding_window_embeddings,
    knn_anomaly_scores,
    window_scores_to_pointwise,
)

# 1. Embed sliding windows
train_emb = sliding_window_embeddings(
    client, train_data, descriptions,
    window_size=128, stride=1, batch_size=64,
)
test_emb = sliding_window_embeddings(
    client, test_data, descriptions,
    window_size=128, stride=1, batch_size=64,
)

# 2. Score test windows by distance to train
window_scores = knn_anomaly_scores(
    test_emb=test_emb,
    reference_emb=train_emb,
    k=5,
    distance="cosine",    # "cosine", "l2", "l1"
    aggregation="mean",   # "mean", "max"
)

# 3. Aggregate to per-timestep scores
pointwise_scores = window_scores_to_pointwise(
    window_scores=window_scores,
    window_size=128,
    stride=1,
    total_length=len(test_data),
    method="mean",  # "mean", "max", "last", "center"
)
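The stride has a large effect on how many windows get embedded: a series of length L yields (L - window_size) // stride + 1 windows. The sketch below shows the windowing step on a nested list. `sliding_windows` is a hypothetical helper for illustration; the toolkit's sliding_window_embeddings also sends each window to the API, which this sketch does not do.

```python
def sliding_windows(series, window_size, stride=1):
    """Split a (T, C) nested list into overlapping (window_size, C) windows."""
    last_start = len(series) - window_size
    return [series[s:s + window_size]
            for s in range(0, last_start + 1, stride)]


series = [[float(t), float(-t)] for t in range(1000)]    # T=1000, C=2
print(len(sliding_windows(series, 128, stride=1)))       # 873
print(len(sliding_windows(series, 128, stride=16)))      # 55
```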

Pointwise aggregation methods:

Each timestep is covered by multiple overlapping windows. The method parameter controls how to assign a single score per timestep:

| Method | Behavior | Use case |
| --- | --- | --- |
| "mean" | Average of all windows covering the point | Smooth, best for offline evaluation |
| "max" | Max score among covering windows | Conservative, catches isolated spikes |
| "last" | Score of the most recently completed window | Online/streaming; the score only updates when a window finishes processing |
| "center" | Score of the window centered on each point | Minimal time-shift, tightest temporal alignment |

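The "mean" and "max" rules can be sketched in a few lines: collect the scores of every window that covers a timestep, then reduce them. This is an illustration under the assumption that window w starts at index w * stride, not the toolkit's code, and `pointwise` is a hypothetical name. "last" and "center" are omitted because their exact alignment is not spelled out here.

```python
def pointwise(window_scores, window_size, stride, total_length, method="mean"):
    """Spread per-window scores onto timesteps; uncovered steps get None."""
    covering = [[] for _ in range(total_length)]
    for w, score in enumerate(window_scores):
        start = w * stride                 # assumed window start index
        for t in range(start, min(start + window_size, total_length)):
            covering[t].append(score)
    reduce = {"mean": lambda s: sum(s) / len(s), "max": max}[method]
    return [reduce(s) if s else None for s in covering]


scores = [0.0, 3.0, 0.0]       # 3 windows of size 3, stride 1 -> 5 timesteps
print(pointwise(scores, 3, 1, 5, "mean"))   # [0.0, 1.5, 1.0, 1.5, 0.0]
print(pointwise(scores, 3, 1, 5, "max"))    # [0.0, 3.0, 3.0, 3.0, 0.0]
```

The single high-scoring window is diluted by "mean" but fully preserved by "max", which matches the use cases listed above.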
ReconstructionModel — anomaly detection via learned head

from charm_toolkit import (
    ReconstructionModel, create_reconstruction_datasets,
    collator, TrainerClass,
)
from torch.utils.data import DataLoader
import torch.nn as nn

train_ds, val_ds, test_ds = create_reconstruction_datasets(
    raw_data,           # (T, C) numpy array or torch tensor
    descriptions=channel_names,
    window_size=256,
    stride=1,
    train_ratio=0.7,
    val_ratio=0.15,
    sequential=True,
    scale=True,
)

model = ReconstructionModel(
    embedding_client=client,
    reconstructor="linear",  # "linear", "mlp", or custom nn.Module
    hidden_dim=128,
    dropout=0.1,
)

trainer = TrainerClass(
    model=model,
    train_loader=DataLoader(train_ds, batch_size=512, collate_fn=collator),
    val_loader=DataLoader(val_ds, batch_size=512, collate_fn=collator),
    epochs=1000,
    patience=5,
    lr=1e-3,
    criterion=nn.HuberLoss(),
)
trainer.fit()

ForecastingModel — embedding-based forecasting

from charm_toolkit import ForecastingModel, create_forecasting_datasets, collator, TrainerClass
from torch.utils.data import DataLoader

train_ds, val_ds, test_ds = create_forecasting_datasets(
    raw_data,
    descriptions=channel_names,
    train_horizon=96,
    test_horizon=96,
    train_ratio=0.7,
    val_ratio=0.15,
    sequential=True,
    scale=True,
)

model = ForecastingModel(
    embedding_client=client,
    horizon=96,
    input_size=96,
    head="linear",
    hidden_dim=128,
    mode="last",         # "last", "avg", "none"
    per_channel=True,
    num_channels=len(channel_names),
)

trainer = TrainerClass(
    model=model,
    train_loader=DataLoader(train_ds, batch_size=512, collate_fn=collator),
    val_loader=DataLoader(val_ds, batch_size=512, collate_fn=collator),
    epochs=1000,
    patience=10,
    lr=1e-2,
)
trainer.fit()

ClassificationModel — time series classification

from charm_toolkit import ClassificationModel, create_classification_datasets, collator, TrainerClass
from torch.utils.data import DataLoader
import torch.nn as nn

train_ds, val_ds, test_ds = create_classification_datasets(
    raw_data,          # (N, T, C)
    labels=labels,     # list of N integer labels
    descriptions=channel_names,
    train_ratio=0.7,
    val_ratio=0.15,
)

model = ClassificationModel(
    embedding_client=client,
    num_classes=num_classes,
    hidden_dim=128,
    pooling_over_t="mean",
    pooling_over_channels="mean",
    classifier_type="mlp",
)

trainer = TrainerClass(
    model=model,
    train_loader=DataLoader(train_ds, batch_size=32, collate_fn=collator),
    val_loader=DataLoader(val_ds, batch_size=32, collate_fn=collator),
    epochs=100,
    patience=10,
    lr=1e-3,
    criterion=nn.CrossEntropyLoss(),
)
trainer.fit()

Precomputing embeddings (critical for training)

Toolkit models call the API on every forward pass. When training over hundreds of windows per epoch, precompute the embeddings once:

from torch.utils.data import DataLoader
from charm_toolkit import precompute_dataset_embeddings, PrecomputedEmbeddingsDataset, collator

# Compute once, save to disk as memmap
train_shape = precompute_dataset_embeddings(
    client=client, dataset=train_ds,
    output_path="./outputs/train_embeddings.pt", memory_batch_size=8192
)
val_shape = precompute_dataset_embeddings(
    client=client, dataset=val_ds,
    output_path="./outputs/val_embeddings.pt", memory_batch_size=8192
)

# Wrap datasets — model skips API calls when "embeds" key present
train_ds = PrecomputedEmbeddingsDataset(train_ds, "./outputs/train_embeddings.pt", train_shape)
val_ds = PrecomputedEmbeddingsDataset(val_ds, "./outputs/val_embeddings.pt", val_shape)

# Training now uses cached embeddings — orders of magnitude faster
train_loader = DataLoader(train_ds, batch_size=512, shuffle=True, collate_fn=collator)

Trainer API

from charm_toolkit import TrainerClass

trainer = TrainerClass(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    test_loader=test_loader,     # optional
    lr=1e-3,
    weight_decay=1e-4,
    epochs=1000,
    patience=5,
    min_delta=1e-4,
    max_grad_norm=5.0,
    criterion=None,              # defaults to MSELoss
)
trainer.fit()
test_loss = trainer.evaluate(test_loader)
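The patience and min_delta arguments point to early stopping on validation loss. A common form of that rule is sketched below: stop once the loss has failed to improve by more than min_delta for patience consecutive epochs. This is an assumption about the usual technique; the trainer's exact rule is not documented here, and `early_stop_epoch` is a hypothetical helper.

```python
def early_stop_epoch(val_losses, patience=5, min_delta=1e-4):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:       # meaningful improvement
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


losses = [1.0, 0.8, 0.7, 0.70005, 0.71, 0.72]
print(early_stop_epoch(losses, patience=3))    # 6
```

With this rule, a large epochs value such as 1000 acts as an upper bound rather than the expected run length.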

Dataset factory functions

All return (train_dataset, val_dataset, test_dataset):

| Function | Input shape | Key args |
| --- | --- | --- |
| create_reconstruction_datasets(raw_data, ...) | (T, C) | window_size, stride, train_ratio, val_ratio |
| create_forecasting_datasets(raw_data, ...) | (T, C) | train_horizon, test_horizon, stride, train_ratio, val_ratio |
| create_classification_datasets(raw_data, labels, ...) | (N, T, C) | train_ratio, val_ratio |

Reconstruction and forecasting expect a single long time series (T, C) split temporally. Classification expects pre-windowed (N, T, C).
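A temporal split keeps the segments contiguous and in order, so validation and test data always come after the training data. The sketch below shows the idea with the ratios used in the examples above. `temporal_split` is a hypothetical helper, and the exact boundary handling inside the toolkit's factory functions may differ.

```python
def temporal_split(series, train_ratio=0.7, val_ratio=0.15):
    """Split a (T, C) nested list into contiguous train/val/test segments."""
    t = len(series)
    train_end = round(t * train_ratio)
    val_end = train_end + round(t * val_ratio)
    return series[:train_end], series[train_end:val_end], series[val_end:]


series = [[float(i)] for i in range(1000)]          # T=1000, C=1
train, val, test = temporal_split(series)
print(len(train), len(val), len(test))              # 700 150 150
```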

collator

All DataLoaders using toolkit datasets require collator as the collate_fn:

from charm_toolkit import collator
# or equivalently:
from charm_toolkit.Datasets import collator

Embeddings as features

CHARM embeddings work as drop-in feature vectors for any sklearn model:

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from charm_toolkit.retrieval import cosine_similarity_matrix

response = client.embeddings.create(
    descriptions=descriptions,
    ts_array=windows_list,
    return_tensors="np",
)
X = response.embeds  # (N, D)

# Anomaly detection with isolation forest
clf = IsolationForest(contamination=0.05)
anomaly_labels = clf.fit_predict(X)

# Similarity search
sim = cosine_similarity_matrix(X, X)

# As features for any classifier (X_train, y_train: your own labeled split of X)
clf = LogisticRegression().fit(X_train, y_train)

Local Deployment

Deploy models locally from GitHub releases — no remote server needed:

with CharmClient(tag="experiment-2026-03-15_10-30-00") as client:
    response = client.embeddings.create(...)
# Server shuts down automatically

When tag is provided:

  1. Checks for GPU availability (falls back to CPU)
  2. Clones repo at the specified tag (shallow clone)
  3. Downloads model weights from the GitHub release
  4. Launches the serving stack locally
  5. Polls health endpoint until ready

Files cached at ~/.charm/models/<tag>/ for fast subsequent runs.

CharmClient(
    tag="experiment-tag",           # required for local mode
    repo_url="https://...",         # default: c3-e/research
    cache_dir="/path/to/cache",     # default: ~/.charm/models
    port=8080,                      # 0 = auto-select
)
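The final startup step, polling a health endpoint until the server is ready, follows a generic pattern that can be sketched without any network code. In the example below the health probe is passed in as a callable; in practice it would be an HTTP request to the server's health route, whose path is not given in this README. `wait_until_ready` and `fake_health_check` are hypothetical names, and this is not the SDK's startup code.

```python
import time


def wait_until_ready(is_healthy, timeout=60.0, interval=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Call is_healthy() every `interval` seconds until it returns True."""
    deadline = clock() + timeout
    while clock() < deadline:
        if is_healthy():
            return True
        sleep(interval)
    return False                           # timed out before becoming healthy


# Demo with a fake server that becomes healthy on the third probe.
probes = {"n": 0}


def fake_health_check():
    probes["n"] += 1
    return probes["n"] >= 3


ready = wait_until_ready(fake_health_check, interval=0.01)
print(ready)   # True
```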

Decision guide

When to use CHARM

  • Multivariate time series (multiple channels measured over time)
  • Each window has at least ~32 timesteps (model patch size)
  • You want a strong starting point without feature engineering

When to use classical methods instead

  • Tabular data without a time dimension — use LightGBM, XGBoost
  • Very short series (< 10 timesteps)
  • Single-channel (univariate) series: CHARM still works but may not outperform ARIMA/ETS

Zero-shot vs fine-tuned

| Approach | When | Effort |
| --- | --- | --- |
| prediction.create(target_len=H) | Quick forecast baseline, no labeled data | None (one API call) |
| Embeddings + sklearn | Moderate data, combine with other features | Minutes |
| Embeddings + kNN (retrieval/AD) | Unlabeled anomaly detection or search | Minutes |
| Toolkit model (Reconstruction/Forecasting/Classification) | Have labeled data, want best performance | Train a small head (~minutes on CPU) |

Testing

pip install pytest
python -m pytest tests/
python -m pytest tests/test_utils.py -v

Documentation

The full API reference and usage guide is this README — it renders on the PyPI page.

License

Apache License 2.0 — see LICENSE.
