
Candle

CANN can handle — Not as bright as a torch, but light enough to carry anywhere.

A pure-Python deep learning framework that runs your PyTorch code — no rewrite needed.


Getting Started | Why Candle | Backends | Roadmap | Contributing

English | 中文


Why Candle

PyTorch is powerful — but it's also 2 GB+ of C++ binaries, hard to install on edge devices, and locked to CUDA. Candle takes a different approach:

                                    PyTorch                   Candle
  Install size                      ~2 GB                     ~10 MB
  Build from source                 C++ toolchain required    pip install candle
  Ascend NPU                        Community fork            First-class ACLNN kernels
  Apple MPS                         Partial                   Native Metal shaders
  Run existing import torch code    n/a                       Zero-change drop-in

Getting Started

Install

pip install candle

That's it. No CUDA toolkit, no compiler, no 10-minute build.

Write code the way you already know

import candle as torch
import candle.nn as nn

# Tensors, autograd, nn — all the APIs you're used to
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

x = torch.randn(2, 784, requires_grad=True)
out = model(x)
out.sum().backward()
print(x.grad.shape)  # (2, 784)
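
Optimizers and losses follow the same pattern. A minimal training step, assuming candle.optim and the nn loss modules mirror their torch counterparts (as the feature list below claims):

import candle as torch
import candle.nn as nn
import candle.optim as optim

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 784)                # a fake batch for the sketch
target = torch.randint(0, 10, (32,))    # fake integer labels

opt.zero_grad()
loss = loss_fn(model(x), target)
loss.backward()
opt.step()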

Or just import torch — seriously

Candle ships an import hook. Existing PyTorch code runs without changing a single line:

import torch                    # resolved to candle
import torch.nn.functional as F # resolved to candle.nn.functional

x = torch.randn(3, 4)
y = F.relu(x)

How it works:

  USE_CANDLE env var    PyTorch installed?    import torch gives you
  1 / true / yes        doesn't matter        Candle
  0 / false / no        doesn't matter        PyTorch (or ImportError)
  not set               No                    Candle
  not set               Yes                   PyTorch

If PyTorch isn't installed, Candle picks up automatically. If both are installed, set USE_CANDLE=1:

USE_CANDLE=1 python train.py
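
If you're wondering how the hook works, the mechanism is ordinary Python: register the candle modules under the torch names in sys.modules before user code imports them. A simplified sketch of the idea (not Candle's actual source; the real hook covers more submodules and edge cases):

import importlib.util
import os
import sys

def _use_candle() -> bool:
    """Decide whether `import torch` should resolve to candle."""
    flag = os.environ.get("USE_CANDLE", "").strip().lower()
    if flag in ("1", "true", "yes"):
        return True
    if flag in ("0", "false", "no"):
        return False
    # Flag unset: prefer real PyTorch when it is installed.
    return importlib.util.find_spec("torch") is None

if _use_candle():
    import candle
    import candle.nn
    import candle.nn.functional
    # Alias candle under the torch names so that both `import torch`
    # and `import torch.nn.functional as F` resolve to candle modules.
    sys.modules["torch"] = candle
    sys.modules["torch.nn"] = candle.nn
    sys.modules["torch.nn.functional"] = candle.nn.functional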

Backends

Candle runs on multiple hardware backends with a single API:

candle.device("cpu")    # NumPy — works everywhere
candle.device("cuda")   # NVIDIA GPU
candle.device("mps")    # Apple Silicon GPU (Metal)
candle.device("npu")    # Huawei Ascend (ACLNN)

Ascend NPU — First-Class Support

Unlike wrapper libraries, Candle calls ACLNN large kernels directly via ctypes — no framework overhead, no Python-to-C++ bridge:

import candle as torch

x = torch.randn(1024, 1024, device="npu")
y = torch.matmul(x, x.T)  # runs native ACLNN kernel
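
The pure-Python claim holds on NPU because CANN exposes its operator API as ordinary shared libraries, which ctypes can load without any compiled extension. A rough illustration of the approach (library and symbol names here are assumptions for the sketch, not Candle's actual bindings):

import ctypes

# Load CANN's operator library and probe for an ACLNN entry point.
# Requires a CANN toolkit install with the library on the loader path.
libopapi = ctypes.CDLL("libopapi.so")
print(hasattr(libopapi, "aclnnMatmul"))  # True if the symbol is exported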

Features

  • Pure Python — No C++ extensions. Install in seconds, debug in Python, deploy anywhere.
  • PyTorch-Compatible API — Tensor, nn.Module, autograd, optim — the full stack.
  • import torch Drop-in — Built-in import hook. Zero code changes for existing projects.
  • Multi-Backend — CPU, CUDA, Apple MPS, Ascend NPU from one codebase.
  • Ascend NPU Native — Direct ACLNN kernel integration, not a bolted-on afterthought.
  • Agentic AI Ready — Lightweight enough to embed in AI agent runtimes.

Roadmap

Most frameworks stop at tensors. Candle doesn't — the end goal is a self-hosting agentic kernel: a system that deploys local models on Ascend, then uses those same models to debug, optimize, and evolve itself.

Phase 1 — Foundation (current)

A PyTorch-compatible pure-Python framework with multi-backend support.

  • Core tensor ops, autograd, nn.Module, optimizers
  • CPU backend (NumPy)
  • Ascend NPU backend (ACLNN native kernels)
  • Apple MPS backend (Metal shaders)
  • import torch zero-change drop-in hook
  • CUDA backend
  • torch.compile graph-mode acceleration
  • Distributed training (DistributedDataParallel)
  • Full TorchVision / TorchAudio model compatibility

Phase 2 — Cognitive Runtime

Local model deployment becomes a first-class primitive, not an afterthought.

  • Local model loading, quantization & serving on Ascend
  • Role-based model router (debug model, generation model, judge model)
  • Multi-model inference policy (local-first, cloud fallback)
  • Self-hosted reasoning & tool-use runtime

Phase 3 — Agentic Kernel

The framework gains the ability to observe, diagnose, and act on itself.

  • Dev Layer — bug detection, repro construction, fix suggestions (powered by local debug model)
  • Bootstrap Layer — candidate generation, distillation, config search (powered by local generation model)
  • ModelOps Layer — evaluation, promotion, rollback, lineage tracking (powered by local judge model)

Phase 4 — Self-Hosting

Local models are no longer just managed artifacts — they become the execution engine of the kernel itself.

  • Local-first agentic execution: Dev / Bootstrap / ModelOps agents default to self-deployed models
  • Continuous self-improvement loop: trace → evaluate → generate fix → test → promote
  • Model-as-kernel: the system uses its own models to improve its own models

The target architecture, once all four phases land:

┌─────────────────────────────────────────┐
│            Applications                 │
│   train · debug · infer · deploy        │
├─────────────────────────────────────────┤
│          Agentic Kernel                 │
│   Dev Layer · Bootstrap · ModelOps      │
├─────────────────────────────────────────┤
│        Cognitive Runtime                │
│   Local Model RT · Router · Policy      │
├─────────────────────────────────────────┤
│      Intelligence Substrate             │
│   TraceStore · Evaluator · Registry     │
├─────────────────────────────────────────┤
│           Foundation                    │
│   tensor · autograd · nn · compiler     │
│   CPU · CUDA · MPS · Ascend NPU        │
└─────────────────────────────────────────┘

See docs/support-matrix.md for the full 0.1.x op support matrix.

Used By

Building something with Candle? Open an issue and we'll add you here!


Contributing

# Clone and install in dev mode
git clone https://github.com/candle-org/candle.git
cd candle
pip install -e ".[test]"

# Run tests
pytest tests/cpu/ tests/contract/ -v --tb=short

# Lint
pip install -e ".[lint]"
pylint src/candle --rcfile=.github/pylint.conf

We welcome contributions! Whether it's new ops, backend support, bug fixes, or docs — open an issue or submit a PR.

License

MIT
