
Rebellions Extension for PyTorch


PyTorch for Rebellions' NPU

This package provides PyTorch integration for Rebellions' NPU.

Getting Started (torch: Python package, rebel-compiler: Python package)

Prerequisites

  • Python 3.9 or later
  • Git
  • CMake 3.18 or later
  • Ninja build system
  • LDAP credentials for Rebellions' package repository

Update Git submodules

Clone the submodules recursively. This downloads the required third-party libraries, such as the Rebel Compiler headers in third_party/rebel_compiler.

git submodule update --init ./

Create Python Virtual Environment

Create a Python virtual environment. This creates a directory named .venv in the current directory.

python3 -m venv ./.venv && source ./.venv/bin/activate

Install Dependencies

Install the Python package manager Poetry, which handles dependency management, building, packaging, and installation.

pip3 install poetry==2.0.1

Save credentials for https://gate-keeper.rebellions.in; authorization for https://pypi.rbln.in is required. Note that this is not a safe way to store credentials: they are saved in plain text in ~/.config/pypoetry/auth.toml.

export LDAP_USERNAME=daekyeong.kim     # Put your username
export LDAP_PASSWORD=mysecretpassword  # Put your password
poetry config keyring.enabled false    # Optional, if building freezes while auth
poetry config http-basic.rbln-internal $LDAP_USERNAME $LDAP_PASSWORD

NOTE: During development, you must use rbln-internal instead of rbln. If you want to download rebel-compiler from rbln (Rebellions' external PyPI server) instead, configure it as follows.

poetry config http-basic.rbln <rbln username> <rbln password>

Install the dependencies pinned in poetry.lock using Poetry, excluding the root package torch-rbln. Be careful: the command below also uninstalls any packages that are not listed in poetry.lock.

poetry sync --no-root

Choose Build Type (Optional)

Choose a build type as shown below. The default is Release.

export RBLN_BUILD_TYPE=Debug

Install Editable Package

Build the C++ project and install the torch-rbln package in editable mode.

poetry install --only-root

Logging

torch-rbln provides structured logging via spdlog to help diagnose runtime behavior, including CPU fallback operations and device execution traces.

Environment Variables

Variable              Description                        Default
TORCH_RBLN_LOG_LEVEL  Controls log verbosity             WARNING
TORCH_RBLN_LOG_PATH   Log file path (debug builds only)  ./torch_rbln.log

export TORCH_RBLN_LOG_LEVEL=INFO
export TORCH_RBLN_LOG_PATH=./torch_rbln.log

A log file is always created in debug builds. Its path can be configured via TORCH_RBLN_LOG_PATH environment variable.
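
For example, a minimal Python sketch of configuring logging before the extension loads (this assumes torch-rbln reads these variables when torch_rbln is first imported):

import os

# Assumption: torch-rbln reads these variables at initialization, so set them
# before importing torch_rbln.
os.environ["TORCH_RBLN_LOG_LEVEL"] = "INFO"
os.environ["TORCH_RBLN_LOG_PATH"] = "./torch_rbln.log"  # honored in debug builds only

import torch  # noqa: E402
import torch_rbln  # noqa: E402,F401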

Log Levels

Level              Description                                                       Use Case
DEBUG              Detailed internal states, function entry/exit, parameter values  Deep debugging during development (debug builds only)
INFO               Runtime information, CPU fallback notifications                   General development and troubleshooting
WARNING (default)  Important warnings that may affect execution                      Production monitoring
ERROR              Errors and critical failures                                      Error tracking and alerting

Debug vs Release Builds

Feature            Debug Build                        Release Build
Minimum log level  DEBUG                              INFO
Log file           ✅ Written to TORCH_RBLN_LOG_PATH  ❌ Not available
Source location    ✅ Included                        ❌ Omitted
Thread ID          ✅ Included                        ❌ Omitted

Performance Optimization Flag (Optional)

To reduce runtime overhead (e.g., skipping unnecessary NaN/Inf checks), set the following environment variable:

export TORCH_RBLN_DEPLOY=ON

This enables lightweight execution for deployment scenarios.

Device Mapping Configuration

By default, each physical NPU device is mapped to a logical device with a 1:1 relationship (equivalent to RBLN_NPUS_PER_DEVICE=1). This is called Direct Mapping and provides the standard PyTorch device usage experience.

You can configure device mapping using the following environment variables to enable Aggregated Mapping, which groups multiple physical NPUs into a single logical device for RSD (Rebellions Scalable Design) functionality.

RBLN_NPUS_PER_DEVICE

Groups physical NPUs together to create logical devices. Each logical device contains the specified number of physical NPUs. This is designed for Normal Users who want a simple configuration; a Python sketch of the grouping logic follows the examples below.

Constraint: Must be one of the supported sizes: 1, 2, 4, 8, 16, or 32. These values match the base_sizes defined in rebel/core/compilation/_impl.py for production environments.

export RBLN_NPUS_PER_DEVICE=2

Examples:

With 4 physical devices (RBLN_DEVICES=0,1,2,3 or default):

  • RBLN_NPUS_PER_DEVICE=2: rbln:0 maps to NPUs [0, 1], rbln:1 maps to NPUs [2, 3]
  • RBLN_NPUS_PER_DEVICE=4: rbln:0 maps to NPUs [0, 1, 2, 3] (full aggregation)

With 6 physical devices and RBLN_NPUS_PER_DEVICE=4:

  • rbln:0 maps to NPUs [0, 1, 2, 3]
  • NPUs [4, 5] remain unused (warning will be displayed)
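
The grouping logic can be pictured with a short Python sketch; group_physical_npus is a hypothetical helper for illustration, not the library's actual code:

def group_physical_npus(physical_ids, npus_per_device):
    # Illustrative sketch of Aggregated Mapping: slice the physical IDs into
    # consecutive groups of npus_per_device; a trailing partial group is unused.
    if npus_per_device not in {1, 2, 4, 8, 16, 32}:
        raise ValueError(f"unsupported RBLN_NPUS_PER_DEVICE: {npus_per_device}")
    groups = [physical_ids[i:i + npus_per_device]
              for i in range(0, len(physical_ids), npus_per_device)]
    logical = [g for g in groups if len(g) == npus_per_device]
    unused = [npu for g in groups if len(g) < npus_per_device for npu in g]
    return logical, unused

# With 6 physical devices and RBLN_NPUS_PER_DEVICE=4:
logical, unused = group_physical_npus([0, 1, 2, 3, 4, 5], 4)
print(logical)  # [[0, 1, 2, 3]] -> rbln:0
print(unused)   # [4, 5]         -> left unused (warning displayed)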

RBLN_DEVICE_MAP

Provides explicit mapping between logical devices and physical NPU IDs. This is designed for Advanced Users who need fine-grained control over device topology.

Constraint: Each device group must contain one of the supported sizes: 1, 2, 4, 8, 16, or 32 devices.

export RBLN_DEVICE_MAP="[0,1],[2,3,4,5]"

Format: Comma-separated groups of NPU IDs, each group enclosed in square brackets.

Example: With 6 physical devices:

  • RBLN_DEVICE_MAP="[0,1],[2,3,4,5]": rbln:0 maps to NPUs [0, 1], rbln:1 maps to NPUs [2, 3, 4, 5]
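
The documented format can be sketched with a small parser in Python; parse_device_map is a hypothetical illustration, not the library's parser:

import re

def parse_device_map(value):
    # Parse comma-separated bracketed groups, e.g. "[0,1],[2,3,4,5]".
    groups = [[int(n) for n in grp.split(",")]
              for grp in re.findall(r"\[([^\]]+)\]", value)]
    for g in groups:
        if len(g) not in {1, 2, 4, 8, 16, 32}:
            raise ValueError(f"unsupported group size: {len(g)}")
    return groups

print(parse_device_map("[0,1],[2,3,4,5]"))  # [[0, 1], [2, 3, 4, 5]]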

Configuration Priority and Conflict Resolution

Priority order: RBLN_DEVICE_MAP > RBLN_NPUS_PER_DEVICE > default (1:1 mapping)

Viewing Device Topology

You can view the current device topology using torch.rbln.device_summary():

import torch
import torch_rbln  # importing torch_rbln registers the torch.rbln namespace

torch.rbln.device_summary()

Example output:

[RBLN] Device Topology Initialized:
+-------------------+-------------------+----------------------+
| Logical Device    | Physical NPU IDs  | Status               |
+-------------------+-------------------+----------------------+
| rbln:0            | [ 0, 1 ]          | Active (Aggregated)  |
| rbln:1            | [ 2, 3 ]          | Active (Aggregated)  |
+-------------------+-------------------+----------------------+

Tensor Parallel Configuration

The following environment variables control tensor parallel behavior for torch.compile operations and eager mode ops.

TORCH_RBLN_USE_TP_FAILOVER

Enables automatic tensor parallel failover. When a RuntimeError occurs during execution with tensor_parallel_size > 1, the system automatically retries with tp_size=1 on the root NPU of the device group.

This is useful for models that don't support tensor parallelism, allowing them to run on a single NPU within an aggregated device group without manual intervention.

export TORCH_RBLN_USE_TP_FAILOVER=ON   # enable
export TORCH_RBLN_USE_TP_FAILOVER=OFF  # disable (default: OFF)

Behavior:

  • When set to ON and a RuntimeError occurs with tp > 1:
    1. The system logs a warning message indicating the failover attempt
    2. The model is recompiled with tensor_parallel_size=1
    3. Execution continues on the root NPU of the device group
  • When set to OFF or unset (default), RuntimeErrors are propagated as-is

Example scenario: With RBLN_NPUS_PER_DEVICE=4 (4 NPUs per logical device):

  • Initial compilation attempts tp=4
  • If the model doesn't support TP, a RuntimeError occurs
  • With failover enabled, the system retries with tp=1 on NPU 0
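
Conceptually, the failover behaves like the following Python sketch; run_with_tp is a hypothetical stand-in for torch-rbln's internal execution path:

def run_with_failover(run_with_tp, model, inputs, tp_size):
    # Sketch of TORCH_RBLN_USE_TP_FAILOVER=ON: try the device group's tp size,
    # then on RuntimeError recompile and retry with tensor_parallel_size=1 on
    # the group's root NPU.
    try:
        return run_with_tp(model, inputs, tp_size)  # e.g. tp=4
    except RuntimeError as err:
        print(f"[warn] TP failover: retrying with tp=1 ({err})")
        return run_with_tp(model, inputs, 1)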

TORCH_RBLN_USE_DEVICE_TP

Controls whether eager mode operations use the device group's tensor parallel size instead of tp_size=1.

By default, eager mode ops (operations outside of torch.compile) use tp_size=1. When this environment variable is set to ON, eager mode ops will follow the logical device size defined by RBLN_NPUS_PER_DEVICE or RBLN_DEVICE_MAP, matching the behavior of torch.compile operations.

export TORCH_RBLN_USE_DEVICE_TP=ON   # use device group tp size
export TORCH_RBLN_USE_DEVICE_TP=OFF  # use tp_size=1 for eager ops (default: OFF)

Behavior:

  • When set to ON: Eager mode ops use the device group's tensor parallel size (e.g., tp=4 with RBLN_NPUS_PER_DEVICE=4)
  • When set to OFF or unset (default): Eager mode ops use tp_size=1

Use case: This is useful when you want consistent tensor parallel behavior across both eager and compiled operations, particularly in mixed execution scenarios.
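
A minimal usage sketch, assuming the variable is read when torch_rbln initializes and that tensors can be placed on a logical rbln device:

import os
os.environ["TORCH_RBLN_USE_DEVICE_TP"] = "ON"  # set before importing torch_rbln (assumption)

import torch  # noqa: E402
import torch_rbln  # noqa: E402,F401

x = torch.randn(4, 4, device="rbln:0")
y = x + x                           # eager op: follows the device group's tp size
f = torch.compile(lambda t: t * 2)
z = f(x)                            # compiled op: uses the group's tp size as before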

Install Wheel Package (Optional)

If you want to build a *.whl and install it, run the commands below.

poetry build
pip install ./dist/torch_rbln*.whl

Whenever you change C++ or Python source code, just re-run Install Editable Package or Install Wheel Package.

Apply Custom rebel-compiler

You have two choices:

  • Use built-in one
  • Use external one

Use torch-rbln built-in rebel-compiler (torch: Python package, rebel-compiler: third_party/rebel_compiler)

This method is strongly recommended. The steps below are the same as in Getting Started.

git submodule update --init ./
python3 -m venv ./.venv && source ./.venv/bin/activate
pip3 install poetry==2.0.1
export LDAP_USERNAME=daekyeong.kim     # Put your username
export LDAP_PASSWORD=mysecretpassword  # Put your password
poetry config http-basic.rbln $LDAP_USERNAME $LDAP_PASSWORD

Without running poetry sync, check out your custom branch of rebel-compiler in ./third_party/rebel_compiler.

pushd ./third_party/rebel_compiler
  git checkout my_custom_branch
popd

The script below builds a rebel-compiler package and installs it into your environment, syncing dependencies.

./tools/apply-custom-rebel.sh

The script above edits the pyproject.toml and poetry.lock files. If you only want to apply a custom rebel-compiler temporarily, keep an eye on those files so you can revert them later.

(Optional) You can choose a build type as shown below.

RBLN_BUILD_TYPE=Debug ./tools/apply-custom-rebel.sh

Then build or install the torch-rbln package against the custom rebel-compiler package.

poetry install --only-root

Use external rebel-compiler (for rebel-compiler developers)

Prereqs

  • You’ve already built rebel-compiler.
  • ${REBEL_HOME} points to the rebel-compiler repo root.

Method 1: Automated Script (Recommended)

⚠️ Warning: Do not use build-with-external-rebel.sh together with apply-custom-rebel.sh. Both scripts modify pyproject.toml and may cause environment conflicts. Use only one method at a time.

Use the build-with-external-rebel.sh script for an automated build:

gcc-13 mode (default): Uses PyTorch from PyPI

cd /path/to/torch-rbln
export REBEL_HOME=/path/to/rebel_compiler
./tools/build-with-external-rebel.sh --clean

gcc-12 mode: Requires pre-built torch wheel

cd /path/to/torch-rbln
export REBEL_HOME=/path/to/rebel_compiler
export RBLN_GCC_VERSION=12
export TORCH_WHEEL_PATH=/path/to/torch-2.8.0-cp310-cp310-linux_x86_64.whl
./tools/build-with-external-rebel.sh --clean

Options:

  • --clean: Clean build artifacts before building
  • --clean-only: Only clean build artifacts, do not build

Environment Variables:

  • REBEL_HOME: Path to rebel-compiler (REQUIRED)
  • RBLN_GCC_VERSION: GCC version to use (12 or 13, default: 13)
  • TORCH_WHEEL_PATH: Path to pre-built torch wheel (REQUIRED for gcc-12, ignored for gcc-13)
  • RBLN_BUILD_TYPE: Build type (Release or Debug, default: Release)
  • RBLN_VENV_PATH: Virtual environment path (default: .venv-rebel)

The script will:

  1. Check Python version compatibility with rebel-compiler
  2. Create virtual environment
  3. Install dependencies
  4. Configure pyproject.toml for external rebel-compiler
  5. Build and install torch-rbln
  6. Verify installation with import tests

After build:

source .venv-rebel/bin/activate
# activate_rebel is auto-sourced, setting REBEL_HOME, PYTHONPATH, LD_LIBRARY_PATH
python -c "import torch; import rebel; import torch_rbln; print('OK')"

Method 2: Manual Setup

1) Create and activate a virtualenv

python3 -m venv .venv
source .venv/bin/activate

2) Add your local rebel-compiler in editable mode

poetry add --editable "${REBEL_HOME}/python"

3) Install this project, using the external compiler

RBLN_USE_EXTERNAL_REBEL_COMPILER=1 poetry install --only-root

Create Git Commit

A Git pre-commit hook is active, so linting is triggered whenever you create a Git commit. To prepare for linting, you MUST initialize lintrunner first.

source ./.venv/bin/activate

lintrunner init

Once lintrunner is initialized, there is no need to initialize it again. You can commit now.

git commit

Some failures can be fixed automatically. Run the command below to apply automatic fixes.

lintrunner -m main -a

Run Tests

The following assumes you are in a Python virtual environment and have installed the torch-rbln package successfully.

C++ Tests

Packaging runs in a new isolated environment, so even though poetry install --only-root builds the C++ project, CTest cannot find that build directory. For CTest, you MUST build the C++ project manually.

./tools/build-libtorch-rbln.sh

ctest --test-dir ./build

Python Tests

pytest ./test
