Rebellions Extension for PyTorch
PyTorch for Rebellions' NPU
This package provides PyTorch integration for Rebellions' NPU.
Getting Started (torch: Python package, rebel-compiler: Python package)
Prerequisites
- Python 3.9 or later
- Git
- CMake 3.18 or later
- Ninja build system
- LDAP credentials for Rebellions' package repository
Update Git submodules
Clone the submodules recursively. This downloads required third-party libraries,
such as the Rebel Compiler headers, into third_party/rebel_compiler.
git submodule update --init ./
Create Python Virtual Environment
Create a Python virtual environment. This creates a directory named .venv in
the current directory.
python3 -m venv ./.venv && source ./.venv/bin/activate
Install Dependencies
Install the Python package manager Poetry, which handles dependency management,
building, packaging, and installation.
pip3 install poetry==2.0.1
Save your credentials for https://gate-keeper.rebellions.in; authorization for
https://pypi.rbln.in is also required. Note that this stores credentials in
plain text, which is not secure. See ~/.config/pypoetry/auth.toml.
export LDAP_USERNAME=daekyeong.kim # Put your username
export LDAP_PASSWORD=mysecretpassword # Put your password
poetry config keyring.enabled false # Optional: use if the build freezes during authentication
poetry config http-basic.rbln-internal $LDAP_USERNAME $LDAP_PASSWORD
NOTE: During development, use rbln-internal instead of rbln. If you want to download rebel-compiler from rbln (Rebellions' external PyPI server), run the following instead.
poetry config http-basic.rbln <rbln username> <rbln password>
Install the dependencies listed in poetry.lock using Poetry, excluding the root
package torch-rbln. Be careful: the command below uninstalls any packages that
are not in poetry.lock.
poetry sync --no-root
Choose Build Type (Optional)
Choose the build type as shown below. The default is Release.
export RBLN_BUILD_TYPE=Debug
Install Editable Package
Build the C++ project and install torch-rbln as an editable package.
poetry install --only-root
Logging
torch-rbln provides structured logging via spdlog to help diagnose runtime behavior, including CPU fallback operations and device execution traces.
Environment Variables
| Variable | Description | Default |
|---|---|---|
| TORCH_RBLN_LOG_LEVEL | Controls log verbosity | WARNING |
| TORCH_RBLN_LOG_PATH | Log file path (debug builds only) | ./torch_rbln.log |
export TORCH_RBLN_LOG_LEVEL=INFO
export TORCH_RBLN_LOG_PATH=./torch_rbln.log
A log file is always created in debug builds. Its path can be configured via the TORCH_RBLN_LOG_PATH environment variable.
Log Levels
| Level | Description | Use Case |
|---|---|---|
| DEBUG | Detailed internal states, function entry/exit, parameter values | Deep debugging during development (debug builds only) |
| INFO | Runtime information, CPU fallback notifications | General development and troubleshooting |
| WARNING (default) | Important warnings that may affect execution | Production monitoring |
| ERROR | Errors and critical failures | Error tracking and alerting |
Debug vs Release Builds
| Feature | Debug Build | Release Build |
|---|---|---|
| Minimum log level | DEBUG | INFO |
| Log file | ✅ Written to TORCH_RBLN_LOG_PATH | ❌ Not available |
| Source location | ✅ Included | ❌ Omitted |
| Thread ID | ✅ Included | ❌ Omitted |
Performance Optimization Flag (Optional)
To reduce runtime overhead (e.g., skipping unnecessary NaN/Inf checks), set the following environment variable:
export TORCH_RBLN_DEPLOY=ON
This enables lightweight execution for deployment scenarios.
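To illustrate the idea, here is a minimal sketch of how a deploy flag can gate validation work. The helper `check_finite` is hypothetical, not part of the torch-rbln API; it only shows the pattern of skipping NaN/Inf checks when TORCH_RBLN_DEPLOY=ON.

```python
import math
import os

def check_finite(values, deploy=None):
    # Hypothetical helper (not the real torch-rbln API): illustrates how a
    # deploy flag can skip NaN/Inf validation to reduce runtime overhead.
    if deploy is None:
        deploy = os.environ.get("TORCH_RBLN_DEPLOY", "OFF") == "ON"
    if deploy:
        return True  # lightweight path: validation skipped entirely
    return all(math.isfinite(v) for v in values)
```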
Device Mapping Configuration
By default, each physical NPU device is mapped to a logical device with a 1:1 relationship (equivalent to RBLN_NPUS_PER_DEVICE=1). This is called Direct Mapping and provides the standard PyTorch device usage experience.
You can configure device mapping using the following environment variables to enable Aggregated Mapping, which groups multiple physical NPUs into a single logical device for RSD (Rebellions Scalable Design) functionality.
RBLN_NPUS_PER_DEVICE
Groups physical NPUs together to create logical devices. Each logical device will contain the specified number of physical NPUs. This is designed for Normal Users who want simple configuration.
Constraint: Must be one of the supported sizes: 1, 2, 4, 8, 16, or 32. These values match the base_sizes defined in rebel/core/compilation/_impl.py for production environments.
export RBLN_NPUS_PER_DEVICE=2
Examples:
With 4 physical devices (RBLN_DEVICES=0,1,2,3 or default):
- RBLN_NPUS_PER_DEVICE=2 → rbln:0 maps to NPUs [0, 1], rbln:1 maps to NPUs [2, 3]
- RBLN_NPUS_PER_DEVICE=4 → rbln:0 maps to NPUs [0, 1, 2, 3] (full aggregation)
With 6 physical devices and RBLN_NPUS_PER_DEVICE=4:
- rbln:0 maps to NPUs [0, 1, 2, 3]
- NPUs [4, 5] remain unused (a warning will be displayed)
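The grouping rule above can be sketched in a few lines. This is a hypothetical illustration of the behavior, not the actual torch-rbln implementation:

```python
def group_devices(physical_ids, npus_per_device):
    # Hypothetical sketch of RBLN_NPUS_PER_DEVICE grouping; not the actual
    # torch-rbln implementation.
    assert npus_per_device in (1, 2, 4, 8, 16, 32), "unsupported group size"
    # Only complete groups become logical devices; the remainder is unused.
    usable = len(physical_ids) // npus_per_device * npus_per_device
    groups = [physical_ids[i:i + npus_per_device]
              for i in range(0, usable, npus_per_device)]
    unused = physical_ids[usable:]
    if unused:
        print(f"warning: NPUs {unused} remain unused")
    return groups
```

For example, `group_devices([0, 1, 2, 3, 4, 5], 4)` yields one logical device `[0, 1, 2, 3]` and warns that NPUs `[4, 5]` are unused.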
RBLN_DEVICE_MAP
Provides explicit mapping between logical devices and physical NPU IDs. This is designed for Advanced Users who need fine-grained control over device topology.
Constraint: Each device group must contain one of the supported sizes: 1, 2, 4, 8, 16, or 32 devices.
export RBLN_DEVICE_MAP="[0,1],[2,3,4,5]"
Format: Comma-separated groups of NPU IDs, each group enclosed in square brackets.
Example: With 6 physical devices:
- RBLN_DEVICE_MAP="[0,1],[2,3,4,5]" → rbln:0 maps to NPUs [0, 1], rbln:1 maps to NPUs [2, 3, 4, 5]
Configuration Priority and Conflict Resolution
Priority order: RBLN_DEVICE_MAP > RBLN_NPUS_PER_DEVICE > default (1:1 mapping)
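A compact sketch of this priority order, assuming a parser for the bracketed RBLN_DEVICE_MAP format; the real parsing lives inside torch-rbln and may differ in detail:

```python
def resolve_mapping(physical_ids, env):
    # Hypothetical sketch of the priority order:
    # RBLN_DEVICE_MAP > RBLN_NPUS_PER_DEVICE > default 1:1 mapping.
    device_map = env.get("RBLN_DEVICE_MAP")
    if device_map:
        # Highest priority: explicit map such as "[0,1],[2,3,4,5]"
        return [[int(x) for x in group.split(",")]
                for group in device_map.strip("[]").split("],[")]
    per_device = int(env.get("RBLN_NPUS_PER_DEVICE", "1"))
    # Default per_device == 1 reproduces Direct Mapping.
    return [physical_ids[i:i + per_device]
            for i in range(0, len(physical_ids), per_device)]
```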
Viewing Device Topology
You can view the current device topology using torch.rbln.device_summary():
import torch
import torch_rbln

torch.rbln.device_summary()
Example output:
[RBLN] Device Topology Initialized:
+-------------------+-------------------+----------------------+
| Logical Device | Physical NPU IDs | Status |
+-------------------+-------------------+----------------------+
| rbln:0 | [ 0, 1 ] | Active (Aggregated) |
| rbln:1 | [ 2, 3 ] | Active (Aggregated) |
+-------------------+-------------------+----------------------+
Tensor Parallel Configuration
The following environment variables control tensor parallel behavior for torch.compile operations and eager mode ops.
TORCH_RBLN_USE_TP_FAILOVER
Enables automatic tensor parallel failover. When a RuntimeError occurs during execution with tensor_parallel_size > 1, the system automatically retries with tp_size=1 on the root NPU of the device group.
This is useful for models that don't support tensor parallelism, allowing them to run on a single NPU within an aggregated device group without manual intervention.
export TORCH_RBLN_USE_TP_FAILOVER=ON # enable
export TORCH_RBLN_USE_TP_FAILOVER=OFF # disable (default: OFF)
Behavior:
- When set to ON and a RuntimeError occurs with tp > 1:
  - The system logs a warning message indicating the failover attempt
  - The model is recompiled with tensor_parallel_size=1
  - Execution continues on the root NPU of the device group
- When set to OFF or unset (default), RuntimeErrors are propagated as-is
Example scenario:
With RBLN_NPUS_PER_DEVICE=4 (4 NPUs per logical device):
- Initial compilation attempts tp=4
- If the model doesn't support TP, a RuntimeError occurs
- With failover enabled, the system retries with tp=1 on NPU 0
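The retry logic can be sketched as follows. `run_fn` is a hypothetical stand-in for compiling and executing the model at a given tensor parallel size; the real failover lives inside torch-rbln:

```python
def run_with_tp_failover(run_fn, tp_size, use_failover):
    # Hypothetical sketch of TORCH_RBLN_USE_TP_FAILOVER behavior.
    try:
        return run_fn(tp_size)
    except RuntimeError:
        if use_failover and tp_size > 1:
            print(f"warning: tp={tp_size} failed; retrying with tp=1 on the root NPU")
            return run_fn(1)  # recompile with tensor_parallel_size=1
        raise  # failover disabled: propagate the error as-is
```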
TORCH_RBLN_USE_DEVICE_TP
Controls whether eager mode operations use the device group's tensor parallel size instead of tp_size=1.
By default, eager mode ops (operations outside of torch.compile) use tp_size=1. When this environment variable is set to ON, eager mode ops will follow the logical device size defined by RBLN_NPUS_PER_DEVICE or RBLN_DEVICE_MAP, matching the behavior of torch.compile operations.
export TORCH_RBLN_USE_DEVICE_TP=ON # use device group tp size
export TORCH_RBLN_USE_DEVICE_TP=OFF # use tp_size=1 for eager ops (default: OFF)
Behavior:
- When set to ON: Eager mode ops use the device group's tensor parallel size (e.g., tp=4 with RBLN_NPUS_PER_DEVICE=4)
- When set to OFF or unset (default): Eager mode ops use tp_size=1
Use case: This is useful when you want consistent tensor parallel behavior across both eager and compiled operations, particularly in mixed execution scenarios.
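The decision rule reduces to a few lines. This is a hypothetical sketch of the behavior described above, not the torch-rbln source:

```python
def effective_tp(group_size, compiled, use_device_tp):
    # Hypothetical sketch: compiled ops always follow the logical device's
    # group size; eager ops use tp=1 unless TORCH_RBLN_USE_DEVICE_TP=ON.
    if compiled or use_device_tp:
        return group_size
    return 1
```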
Install Wheel Package (Optional)
If you want to build a *.whl file and install it, run the commands below.
poetry build
pip install ./dist/torch_rbln*.whl
When you change C++ or Python source code, simply run
Install Editable Package or Install Wheel Package again.
Apply Custom rebel-compiler
You have two choices:
- Use the built-in one
- Use an external one
Use torch-rbln built-in rebel-compiler (torch: Python package, rebel-compiler: third_party/rebel_compiler)
This approach is strongly recommended. The following steps are the same as in Getting Started.
git submodule update --init ./
python3 -m venv ./.venv && source ./.venv/bin/activate
pip3 install poetry==2.0.1
export LDAP_USERNAME=daekyeong.kim # Put your username
export LDAP_PASSWORD=mysecretpassword # Put your password
poetry config http-basic.rbln $LDAP_USERNAME $LDAP_PASSWORD
Without running poetry sync, check out the rebel-compiler submodule at
./third_party/rebel_compiler to your custom branch.
pushd ./third_party/rebel_compiler
git checkout my_custom_branch
popd
The following script builds the package and installs it into your environment, syncing dependencies.
./tools/apply-custom-rebel.sh
The script above edits the pyproject.toml and poetry.lock files. If you only
want to apply a custom rebel-compiler temporarily, keep an eye on those files.
(Optional) You can choose the build type as shown below.
RBLN_BUILD_TYPE=Debug ./tools/apply-custom-rebel.sh
Then you can build or install the torch-rbln package against the custom
rebel-compiler package.
poetry install --only-root
Use external rebel-compiler (for rebel-compiler developers)
Prerequisites
- You've already built rebel-compiler.
- ${REBEL_HOME} points to the rebel-compiler repo root.
Method 1: Automated Script (Recommended)
⚠️ Warning: Do not use build-with-external-rebel.sh together with apply-custom-rebel.sh. Both scripts modify pyproject.toml and may cause environment conflicts. Use only one method at a time.
Use the build-with-external-rebel.sh script for automated build:
gcc-13 mode (default): Uses PyTorch from PyPI
cd /path/to/torch-rbln
export REBEL_HOME=/path/to/rebel_compiler
./tools/build-with-external-rebel.sh --clean
gcc-12 mode: Requires pre-built torch wheel
cd /path/to/torch-rbln
export REBEL_HOME=/path/to/rebel_compiler
export RBLN_GCC_VERSION=12
export TORCH_WHEEL_PATH=/path/to/torch-2.8.0-cp310-cp310-linux_x86_64.whl
./tools/build-with-external-rebel.sh --clean
Options:
- --clean: Clean build artifacts before building
- --clean-only: Only clean build artifacts, do not build
Environment Variables:
- REBEL_HOME: Path to rebel-compiler (REQUIRED)
- RBLN_GCC_VERSION: GCC version to use (12 or 13, default: 13)
- TORCH_WHEEL_PATH: Path to pre-built torch wheel (REQUIRED for gcc-12, ignored for gcc-13)
- RBLN_BUILD_TYPE: Build type (Release or Debug, default: Release)
- RBLN_VENV_PATH: Virtual environment path (default: .venv-rebel)
The script will:
- Check Python version compatibility with rebel-compiler
- Create virtual environment
- Install dependencies
- Configure pyproject.toml for external rebel-compiler
- Build and install torch-rbln
- Verify installation with import tests
After build:
source .venv-rebel/bin/activate
# activate_rebel is auto-sourced, setting REBEL_HOME, PYTHONPATH, LD_LIBRARY_PATH
python -c "import torch; import rebel; import torch_rbln; print('OK')"
Method 2: Manual Setup
1) Create and activate a virtualenv
python3 -m venv .venv
source .venv/bin/activate
2) Add your local rebel-compiler in editable mode
poetry add --editable "${REBEL_HOME}/python"
3) Install this project, using the external compiler
RBLN_USE_EXTERNAL_REBEL_COMPILER=1 poetry install --only-root
Create Git Commit
A Git pre-commit hook is active, so linting is triggered whenever you create a
commit. To prepare for linting, you MUST initialize lintrunner first.
source ./.venv/bin/activate
lintrunner init
Once lintrunner is initialized, there is no need to initialize it again. You can commit now.
git commit
Some failures can be fixed automatically. Run the command below for auto-fixing.
lintrunner -m main -a
Run Tests
The following assumes you are in the Python virtual environment and have
installed the torch-rbln package successfully.
C++ Tests
Packaging runs in a new, isolated environment; even though poetry install
--only-root builds the C++ project, CTest cannot find that build directory.
So for CTest you MUST build the C++ project manually.
./tools/build-libtorch-rbln.sh
ctest --test-dir ./build
Python Tests
pytest ./test