
RK-Transformers: Accelerate Hugging Face Transformers on Rockchip NPUs


RK-Transformers is a runtime library that seamlessly integrates Hugging Face transformers and sentence-transformers with Rockchip's RKNN Neural Processing Units (NPUs). It enables efficient, straightforward deployment of transformer models on edge devices powered by Rockchip SoCs (RK3588, RK3576, etc.).

✨ Key Features

🔄 Model Export & Conversion

  • Automatic ONNX Export: Converts Hugging Face models to ONNX, detecting model inputs automatically
  • RKNN Optimization: Exports to RKNN format with configurable optimization levels (0-3)
  • Quantization: INT8 (w8a8) quantization with calibration dataset support
  • Push to Hub: Direct integration with Hugging Face Hub for model versioning

⚡ High-Performance Inference

  • NPU Acceleration: Leverages Rockchip's NPU hardware for up to a 10-20x speedup
  • Multi-Core Support: Automatic core selection and load balancing across NPU cores
  • Memory Efficient: Optimized for edge devices with limited RAM

🧩 Framework Integration

  • Sentence Transformers: Drop-in replacement with RKSentenceTransformer and RKCrossEncoder
  • Transformers API: Compatible with standard Hugging Face pipelines

📦 Installation

Prerequisites

  • Python 3.10 - 3.12
  • Linux-based OS (Ubuntu 24.04+ recommended)
  • For export: PC with x86_64/arm64 architecture
  • For inference: Rockchip device with RKNPU2 support (RK3588, RK3576, etc.)

Quick Install

uv is recommended for faster installation and a smaller environment footprint.

For Inference (on Rockchip devices [arm64])

uv venv
uv pip install rk-transformers[inference]

This installs runtime dependencies including:

  • rknn-toolkit-lite2 (2.3.2)
  • sentence-transformers (5.x)
  • numpy, torch, transformers

For Model Export (on development machines [x86_64, arm64])

uv venv
uv pip install rk-transformers[dev,export]
uv pip install torch==2.6.0+cpu --index-url https://download.pytorch.org/whl/cpu # workaround for rknn-toolkit2 dependency

This installs export dependencies including:

  • rknn-toolkit2 (2.3.2)
  • sentence-transformers (5.x)
  • numpy, torch, transformers, optimum[onnx], datasets

For Development (on development machines [x86_64, arm64])

# Clone the repository
git clone https://github.com/emapco/rk-transformers.git
cd rk-transformers

# Install with development tools
uv venv
uv pip install -e .[dev,export]
uv pip install torch==2.6.0+cpu --index-url https://download.pytorch.org/whl/cpu # workaround for rknn-toolkit2 dependency

🎯 Quick Start

1. Export a Model to RKNN

# Display help message with available options
rk-transformers-cli export -h 

# Export a Sentence Transformer model from Hugging Face Hub (float16)
rk-transformers-cli export \
  --model sentence-transformers/all-MiniLM-L6-v2 \
  --platform rk3588 \
  --flash-attention \
  --optimization-level 3

# Export with custom dataset for quantization (int8)
rk-transformers-cli export \
  --model sentence-transformers/all-MiniLM-L6-v2 \
  --platform rk3588 \
  --flash-attention \
  --quantize \
  --dtype w8a8 \
  --dataset sentence-transformers/natural-questions \
  --dataset-split train \
  --dataset-columns answer \
  --dataset-size 128 \
  --max-seq-length 128 # Default is 512

# Export a local ONNX model
rk-transformers-cli export \
  --model ./my-model/model.onnx \
  --platform rk3588 \
  --flash-attention \
  --batch-size 4 # Default is 1

2. Run Inference with Sentence Transformers

SentenceTransformer

from rktransformers import RKSentenceTransformer

model = RKSentenceTransformer(
    "rk-transformers/all-MiniLM-L6-v2",
    model_kwargs={
        "platform": "rk3588",
        "core_mask": "all",
    },
)

sentences = ["This is a test sentence", "Another example"]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)

# Load specific quantized model file
model = RKSentenceTransformer(
    "rk-transformers/all-MiniLM-L6-v2",
    model_kwargs={
        "platform": "rk3588",
        "file_name": "rknn/model_w8a8.rknn",
    },
)
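Because encode() returns plain NumPy arrays, similarity search needs no extra machinery. A minimal sketch (cosine_similarity is an illustrative helper, not part of the library; the stand-in vectors replace real encode() outputs):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between two batches of row vectors."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Stand-in vectors; in practice these would be model.encode(...) outputs.
query = np.array([[1.0, 0.0, 1.0]])
docs = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
scores = cosine_similarity(query, docs)
print(scores)  # first document scores 1.0, second 0.0
```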

CrossEncoder

from rktransformers import RKCrossEncoder

model = RKCrossEncoder(
    "rk-transformers/ms-marco-MiniLM-L12-v2",
    model_kwargs={"platform": "rk3588", "core_mask": "auto"},
)

pairs = [
    ["How old are you?", "What is your age?"],
    ["Hello world", "Hi there!"],
    ["What is RKNN?", "This is a test."],
]
scores = model.predict(pairs)
print(scores)

query = "Hi there!"
documents = [
    "What is going on?",
    "I am 25 years old.",
    "This is a test.",
    "RKNN is a neural network toolkit.",
]
results = model.rank(query, documents)
print(results)

# Load specific quantized model file
model = RKCrossEncoder(
    "rk-transformers/ms-marco-MiniLM-L12-v2",
    model_kwargs={
        "platform": "rk3588",
        "file_name": "rknn/model_w8a8.rknn",
    },
)

3. Use RK-Transformers API

View the docs for all supported models and their example usage.

from transformers import AutoTokenizer

from rktransformers import RKModelForFeatureExtraction

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("rk-transformers/all-MiniLM-L6-v2")
model = RKModelForFeatureExtraction.from_pretrained("rk-transformers/all-MiniLM-L6-v2", platform="rk3588", core_mask="auto")

# Tokenize and run inference
inputs = tokenizer(
    ["Sample text for embedding"],
    padding="max_length",
    truncation=True,
    return_tensors="np",
)

outputs = model(**inputs)
embeddings = outputs.last_hidden_state.mean(axis=1)  # Mean pooling
print(embeddings.shape)  # (1, 384)

# Load specific quantized model file
model = RKModelForFeatureExtraction.from_pretrained(
    "rk-transformers/all-MiniLM-L6-v2", platform="rk3588", file_name="rknn/model_w8a8.rknn"
)
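Note that the plain mean over axis 1 above also averages the padding positions introduced by padding="max_length". A mask-aware mean, sketched here in plain NumPy (masked_mean_pool is an illustrative helper, not a library function), excludes them by weighting with the tokenizer's attention_mask:

```python
import numpy as np

def masked_mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mean-pool token embeddings, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden)
    attention_mask:    (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid divide-by-zero
    return summed / counts

# Toy check: the third position is padding and must not affect the mean.
hidden = np.array([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(masked_mean_pool(hidden, mask))  # [[2. 2.]]
```

With real inputs, pass outputs.last_hidden_state and inputs["attention_mask"] from the example above.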

4. Use Transformers Pipelines

from transformers import pipeline

from rktransformers import RKModelForMaskedLM

# Load the RKNN model
model = RKModelForMaskedLM.from_pretrained(
    "rk-transformers/bert-base-uncased", platform="rk3588", file_name="rknn/model_w8a8.rknn"
)

# Create a fill-mask pipeline with the RKNN-accelerated model
fill_mask = pipeline(
    "fill-mask",
    model=model,
    tokenizer="rk-transformers/bert-base-uncased",
    framework="pt",  # required for RKNN
)

# Run inference
results = fill_mask("Paris is the [MASK] of France.")
print(results)

⚙️ NPU Core Configuration

Rockchip SoCs with multiple NPU cores (like RK3588 with 3 cores or RK3576 with 2 cores) support flexible core allocation strategies through the core_mask parameter. Choosing the right core mask can optimize performance based on your workload and system conditions. For more details, refer to the RK-Transformers docs.

Available Core Mask Options

Note: core_mask is specified at inference time.

  • "auto": Automatic mode that dynamically selects idle cores. Recommended for most scenarios; the RKNN runtime provides load balancing.
  • "0": NPU core 0 only (fixed core assignment).
  • "1": NPU core 1 only (fixed core assignment).
  • "2": NPU core 2 only (fixed core assignment; RK3588 only).
  • "0_1": NPU cores 0 and 1 simultaneously; parallel execution across two cores for larger models.
  • "0_1_2": NPU cores 0, 1, and 2 simultaneously; maximum parallelism (RK3588 only) for demanding models.
  • "all": All available NPU cores; equivalent to "0_1_2" on RK3588 and "0_1" on RK3576.

Usage Examples

RK-Transformers API

from rktransformers import RKModelForFeatureExtraction

# Auto-select idle cores (recommended for production)
model = RKModelForFeatureExtraction.from_pretrained("rk-transformers/all-MiniLM-L6-v2", platform="rk3588", core_mask="auto")

# Use specific core for dedicated workloads
model = RKModelForFeatureExtraction.from_pretrained(
    "rk-transformers/all-MiniLM-L6-v2",
    platform="rk3588",
    core_mask="1",  # Reserve core 0 for other tasks
)

# Use all cores for maximum performance
model = RKModelForFeatureExtraction.from_pretrained("rk-transformers/all-MiniLM-L6-v2", platform="rk3588", core_mask="all")

Sentence Transformers Integration

from rktransformers import RKSentenceTransformer, RKCrossEncoder

model = RKSentenceTransformer(
    "rk-transformers/all-MiniLM-L6-v2",
    model_kwargs={
        "platform": "rk3588",
        "core_mask": "auto",
    },
)

model = RKCrossEncoder(
    "rk-transformers/ms-marco-MiniLM-L12-v2",
    model_kwargs={
        "platform": "rk3588",
        "core_mask": "auto",
    },
)

Architecture

Runtime Loading Workflow

  1. Model Discovery: RKModel.from_pretrained() searches for .rknn files
  2. Config Matching: Reads the rknn config in config.json to match platform and constraints
  3. Platform Validation: Checks compatibility with RKNNLite.list_support_target_platform()
  4. Runtime Init: Loads model to NPU with specified core mask
  5. Inference: Runs forward pass with automatic input/output handling

Cross-Component Communication

graph TB
    subgraph "Export Pipeline"
        HF[Hugging Face Model]
        OPT[Optimum ONNX Export]
        ONNX[ONNX Model]
        RKNN_TK[RKNN Toolkit]
        RKNN_FILE[.rknn File]
        
        HF -->|main_export| OPT
        OPT -->|ONNX graph| ONNX
        ONNX -->|load_onnx| RKNN_TK
        RKNN_TK -->|build/export| RKNN_FILE
    end
    
    subgraph "Inference Pipeline"
        RKNN_FILE -->|load| RKNN_LITE[RKNNLite Runtime]
        RKNN_LITE -->|init_runtime| NPU[RKNPU2 Hardware]
        NPU -->|inference| RESULTS[Model Outputs]
    end
    
    subgraph "Framework Integration"
        ST[Sentence Transformers]
        RKST[RKSentenceTransformer]
        RKCE[RKCrossEncoder]
        RKRT[RKModel Classes]
        HFT[Hugging Face Transformers]
        
        ST -->|subclasses| RKST
        ST -->|subclasses| RKCE
        RKST -->|load_rknn_model| RKRT
        RKCE -->|load_rknn_model| RKRT
        RKRT -->|inference| RKNN_LITE
        HFT -->|pipeline| RKRT
    end
    
    style NPU fill:#ff9900
    style RKNN_TK fill:#66ccff
    style RKNN_LITE fill:#66ccff

Configuration Files

config.json

The RKNN configuration is stored within the model's config.json file under the "rknn" key:

{
  "architectures": ["BertModel"],
  ...
  "rknn": {
    "model.rknn": {
      "platform": "rk3588",
      "batch_size": 1,
      "max_seq_length": 128,
      "model_input_names": ["input_ids", "attention_mask"],
      "quantized_dtype": "w8a8",
      "optimization_level": 3,
      ...
    },
    "rknn/optimized.rknn": {
      ...
    }
  }
}

The keys in the "rknn" object are relative paths to .rknn files, allowing multiple optimized variants per model.
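The variant selection this layout enables can be sketched as follows; find_rknn_variant is a hypothetical helper (not the library's actual loader) and assumes a locally downloaded model directory:

```python
import json
from pathlib import Path

def find_rknn_variant(model_dir: str, platform: str) -> Path:
    """Return the first .rknn file whose config entry matches the target platform.

    The "rknn" key in config.json maps relative .rknn paths to their
    export settings, so one repository can ship several variants.
    """
    root = Path(model_dir)
    config = json.loads((root / "config.json").read_text())
    for rel_path, meta in config.get("rknn", {}).items():
        candidate = root / rel_path
        if meta.get("platform") == platform and candidate.exists():
            return candidate
    raise FileNotFoundError(f"no .rknn model for platform {platform!r} in {model_dir}")
```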

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

📄 License

This project is licensed under the Apache License 2.0.

🙏 Acknowledgments

  • Hugging Face for the transformers, sentence-transformers and optimum libraries
  • Rockchip for RKNN toolkit and NPU hardware
