Advanced Kernels for Reinforcement Learning

Dark RL: Experiments in Interactive Learning

Dark RL provides a high-level interface for interactive, online learning with large language models. The OnlineLLM interface performs training and inference efficiently in a single model, so that LLMs can learn in real time from user feedback.

[!WARNING] Dark RL is in alpha

Key Features

🧠 Interactive and Online Learning: Continuously fine-tune your models with new data using LoRA, allowing them to acquire new skills without full retraining.
🔌 Adapter-Based Skills: Manage different LoRA adapters as distinct "skills" that can be loaded and used for specific tasks.
🚀 Unified Architecture: A single model instance handles both training and inference concurrently, using CUDA streams to manage GPU workloads efficiently.
🚀 Advanced CUDA Kernels: Specialized CUDA kernels for online learning.
💡 Simple API: A clean and intuitive API that makes it easy to integrate online learning into your applications.

https://github.com/user-attachments/assets/c058728d-db87-4144-8771-7d7e69b3a81d

Interactive Learning

Interactive Learning is a human-in-the-loop training process where an AI model learns incrementally from real-time feedback. Instead of training on a static dataset, the model's understanding is refined through a continuous cycle of action, feedback, and correction.

In Dark RL, this is achieved by:

1. Observing the model's output for a given prompt.
2. Providing corrective examples via the .learn() method.
3. Updating a LoRA adapter with this new knowledge.

This approach allows you to "teach" the model new skills, correct its mistakes, and adapt its behavior to specific tasks, much like teaching a human. Because LoRA adapters are small and efficient, this learning process can happen in real-time, making it possible to shape the model's capabilities interactively.
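
The observe, correct, update cycle above can be sketched as a small loop. This is a minimal illustration, not Dark RL's implementation: `EchoModel` is a hypothetical in-memory stand-in that borrows the `generate` and `learn` method names from the Quick Start so the loop runs without a GPU, and `teach_until_correct` is an illustrative helper that is not part of the library.

```python
from typing import Optional


class EchoModel:
    """Hypothetical stand-in for an online-learning model (not Dark RL code).

    It memorizes corrections per adapter so the loop below can run anywhere.
    """

    def __init__(self) -> None:
        self._memory: dict = {}

    def generate(self, prompt: str, adapter: Optional[str] = None) -> str:
        # Return a memorized answer for this adapter, else a placeholder.
        return self._memory.get(adapter or "", {}).get(prompt, "I don't know")

    def learn(self, examples: list, adapter: str) -> None:
        # "Training" here is just storing prompt -> response pairs.
        skill = self._memory.setdefault(adapter, {})
        for ex in examples:
            skill[ex["prompt"]] = ex["response"]


def teach_until_correct(model, prompt: str, expected: str, adapter: str,
                        max_rounds: int = 3) -> int:
    """Observe, correct, repeat; return the number of corrections given."""
    for corrections in range(max_rounds):
        # 1. Observe the model's output for the prompt.
        if model.generate(prompt, adapter=adapter) == expected:
            return corrections
        # 2-3. Provide a corrective example, which updates the adapter.
        model.learn([{"prompt": prompt, "response": expected}], adapter=adapter)
    return max_rounds


model = EchoModel()
rounds = teach_until_correct(model, "Say 'hello' in Zoggian.", "zog",
                             adapter="zoggian-language")
print(rounds, model.generate("Say 'hello' in Zoggian.", adapter="zoggian-language"))
```

With a real model the comparison would rarely be an exact string match; a human judgment or a looser scoring function would take its place.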

Quick Start

Here's a minimal example of how to use OnlineLLM to generate text and teach the model a new skill.

```python
from dark.online_llm import OnlineLLM

# 1. Initialize the OnlineLLM with a supported model
#    You may need to log in to Hugging Face first: `huggingface-cli login`
llm = OnlineLLM("Qwen/Qwen2.5-VL-7B-Instruct")

# 2. Generate text with the base model
prompt = "What is the capital of France?"
print(f"User: {prompt}")
response = llm.generate(prompt)
print(f"Assistant: {response}")
# Expected output: Paris

# 3. Teach the model a new, fictional skill (e.g., a new language)
#    Let's teach it that "zog" means "hello" in "Zoggian".
learning_examples = [
    {"prompt": "A greeting in Zoggian", "response": "zog"},
    {"prompt": "How to say 'hello' in Zoggian?", "response": "zog"},
]

# The `learn` method fine-tunes a LoRA adapter on your examples.
# We'll name this skill "zoggian-language".
print("\nLearning the Zoggian language...")
llm.learn(learning_examples, adapter="zoggian-language")

# 4. Use the newly acquired skill
#    Now, when we use the "zoggian-language" adapter, the model knows the new word.
prompt_with_skill = "Say 'hello' in Zoggian."
print(f"User: {prompt_with_skill}")
response_with_skill = llm.generate(prompt_with_skill, adapter="zoggian-language")
print(f"Assistant: {response_with_skill}")
# Expected output: zog
```
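
The examples passed to `learn` above are plain dictionaries with `prompt` and `response` keys. When corrections come from live user feedback, it can help to check that shape before training. The helper below is a hypothetical convenience, not part of Dark RL; it assumes only the two-key format shown in the Quick Start.

```python
def build_examples(pairs):
    """Turn (prompt, response) pairs into the list-of-dicts format used above.

    Hypothetical helper: raises ValueError on empty or non-string fields so
    malformed feedback is caught before it reaches a training call.
    """
    examples = []
    for index, (prompt, response) in enumerate(pairs):
        for name, value in (("prompt", prompt), ("response", response)):
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"example {index}: {name} must be a non-empty string")
        examples.append({"prompt": prompt.strip(), "response": response.strip()})
    return examples


learning_examples = build_examples([
    ("A greeting in Zoggian", "zog"),
    ("How to say 'hello' in Zoggian?", " zog "),
])
print(learning_examples)
```

The resulting list has the same shape as the hand-written `learning_examples` in the Quick Start and could be passed to `learn` in the same way.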

Installation

```
pip install dark-rl
```

[!NOTE] A minimum of 48 GB of VRAM is required.
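
To check a machine against this requirement before installing, one option is to read the total memory that `nvidia-smi` reports. The sketch below is an assumption about how such a check might look, not something Dark RL ships: `parse_vram_mib`, `meets_requirement`, and `query_vram_mib` are hypothetical helpers, the check assumes the requirement applies to a single GPU, and the 5% tolerance is an arbitrary allowance for cards whose reported total may fall slightly under their nominal capacity.

```python
import shutil
import subprocess

REQUIRED_GB = 48  # taken from the note above


def parse_vram_mib(smi_output: str) -> list:
    """Parse one integer MiB value per GPU, skipping non-numeric lines."""
    return [int(line.strip()) for line in smi_output.splitlines()
            if line.strip().isdigit()]


def meets_requirement(vram_mib: list, required_gb: int = REQUIRED_GB,
                      tolerance: float = 0.05) -> bool:
    """True if any single GPU reports roughly `required_gb` or more.

    Assumption: the requirement is per GPU, not summed across GPUs.
    """
    threshold_mib = required_gb * 1024 * (1 - tolerance)
    return any(mib >= threshold_mib for mib in vram_mib)


def query_vram_mib() -> list:
    """Ask nvidia-smi for total memory per GPU; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


gpus = query_vram_mib()
print("GPUs (MiB):", gpus, "| enough VRAM:", meets_requirement(gpus))
```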

Unified Training and Inference

Dark RL uses a single model instance to handle both training and inference tasks simultaneously. This is made possible through the use of CUDA streams, which allow for the concurrent execution of different GPU operations.

Inference Stream: Generation tasks (i.e., generate, stream) are run on a dedicated inference stream. This ensures that they are executed with high priority and low latency.
Training Stream: LoRA fine-tuning tasks (learn) are run on a separate stream.

This architecture allows the server to remain responsive to inference requests even while the model is being fine-tuned in the background. An asyncio lock is used to ensure that the model's LoRA weights are swapped safely between tasks, preventing race conditions.
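
The locking idea can be illustrated without a GPU. The sketch below is not Dark RL's implementation: `AdapterHost` is a hypothetical class, `asyncio.sleep` stands in for GPU work, and for simplicity the lock is held for the whole task, whereas the design described above overlaps compute on separate CUDA streams. It shows only that an `asyncio.Lock` keeps the active adapter from changing while a task depends on it.

```python
import asyncio


class AdapterHost:
    """Hypothetical model wrapper: one set of 'active' LoRA weights at a time."""

    def __init__(self) -> None:
        self.active_adapter = None
        self._swap_lock = asyncio.Lock()
        self.log = []  # (kind, requested adapter, adapter active at finish)

    async def _run(self, kind: str, adapter: str, seconds: float) -> None:
        # Hold the lock from the swap until the dependent work is done,
        # so no other task can swap adapters underneath this one.
        async with self._swap_lock:
            self.active_adapter = adapter
            await asyncio.sleep(seconds)  # stand-in for GPU work
            self.log.append((kind, adapter, self.active_adapter))

    async def generate(self, adapter: str) -> None:
        await self._run("generate", adapter, 0.01)

    async def learn(self, adapter: str) -> None:
        await self._run("learn", adapter, 0.03)


async def main() -> AdapterHost:
    host = AdapterHost()
    # Interleave a training request with several inference requests.
    await asyncio.gather(
        host.learn("skill-a"),
        host.generate("skill-b"),
        host.generate("skill-a"),
        host.generate("skill-b"),
    )
    return host


host = asyncio.run(main())
for kind, requested, seen in host.log:
    print(f"{kind:8s} requested={requested} active_at_finish={seen}")
```

Without the lock, a second task could overwrite `active_adapter` while the first is still sleeping, and the requested and observed adapters in the log would no longer match.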

Deploying on RunPod with the Interactive UI

You can easily deploy a Dark RL server on a cloud GPU instance like RunPod. Here’s a basic guide for a machine with a 48GB VRAM card (e.g., an RTX A6000).

1. Choose a RunPod Template:
   - Start a new Pod and select the "RunPod Pytorch 2.6" template. This provides a clean environment with Python, PyTorch, and CUDA pre-installed.
   - Choose a GPU with at least 48GB of VRAM.
2. Connect to the Pod and Start the Server:
   - Once the Pod is running, connect to it via SSH.
   - First, install uv if it's not already available:
     ```
     pip install uv
     ```
   - Clone the repository and start the server. uv will handle creating a virtual environment, installing dependencies, and running the websocket_server.py.
     ```
     git clone https://github.com/agentsea/dark.rl.git
     cd dark.rl
     uv run python websocket_server.py
     ```
3. Expose the Port:
   - The websocket server runs on port 8000. In the RunPod dashboard for your Pod, expose this port to make the UI accessible over the internet.

Your Dark RL server is now running and ready for interactive learning.
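
Once the port is exposed, a quick TCP check can confirm that something is listening before you open the UI. This is a generic standard-library sketch, not part of Dark RL: `port_is_open` is a hypothetical helper, the address in the example call is a local placeholder to be replaced with whatever address RunPod assigns to your Pod, and the check says nothing about the WebSocket protocol itself.

```python
import socket


def port_is_open(host: str, port: int = 8000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Placeholder address: replace 127.0.0.1 with your Pod's public host or IP.
print(port_is_open("127.0.0.1", 8000))
```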

Inspiration

Darknet
Nano-VLLM

This version: 0.1.1 (released Jul 17, 2025)
