
Using GRPO with RLVR, fine-tune LLMs to enhance coding capabilities


LFM-Coder: High-Performance RLVR for Small Language Models

Python 3.13+ License: MIT uv Ruff

Fine-tune LLMs to enhance coding capabilities using Reinforcement Learning from Verifiable Rewards (RLVR) with Group Relative Policy Optimization (GRPO). Includes a blazing-fast Python sandbox for safely running model-generated code.

Results

A model trained from this repository using only 1,000 examples from the OpenCoder dataset achieved a 49.1% improvement in coding performance on the MBPP benchmark while maintaining general capabilities:

[Figure: benchmark results showing the change in model performance after fine-tuning]

Try out the trained model, explore the metrics during training, or analyze the training artifacts.

Why LFM-Coder?

Small language models (SLMs) are the key to fast, local coding agents, but they often struggle with complex programming tasks. Liquid AI's LFM2.5-1.2B-Instruct is exceptionally fast and efficient, but not optimized for coding out of the box.

LFM-Coder bridges this gap using RLVR. By training lightweight LoRA adapters (~22M parameters) with Hugging Face TRL, we provide the model with a high-fidelity execution environment to learn from real-time, verifiable feedback. This approach significantly enhances coding performance while maintaining the model's tiny footprint and general capabilities.
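To make the "group relative" part of GRPO concrete, here is a minimal sketch of how per-completion rewards can be turned into advantages within one group of samples for the same prompt. This illustrates the general technique and is not code from this repository; the function name, the `eps` value, and the use of the population standard deviation are assumptions (implementations differ on these details).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against the other completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt; each reward is the fraction of tests passed.
rewards = [1.0, 0.5, 0.0, 0.5]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Completions that beat the group average get a positive advantage and the rest get zero or negative values, so no separate value model is needed to judge a sample.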

Key Innovations and Optimizations

This repository goes beyond basic fine-tuning by implementing a production-grade RLVR environment and training pipeline:

🚀 High-Performance Sandbox

  • Dual-Engine Architecture: Routes each snippet to either a fast Rust-based Python interpreter (Monty) or a full-featured Docker/Podman container.
  • Massive Concurrency: Threaded execution across all CPU cores for both engines, enabling high-throughput reward computation essential for GRPO.
  • Smart Dependency Management: Packages are installed dynamically based on code requirements. They are cached locally, so subsequent runs skip installation and can run without network access.
  • Enterprise-Grade Isolation: Configurable resource guards (CPU/memory), execution timeouts, and network isolation to ensure secure execution of model-generated code.
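As a rough illustration of the timeout and concurrency ideas in the list above, the sketch below runs snippets in child Python processes with a wall-clock limit and fans a batch out over a thread pool. It uses only the standard library and is not the sandbox shipped in this package; `ExecResult`, `run_snippet`, and `run_batch` are hypothetical names. A plain subprocess gives no memory, filesystem, or network isolation, so treat this as a sketch of the control flow only.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ExecResult:
    stdout: str
    ok: bool
    timed_out: bool


def run_snippet(code: str, timeout_s: float = 5.0) -> ExecResult:
    """Run one snippet in a separate Python process with a wall-clock timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode (no env vars, no user site)
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        return ExecResult(stdout="", ok=False, timed_out=True)
    return ExecResult(stdout=proc.stdout, ok=proc.returncode == 0, timed_out=False)


def run_batch(snippets: list[str], timeout_s: float = 5.0) -> list[ExecResult]:
    """Run snippets concurrently; threads are enough because the work runs in child processes."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda s: run_snippet(s, timeout_s), snippets))


if __name__ == "__main__":
    for result in run_batch(["print(1 + 1)", "while True: pass"], timeout_s=1.0):
        print(result)
```

Results come back in input order, which keeps it simple to pair each reward with the completion that produced it.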

⚡ Training and Evaluation Efficiency

  • Asynchronous Pipelining: Overlaps GPU completion generation with CPU-based code verification to maximize hardware utilization and minimize idle time.
  • Optimized RLVR Pipeline: Leverages QLoRA (4-bit) and Liger kernels to enable advanced GRPO training on consumer hardware (8GB VRAM).
  • Fault-Tolerant Workflows: Robust state management with automatic resumption for both training and evaluation cycles.
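The asynchronous pipelining idea can be sketched with a single worker thread: while one batch is being verified on the CPU, the next batch is already being generated. This is one possible shape under stated assumptions, not the repository's pipeline; `generate_batch` and `verify_batch` are hypothetical stand-ins that sleep instead of calling a model or a sandbox.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def generate_batch(step: int) -> list[str]:
    """Stand-in for GPU generation (hypothetical); returns fake completions."""
    time.sleep(0.05)
    return [f"completion-{step}-{i}" for i in range(4)]


def verify_batch(completions: list[str]) -> list[float]:
    """Stand-in for CPU-side code verification (hypothetical); returns fake rewards."""
    time.sleep(0.05)
    return [float(len(c) % 2) for c in completions]


def pipelined(num_steps: int) -> list[list[float]]:
    """Verify batch N in a worker thread while batch N+1 is being generated."""
    rewards: list[list[float]] = []
    pending: Future[list[float]] | None = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for step in range(num_steps):
            completions = generate_batch(step)  # overlaps with the pending verification
            if pending is not None:
                rewards.append(pending.result())  # collect the previous batch's rewards
            pending = pool.submit(verify_batch, completions)
        if pending is not None:
            rewards.append(pending.result())  # drain the final batch
    return rewards


if __name__ == "__main__":
    print(pipelined(3))
```

With equal stage times, a loop like this approaches half the runtime of a strictly sequential generate-then-verify loop, because only the first generation and the last verification run without overlap.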

📊 Data Quality and Integrity

  • Benchmark Sanitization: Identifies and repairs incorrect test cases in standard benchmarks (HumanEvalPlus/MBPPPlus) to ensure rigorous evaluation.
  • Automated Validation: Verifies all training examples against provided solutions to guarantee data quality before RLVR begins.
  • Granular Metrics: Heuristic-driven extraction that calculates per-test-case pass rates and provides detailed logs for model weakness analysis.
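A per-test-case pass rate, as mentioned in the list above, can be sketched as follows. The same check can validate a training example by running its provided solution against its own tests. The function name and the assert-style test format are assumptions, and `exec` is used in-process purely for illustration; untrusted code belongs in a sandbox.

```python
def pass_rate(solution: str, tests: list[str]) -> float:
    """Return the fraction of assert-style test cases that the solution passes."""
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        namespace: dict[str, object] = {}  # fresh namespace so tests cannot affect each other
        try:
            exec(solution, namespace)  # define the candidate functions
            exec(test, namespace)      # run a single test case
            passed += 1
        except Exception:
            pass  # any error, including AssertionError, counts as a failure
    return passed / len(tests)


solution = "def add(a, b):\n    return a + b\n"
tests = [
    "assert add(1, 2) == 3",
    "assert add(-1, 1) == 0",
    "assert add(2, 2) == 5",  # deliberately wrong test case
]
print(pass_rate(solution, tests))
```

A fractional score gives the policy partial credit for nearly correct code, and a reference solution that scores below 1.0 flags either a bad solution or a bad test case.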

Getting Started: Training

1. Requirements

  • Hardware: Single GPU with 8GB VRAM (e.g., RTX 4060).
  • Tooling: uv installed.

2. Setup

git clone https://github.com/rparkr/lfm-coder.git && cd lfm-coder
export HF_TOKEN="your-hf-token"

3. Configuration

Update training_config.toml with your model_id and output_dir.

4. Run Training

# Dry run to verify configuration
uv run lfm-coder --dry-run

# Start full training
uv run lfm-coder

Using the Python Sandbox

You can use the high-performance sandbox in your own projects for safe execution of LLM-generated code.

Installation

uv add lfm-coder  # or pip install lfm-coder

Basic Usage

The Sandbox class automatically routes code between Monty (fast) and Docker (full support).

from lfm_coder.sandbox import Sandbox

sandbox = Sandbox()

# Batch execution (parallel)
results = sandbox.run(["1+1", "import math; math.sqrt(16)", "print('Hello')"])
for r in results:
    print(f"Stdout: {r.stdout} | Result: {r.result}")

Advanced: Automatic Fallback

# Reuses the `sandbox` instance created under Basic Usage.
code = """
import httpx  # Requires Docker fallback
r = httpx.get('https://example.com')
print(r.status_code)
"""
result = sandbox.run(code)

Note: The Docker sandbox requires either Podman (recommended) or Docker to be installed and running.

Project Roadmap and Stats

🗺️ Status

  • Dual Sandboxes: MontySandbox + DockerSandbox with auto-routing.
  • Data Pipeline: Automated sampling, verification, and repair of benchmarks.
  • RLVR Training: GRPO integration with TRL and GPU optimizations.
  • Evaluation: Scoring module with GPU/CPU pipelining.
  • Ollama support: Fix chat template in fine-tuned GGUF model for multi-turn chat.

📊 Training Performance Metrics

| Metric | Monty Sandbox (Rust) | Docker Sandbox (Container) |
|---|---|---|
| Execution count | 18,556 (77.3%) | 5,444 (22.7%) |
| Avg. execution time | 1.01 ms | 2,577 ms |
| Median execution time | 0.4 ms | 2,240 ms |
| Success rate | 69.8% | 35.8% |
| Throughput | ~1,000 exec/sec | ~0.4 exec/sec |

On these figures, Monty is roughly 2,500x faster than the Docker fallback on average (1.01 ms vs. 2,577 ms) and about 5,600x faster at the median (0.4 ms vs. 2,240 ms), which provides the throughput that efficient RLVR training requires.
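The speedup figures follow directly from the metrics above. The short calculation below reproduces them and also estimates how the summed execution time splits between the two engines. The time split assumes that count times mean latency is a fair estimate of total time per engine; it ignores concurrency, so it describes summed execution time, not wall-clock time.

```python
# Figures copied from the training performance metrics above.
monty = {"count": 18_556, "avg_ms": 1.01, "median_ms": 0.4}
docker = {"count": 5_444, "avg_ms": 2_577.0, "median_ms": 2_240.0}

avg_speedup = docker["avg_ms"] / monty["avg_ms"]
median_speedup = docker["median_ms"] / monty["median_ms"]

# Estimated summed execution time per engine: count x mean latency.
monty_total_s = monty["count"] * monty["avg_ms"] / 1000
docker_total_s = docker["count"] * docker["avg_ms"] / 1000
docker_share = docker_total_s / (monty_total_s + docker_total_s)

print(f"mean speedup:   {avg_speedup:,.0f}x")
print(f"median speedup: {median_speedup:,.0f}x")
print(f"Monty total:    {monty_total_s:,.1f} s")
print(f"Docker total:   {docker_total_s:,.1f} s")
print(f"Docker share of summed execution time: {docker_share:.1%}")
```

Although Docker handles only 22.7% of executions, it accounts for nearly all of the summed execution time, which is why keeping as much code as possible on the fast path matters.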

Acknowledgments

License

Code: MIT license.
Model Weights: LFM license (commercial use is restricted for organizations with more than $10M in revenue).
