
🏭 TRL Jobs

TRL Jobs is a simple wrapper around Hugging Face Jobs that makes it easy to run TRL (Transformer Reinforcement Learning) workflows directly on 🤗 Hugging Face infrastructure.

Think of it as the quickest way to kick off Supervised Fine-Tuning (SFT) and more, without worrying about all the boilerplate setup.

📦 Installation

Get started with a single command:

pip install trl-jobs

⚡ Quick Start

Run your first supervised fine-tuning job in just one line:

trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara

The training is tracked with Trackio and the fine-tuned model is automatically pushed to the 🤗 Hub.

[Screenshots: the Trackio SFT training dashboard and the fine-tuned model on the Hub]

🛠 Available Commands

Right now, SFT (Supervised Fine-Tuning) is supported. More workflows will be added soon!

🔹 SFT (Supervised Fine-Tuning)

trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara

Required arguments

  • --model_name → Model to fine-tune (e.g. Qwen/Qwen3-0.6B)
  • --dataset_name → Dataset to train on (e.g. trl-lib/Capybara)

Optional arguments

  • --peft → Use PEFT (LoRA) (default: False)
  • --flavor → Hardware flavor (default: a100-large, only option for now)
  • --timeout → Max runtime (1h by default). Supports s, m, h, d
  • -d, --detach → Run in background and print job ID
  • --namespace → Namespace where the job will run (default: your user namespace)
  • --token → Hugging Face token (only needed if not logged in)
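The --timeout value is a number followed by a unit suffix (s, m, h, or d). A minimal sketch of how such a value maps to seconds, for illustration only (this is not the CLI's actual parser):

```python
# Illustrative only: convert a timeout string like "90m" or "2h" into seconds.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def timeout_to_seconds(value: str) -> int:
    unit = value[-1]
    if unit not in UNIT_SECONDS:
        raise ValueError(f"expected a suffix in {sorted(UNIT_SECONDS)}, got {value!r}")
    return int(value[:-1]) * UNIT_SECONDS[unit]

print(timeout_to_seconds("1h"))   # 3600
print(timeout_to_seconds("90m"))  # 5400
```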

➡️ You can also pass any argument supported by trl sft, e.g.:

trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara --learning_rate 3e-5

For the full list, see the TRL CLI docs.

Dataset format

SFT supports four dataset formats.

  • Standard language modeling

    example = {"text": "The sky is blue."}
    
  • Standard prompt-completion

    example = {"prompt": "The sky is", "completion": " blue."}
    
  • Conversational language modeling

    example = {"messages": [
        {"role": "user", "content": "What color is the sky?"},
        {"role": "assistant", "content": "It is blue."}
    ]}
    
  • Conversational prompt-completion

    example = {"prompt": [{"role": "user", "content": "What color is the sky?"}],
               "completion": [{"role": "assistant", "content": "It is blue."}]}
    

[!IMPORTANT] When using a conversational dataset, ensure that the model has a chat template.

[!NOTE] When using a prompt-completion dataset, the loss is computed only on the completion part.

For more details, see the TRL docs - Dataset formats.
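As a quick sanity check before launching a job, a small helper can identify which of the four formats a single example uses. This is an illustrative sketch, not part of trl-jobs or TRL:

```python
# Illustrative helper (not part of trl-jobs): classify which of the four
# SFT dataset formats a single example uses.
def classify_example(example: dict) -> str:
    # Conversational fields are lists of {"role": ..., "content": ...} messages.
    is_conversational = lambda value: isinstance(value, list)
    if "messages" in example:
        return "conversational language modeling"
    if "prompt" in example and "completion" in example:
        if is_conversational(example["prompt"]) and is_conversational(example["completion"]):
            return "conversational prompt-completion"
        return "standard prompt-completion"
    if "text" in example:
        return "standard language modeling"
    raise ValueError("unrecognized dataset format")

print(classify_example({"text": "The sky is blue."}))
# standard language modeling
print(classify_example({"prompt": [{"role": "user", "content": "What color is the sky?"}],
                        "completion": [{"role": "assistant", "content": "It is blue."}]}))
# conversational prompt-completion
```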

📊 Supported Configurations

Here are some ready-to-go setups you can use out of the box.

🦙 Meta LLaMA 3

| Model | Max context length | Tokens / batch | Example command |
| --- | --- | --- | --- |
| Meta-Llama-3-8B | 4,096 | 262,144 | `trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B --dataset_name ...` |
| Meta-Llama-3-8B-Instruct | 4,096 | 262,144 | `trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B-Instruct --dataset_name ...` |

🦙 Meta LLaMA 3 with PEFT

| Model | Max context length | Tokens / batch | Example command |
| --- | --- | --- | --- |
| Meta-Llama-3-8B | 24,576 | 196,608 | `trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B --peft --dataset_name ...` |
| Meta-Llama-3-8B-Instruct | 24,576 | 196,608 | `trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B-Instruct --peft --dataset_name ...` |

🐧 Qwen3

| Model | Max context length | Tokens / batch | Example command |
| --- | --- | --- | --- |
| Qwen3-0.6B-Base | 32,768 | 65,536 | `trl-jobs sft --model_name Qwen/Qwen3-0.6B-Base --dataset_name ...` |
| Qwen3-0.6B | 32,768 | 65,536 | `trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name ...` |
| Qwen3-1.7B-Base | 24,576 | 98,304 | `trl-jobs sft --model_name Qwen/Qwen3-1.7B-Base --dataset_name ...` |
| Qwen3-1.7B | 24,576 | 98,304 | `trl-jobs sft --model_name Qwen/Qwen3-1.7B --dataset_name ...` |
| Qwen3-4B-Base | 20,480 | 163,840 | `trl-jobs sft --model_name Qwen/Qwen3-4B-Base --dataset_name ...` |
| Qwen3-4B | 20,480 | 163,840 | `trl-jobs sft --model_name Qwen/Qwen3-4B --dataset_name ...` |
| Qwen3-8B-Base | 4,096 | 262,144 | `trl-jobs sft --model_name Qwen/Qwen3-8B-Base --dataset_name ...` |
| Qwen3-8B | 4,096 | 262,144 | `trl-jobs sft --model_name Qwen/Qwen3-8B --dataset_name ...` |
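Each "Tokens / batch" figure above is an exact multiple of the corresponding max context length, which suggests it equals the context length times the number of full-length sequences per batch (an assumption; the table does not state this). A quick arithmetic check:

```python
# Check the assumed relation: tokens per batch = context length x sequences per batch.
# Figures taken from the Qwen3 table above; the "sequences per batch" reading is an assumption.
configs = {
    "Qwen3-0.6B": (32_768, 65_536),
    "Qwen3-1.7B": (24_576, 98_304),
    "Qwen3-4B": (20_480, 163_840),
    "Qwen3-8B": (4_096, 262_144),
}
for name, (context, tokens_per_batch) in configs.items():
    assert tokens_per_batch % context == 0  # always a whole multiple
    print(f"{name}: {tokens_per_batch // context} sequences per batch")
```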

🐧 Qwen3 with PEFT

| Model | Max context length | Tokens / batch | Example command |
| --- | --- | --- | --- |
| Qwen3-8B-Base | 24,576 | 196,608 | `trl-jobs sft --model_name Qwen/Qwen3-8B-Base --peft --dataset_name ...` |
| Qwen3-8B | 24,576 | 196,608 | `trl-jobs sft --model_name Qwen/Qwen3-8B --peft --dataset_name ...` |
| Qwen3-14B-Base | 20,480 | 163,840 | `trl-jobs sft --model_name Qwen/Qwen3-14B-Base --peft --dataset_name ...` |
| Qwen3-14B | 20,480 | 163,840 | `trl-jobs sft --model_name Qwen/Qwen3-14B --peft --dataset_name ...` |
| Qwen3-32B | 4,096 | 131,072 | `trl-jobs sft --model_name Qwen/Qwen3-32B --peft --dataset_name ...` |

SmolLM3

| Model | Max context length | Tokens / batch | Example command |
| --- | --- | --- | --- |
| HuggingFaceTB/SmolLM3-3B-Base | 28,672 | 114,688 | `trl-jobs sft --model_name HuggingFaceTB/SmolLM3-3B-Base --dataset_name ...` |
| HuggingFaceTB/SmolLM3-3B | 28,672 | 114,688 | `trl-jobs sft --model_name HuggingFaceTB/SmolLM3-3B --dataset_name ...` |

🤖 OpenAI GPT-OSS (with PEFT)

🚧 Coming soon!

💡 Want support for another model?

Open an issue or submit a PR—we’d love to hear from you!

🔑 Authentication

You’ll need a Hugging Face token to run jobs. You can provide it in any of these ways:

  1. Login with huggingface-cli login
  2. Set the environment variable HF_TOKEN
  3. Pass it directly with --token
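For option 2, a minimal shell sketch (the token shown is a placeholder, not a real token):

```shell
# Option 2: provide the token via the HF_TOKEN environment variable.
# hf_xxxxxxxx is a placeholder; use your own token from the Hub settings page.
export HF_TOKEN=hf_xxxxxxxx
# trl-jobs picks the token up from the environment; confirm it is set:
echo "${HF_TOKEN:+HF_TOKEN is set}"
```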

📜 License

This project is under the MIT License. See the LICENSE file for details.

🤝 Contributing

We welcome contributions! Please open an issue or a PR on GitHub.

Before committing, run formatting checks:

ruff check . --fix && ruff format . --line-length 119
