
A multi-tenant fine-tuning platform for LLMs with Tinker-compatible API


TuFT (Tenant-unified FineTuning) is a multi-tenant platform that lets multiple users fine-tune LLMs on shared infrastructure through a unified API. Access it via the Tinker SDK or compatible clients.

Check out our roadmap to see what we're building next.

We're open source and welcome contributions! Join the community.


Quick Install

Note: This script supports Unix-like platforms only. For other platforms, see Installation.

Install TuFT with a single command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/agentscope-ai/tuft/main/scripts/install.sh)"

This installs TuFT with full backend support (GPU dependencies, persistence, flash-attn) and a bundled Python environment to ~/.tuft. After installation, restart your terminal and run:

tuft

Quick Start Example

This example demonstrates how to use TuFT for training and sampling with the Tinker SDK. Make sure the server is running on port 10610 before running the code. See the Run the server section below for instructions on starting the server.
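Before running the client code, you can confirm that something is listening on the expected port. The snippet below is a generic TCP probe using only the standard library (not a TuFT API); it is demonstrated against a throwaway local listener, but against a real deployment you would probe `localhost:10610`.

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demonstrate against a throwaway local listener standing in for the server;
# for a real TuFT deployment you would call port_open("localhost", 10610).
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: the OS picks a free port
listener.listen(1)
port = listener.getsockname()[1]

print(port_open("127.0.0.1", port))  # True while the listener is up
listener.close()
```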

1. Data Preparation

Prepare your training data in the format expected by TuFT:

import tinker
from tinker import types

# Connect to the running TuFT server
client = tinker.ServiceClient(base_url="http://localhost:10610", api_key="local-dev-key")

# Discover available base models
capabilities = client.get_server_capabilities()
base_model = capabilities.supported_models[0].model_name

print("Supported models:")
for model in capabilities.supported_models:
    print("-", model.model_name or "(unknown)")

# Prepare training data
# In practice, you would use a tokenizer:
# tokenizer = training.get_tokenizer()
# prompt_tokens = tokenizer.encode("Hello from TuFT")
# target_tokens = tokenizer.encode(" Generalizing beyond the prompt")

# For this example, we use fake token IDs
prompt_tokens = [101, 42, 37, 102]
target_tokens = [101, 99, 73, 102]

datum = types.Datum(
    model_input=types.ModelInput.from_ints(prompt_tokens),
    loss_fn_inputs={
        "target_tokens": types.TensorData(
            data=target_tokens, 
            dtype="int64", 
            shape=[len(target_tokens)]
        ),
        "weights": types.TensorData(data=[1.0, 1.0, 1.0, 1.0], dtype="float32", shape=[4])
    },
)

Example Output:

Supported models:
- Qwen/Qwen3-4B
- Qwen/Qwen3-8B
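Note that `loss_fn_inputs` pairs each target token with a weight: the `Datum` above uses four 1.0 entries for four target tokens, and the `shape` field must match that length. A tiny plain-Python helper (hypothetical, not part of the SDK) that builds uniform weights of the right length:

```python
def uniform_weights(target_tokens):
    """One 1.0 weight per target token, matching shape=[len(target_tokens)]."""
    return [1.0] * len(target_tokens)

target_tokens = [101, 99, 73, 102]
weights = uniform_weights(target_tokens)
print(weights)                              # [1.0, 1.0, 1.0, 1.0]
print(len(weights) == len(target_tokens))   # True
```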

2. Training

Create a LoRA training client and perform forward/backward passes with optimizer steps:

# Create a LoRA training client
training = client.create_lora_training_client(base_model=base_model, rank=8)

# Run forward/backward pass
fwdbwd = training.forward_backward([datum], "cross_entropy").result(timeout=30)
print("Loss metrics:", fwdbwd.metrics)

# Apply optimizer update
optim = training.optim_step(types.AdamParams(learning_rate=1e-4)).result(timeout=30)
print("Optimizer metrics:", optim.metrics)

Example Output:

Loss metrics: {'loss:sum': 2.345, 'step:max': 0.0, 'grad_norm:mean': 0.123}
Optimizer metrics: {'learning_rate:mean': 0.0001, 'step:max': 1.0, 'update_norm:mean': 0.045}
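In a real run you repeat these two calls over many batches. The sketch below shows only the loop shape: `TrainingStub` and `_Future` are stand-ins for the client returned by `create_lora_training_client(...)` (and the futures it returns), so the snippet runs without a server; the metric values it produces are made up.

```python
class _Future:
    """Mimics the future returned by the real client; .result() yields metrics."""
    def __init__(self, metrics):
        self.metrics = metrics

    def result(self, timeout=None):
        return self

class TrainingStub:
    """Stand-in for the TuFT training client; the real calls are shown above."""
    def __init__(self):
        self.step = 0

    def forward_backward(self, batch, loss_fn):
        # Real call: runs a forward/backward pass and returns loss metrics.
        return _Future({"loss:sum": 2.0 / (self.step + 1)})

    def optim_step(self, params):
        # Real call: applies an optimizer update with the given hyperparameters.
        self.step += 1
        return _Future({"step:max": float(self.step)})

training = TrainingStub()
batches = [["datum"]] * 3  # placeholder; real batches hold types.Datum objects

losses = []
for batch in batches:
    fwdbwd = training.forward_backward(batch, "cross_entropy").result(timeout=30)
    losses.append(fwdbwd.metrics["loss:sum"])
    training.optim_step({"learning_rate": 1e-4}).result(timeout=30)

print(losses)  # stubbed loss value from each step
```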

3. Save Checkpoint

Save the trained model checkpoint and sampler weights:

# Save checkpoint for training resumption
checkpoint = training.save_state("demo-checkpoint").result(timeout=60)
print("Checkpoint saved to:", checkpoint.path)

# Save sampler weights for inference
sampler_weights = training.save_weights_for_sampler("demo-sampler").result(timeout=60)
print("Sampler weights saved to:", sampler_weights.path)

# Inspect session information
rest = client.create_rest_client()
session_id = client.holder.get_session_id()
session_info = rest.get_session(session_id).result(timeout=30)
print("Session contains training runs:", session_info.training_run_ids)

Example Output:

Checkpoint saved to: tinker://550e8400-e29b-41d4-a716-446655440000/weights/checkpoint-001
Sampler weights saved to: tinker://550e8400-e29b-41d4-a716-446655440000/sampler_weights/sampler-001
Session contains training runs: ['550e8400-e29b-41d4-a716-446655440000']
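The `tinker://` paths in the output encode a session id followed by an artifact kind and name. A standard-library sketch that pulls the parts out, based only on the layout shown above (this helper is illustrative, not part of the Tinker SDK):

```python
from urllib.parse import urlsplit

def parse_tinker_path(path: str):
    """Split tinker://<session_id>/<kind>/<name> into its three parts."""
    parts = urlsplit(path)
    if parts.scheme != "tinker":
        raise ValueError(f"not a tinker:// path: {path}")
    kind, name = parts.path.lstrip("/").split("/", 1)
    return parts.netloc, kind, name

session_id, kind, name = parse_tinker_path(
    "tinker://550e8400-e29b-41d4-a716-446655440000/weights/checkpoint-001"
)
print(session_id)  # 550e8400-e29b-41d4-a716-446655440000
print(kind, name)  # weights checkpoint-001
```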

4. Sampling

Load the saved weights and generate tokens:

# Create a sampling client with saved weights
sampling = client.create_sampling_client(model_path=sampler_weights.path)

# Prepare prompt for sampling
# sample_prompt = tokenizer.encode("Tell me something inspiring.")
sample_prompt = [101, 57, 12, 7, 102]

# Generate tokens
sample = sampling.sample(
    prompt=types.ModelInput.from_ints(sample_prompt),
    num_samples=1,
    sampling_params=types.SamplingParams(max_tokens=5, temperature=0.5),
).result(timeout=30)

if sample.sequences:
    print("Sample tokens:", sample.sequences[0].tokens)
    # Decode tokens to text:
    # sample_text = tokenizer.decode(sample.sequences[0].tokens)
    # print("Generated text:", sample_text)

Example Output:

Sample tokens: [101, 57, 12, 7, 42, 102]

Note: Replace fake token IDs with actual tokenizer calls when you have a tokenizer available locally.

Installation

Tip: For a quick one-command setup, see Quick Install. This section is for users who prefer to manage their own Python environment or need more control over the installation.

We recommend using uv for dependency management.

Install from Source Code

  1. Clone the repository:

    git clone https://github.com/agentscope-ai/TuFT
    
  2. Create a virtual environment:

    cd TuFT
    uv venv --python 3.12
    
  3. Activate environment:

    source .venv/bin/activate
    
  4. Install dependencies:

    # Install minimal dependencies for non-development installs
    uv sync
    
    # If you need to develop or run tests, install dev dependencies
    uv sync --extra dev
    
    # If you want to run the full feature set (e.g., model serving, persistence),
    # please install all dependencies
    uv sync --all-extras
    python scripts/install_flash_attn.py
    # If you face issues with flash-attn installation, you can try installing it manually:
    # uv pip install flash-attn --no-build-isolation
    

Install via PyPI

You can also install TuFT directly from PyPI:

uv pip install tuft

# Install optional dependencies as needed
uv pip install "tuft[dev,backend,persistence]"

Run the server

The CLI starts a FastAPI server:

tuft launch --port 10610 --config /path/to/tuft_config.yaml

The config file tuft_config.yaml specifies server settings including available base models, authentication, persistence, and telemetry. Below is a minimal example.

supported_models:
  - model_name: Qwen/Qwen3-4B
    model_path: Qwen/Qwen3-4B
    max_model_len: 32768
    tensor_parallel_size: 1
  - model_name: Qwen/Qwen3-8B
    model_path: Qwen/Qwen3-8B
    max_model_len: 32768
    tensor_parallel_size: 1

See config/tuft_config.example.yaml for a complete example configuration with all available options.

Use the Pre-built Docker Image

If you face issues with local installation or want to get started quickly, you can use the pre-built Docker image.

  1. Pull the latest image from GitHub Container Registry:

    docker pull ghcr.io/agentscope-ai/tuft:latest
    
  2. Run the Docker container and start the TuFT server on port 10610:

    docker run -it \
        --gpus all \
        --shm-size="128g" \
        --rm \
        -p 10610:10610 \
        -v <host_dir>:/data \
        ghcr.io/agentscope-ai/tuft:latest \
        tuft launch --port 10610 --config /data/tuft_config.yaml
    

    Please replace <host_dir> with a directory on your host machine where you want to store model checkpoints and other data. Suppose you have the following structure on your host machine:

    <host_dir>/
        ├── checkpoints/
        ├── Qwen3-4B/
        ├── Qwen3-8B/
        └── tuft_config.yaml
    

    The tuft_config.yaml file defines the server configuration, for example:

    supported_models:
      - model_name: Qwen/Qwen3-4B
        model_path: /data/Qwen3-4B
        max_model_len: 32768
        tensor_parallel_size: 1
      - model_name: Qwen/Qwen3-8B
        model_path: /data/Qwen3-8B
        max_model_len: 32768
        tensor_parallel_size: 1
    

User Guide

We provide practical examples to demonstrate how to use TuFT for training and sampling. The guides below cover both Supervised Fine-Tuning and Reinforcement Learning workflows, with links to runnable notebooks.

Dataset     Task                            Guide             Example
no_robots   Supervised Fine-Tuning (SFT)    chat_sft.md       chat_sft.ipynb
Countdown   Reinforcement Learning (RL)     countdown_rl.md   countdown_rl.ipynb

Persistence

TuFT supports optional persistence for server state. When enabled, the server can recover sessions, training runs, sampling sessions, and futures after a restart (and then restore runtime model state from checkpoints).

See docs/persistence.md for full details (key layout, restore semantics, and safety checks).

uv pip install "tuft[persistence]"
# tuft_config.yaml
persistence:
  mode: REDIS
  redis_url: "redis://localhost:6379/0"
  namespace: "persistence-tuft-server"

Observability (OpenTelemetry)

TuFT supports optional OpenTelemetry integration for tracing, metrics, and logs. See docs/telemetry.md for details (what TuFT records, correlation keys, Ray context propagation, and collector setup).

# tuft_config.yaml
telemetry:
  enabled: true
  service_name: tuft
  otlp_endpoint: http://localhost:4317  # Your OTLP collector endpoint
  resource_attributes: {}

Alternatively, use environment variables:

export TUFT_OTLP_ENDPOINT=http://localhost:4317
export TUFT_OTEL_DEBUG=1  # Enable console exporter for debugging

Architecture

TuFT provides a unified service API for agentic model training and sampling. The system supports multiple LoRA adapters per base model and checkpoint management.

graph TB
    subgraph Client["Client Layer"]
        SDK[Tinker SDK Client]
    end
    
    subgraph API["TuFT Service API"]
        REST[Service API<br/>REST/HTTP]
        Session[Session Management]
    end
    
    subgraph Backend["Backend Layer"]
        Training[Training Backend<br/>Forward/Backward/Optim Step]
        Sampling[Sampling Backend<br/>Token Generation]
    end
    
    subgraph Models["Model Layer"]
        BaseModel[Base LLM Model]
        LoRA[LoRA Adapters<br/>Multiple per Base Model]
    end
    
    subgraph Storage["Storage"]
        Checkpoint[Model Checkpoints<br/>& LoRA Weights]
    end
    
    SDK --> REST
    REST --> Session
    Session --> Training
    Session --> Sampling
    Training --> BaseModel
    Training --> LoRA
    Sampling --> BaseModel
    Sampling --> LoRA
    Training --> Checkpoint
    Sampling --> Checkpoint

Key Components

  • Service API: RESTful interface for training and sampling operations
  • Training Backend: Handles forward/backward passes and optimizer steps for LoRA fine-tuning
  • Sampling Backend: Generates tokens from trained models
  • Checkpoint Storage: Manages model checkpoints and LoRA weights
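As a rough, purely illustrative sketch of this layering (the class names are invented for the sketch and are not TuFT internals): a session routes work to a training or sampling path, and both paths share one checkpoint store.

```python
class CheckpointStore:
    """Stands in for the storage layer shared by both backends."""
    def __init__(self):
        self._blobs = {}

    def save(self, key, blob):
        self._blobs[key] = blob

    def load(self, key):
        return self._blobs[key]

class Session:
    """Routes requests to a training or sampling path (per the diagram)."""
    def __init__(self, store):
        self.store = store

    def train_step(self, batch):
        # Training path: forward/backward + optimizer step, then checkpoint.
        self.store.save("ckpt", {"step": 1, "batch_size": len(batch)})
        return {"loss:sum": 1.0}

    def sample(self, prompt_tokens):
        # Sampling path: restores weights from the shared store first.
        _ = self.store.load("ckpt")
        return prompt_tokens + [42]  # placeholder "generated" token

store = CheckpointStore()
session = Session(store)
session.train_step([["datum"]])
print(session.sample([101, 57]))  # [101, 57, 42]
```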

Roadmap

Core Focus: Post-Training for Agent Scenarios

We focus on post-training for agentic models. The rollout phase in RL training involves reasoning, multi-turn conversations, and tool use, which tends to be asynchronous relative to the training phase. We aim to improve the throughput and resource efficiency of the overall system, building tools that are easy to use and integrate into existing workflows.

Architecture & Positioning

  • Horizontal platform: Not a vertically integrated fine-tuning solution, but a flexible platform that plugs into different training frameworks and compute infrastructures
  • Code-first API: Connects agentic training workflows with compute infrastructure through programmatic interfaces
  • Layer in AI stack: Sits above the infrastructure layer (Kubernetes, cloud platforms, GPU clusters), integrating with training frameworks (PEFT, FSDP, vLLM, DeepSpeed) as implementation dependencies
  • Integration approach: Works with existing ecosystems rather than replacing them

Near-Term (3 months)

  • Multi-machine, multi-GPU training: Support distributed architectures using PEFT, FSDP, vLLM, DeepSpeed, etc.
  • Cloud-native deployment: Integration with AWS, Alibaba Cloud, GCP, Azure and Kubernetes orchestration
  • Observability: Monitoring system with real-time logs, GPU metrics, training progress, and debugging tools
  • Serverless GPU: Lightweight runtime for diverse deployment scenarios, with multi-user and multi-tenant GPU resource sharing to improve utilization efficiency

Long-Term (6 months)

  • Environment-driven learning loop: Standardized interfaces with WebShop, MiniWob++, BrowserEnv, Voyager and other agent training environments
  • Automated pipeline: Task execution → feedback collection → data generation → model updates
  • Advanced RL paradigms: RLAIF, Error Replay, and environment feedback mechanisms
  • Simulation sandboxes: Lightweight local environments for rapid experimentation

Open Collaboration: We Are Looking for Collaborators

This roadmap is not fixed; it is a starting point for our journey with the open-source community. Every feature design will be worked out through GitHub issue discussions, PRs, and prototype validation. We sincerely welcome real-world use cases, performance bottlenecks, and innovative ideas: these voices will collectively define the future of agent post-training.

We welcome suggestions and contributions from the community!

Development

Setup Development Environment

  1. Install uv if you haven't already:

    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  2. Install dev dependencies:

    uv sync --extra dev
    
  3. Set up pre-commit hooks:

    uv run pre-commit install
    

Running Tests

uv run pytest

To skip integration tests:

uv run pytest -m "not integration"

For detailed testing instructions, including GPU tests, persistence testing, and writing new tests, see the Testing Guide.

Linting and Type Checking

Run the linter:

uv run ruff check .
uv run ruff format .

Run the type checker:

uv run pyright

Notebook Linting

For Jupyter notebooks:

uv run nbqa ruff notebooks/

Secret Detection

Scan and update the secrets baseline:

uv run detect-secrets scan > .secrets.baseline

Audit detected secrets to mark false positives:

uv run detect-secrets audit .secrets.baseline

Contributing

Please ensure all tests pass and pre-commit hooks succeed before creating new PRs.
