
A multi-tenant fine-tuning platform for LLMs with Tinker-compatible API


TuFT (Tenant-unified FineTuning) is a multi-tenant platform that lets multiple users fine-tune LLMs on shared infrastructure through a unified API. Access it via the Tinker SDK or compatible clients.

Check out our roadmap to see what we're building next.

We're open source and welcome contributions! Join the community.


Quick Install

Note: This script supports Unix-like platforms only. For other platforms, see Installation.

Install TuFT with a single command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/agentscope-ai/tuft/main/scripts/install.sh)"

This installs TuFT with full backend support (GPU dependencies, persistence, flash-attn) and a bundled Python environment to ~/.tuft. After installation, restart your terminal and run:

tuft

Quick Start Example

This example demonstrates how to use TuFT for training and sampling with the Tinker SDK. Make sure the server is running on port 10610 before running the code. See the Run the server section below for instructions on starting the server.
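Before running the client code, you can confirm that something is listening on the expected port. The snippet below is a generic TCP probe using only the standard library (not a TuFT API); it is demonstrated against a throwaway local listener, but against a real deployment you would probe `localhost:10610`.

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demonstrate against a throwaway local listener standing in for the server;
# for a real TuFT deployment you would call port_open("localhost", 10610).
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: the OS picks a free port
listener.listen(1)
port = listener.getsockname()[1]

print(port_open("127.0.0.1", port))  # True while the listener is up
listener.close()
```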

1. Data Preparation

Prepare your training data in the format expected by TuFT:

import tinker
from tinker import types

# Connect to the running TuFT server
client = tinker.ServiceClient(base_url="http://localhost:10610", api_key="local-dev-key")

# Discover available base models
capabilities = client.get_server_capabilities()
base_model = capabilities.supported_models[0].model_name

print("Supported models:")
for model in capabilities.supported_models:
    print("-", model.model_name or "(unknown)")

# Prepare training data
# In practice, you would use a tokenizer:
# tokenizer = training.get_tokenizer()
# prompt_tokens = tokenizer.encode("Hello from TuFT")
# target_tokens = tokenizer.encode(" Generalizing beyond the prompt")

# For this example, we use fake token IDs
prompt_tokens = [101, 42, 37, 102]
target_tokens = [101, 99, 73, 102]

datum = types.Datum(
    model_input=types.ModelInput.from_ints(prompt_tokens),
    loss_fn_inputs={
        "target_tokens": types.TensorData(
            data=target_tokens, 
            dtype="int64", 
            shape=[len(target_tokens)]
        ),
        "weights": types.TensorData(data=[1.0, 1.0, 1.0, 1.0], dtype="float32", shape=[4])
    },
)

Example Output:

Supported models:
- Qwen/Qwen3-4B
- Qwen/Qwen3-8B
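Note that `loss_fn_inputs` pairs each target token with a weight: the `Datum` above uses four 1.0 entries for four target tokens, and the `shape` field must match that length. A tiny plain-Python helper (hypothetical, not part of the SDK) that builds uniform weights of the right length:

```python
def uniform_weights(target_tokens):
    """One 1.0 weight per target token, matching shape=[len(target_tokens)]."""
    return [1.0] * len(target_tokens)

target_tokens = [101, 99, 73, 102]
weights = uniform_weights(target_tokens)
print(weights)                              # [1.0, 1.0, 1.0, 1.0]
print(len(weights) == len(target_tokens))   # True
```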

2. Training

Create a LoRA training client and perform forward/backward passes with optimizer steps:

# Create a LoRA training client
training = client.create_lora_training_client(base_model=base_model, rank=8)

# Run forward/backward pass
fwdbwd = training.forward_backward([datum], "cross_entropy").result(timeout=30)
print("Loss metrics:", fwdbwd.metrics)

# Apply optimizer update
optim = training.optim_step(types.AdamParams(learning_rate=1e-4)).result(timeout=30)
print("Optimizer metrics:", optim.metrics)

Example Output:

Loss metrics: {'loss:sum': 2.345, 'step:max': 0.0, 'grad_norm:mean': 0.123}
Optimizer metrics: {'learning_rate:mean': 0.0001, 'step:max': 1.0, 'update_norm:mean': 0.045}
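In a real run you repeat these two calls over many batches. The sketch below shows only the loop shape: `TrainingStub` and `_Future` are stand-ins for the client returned by `create_lora_training_client(...)` (and the futures it returns), so the snippet runs without a server; the metric values it produces are made up.

```python
class _Future:
    """Mimics the future returned by the real client; .result() yields metrics."""
    def __init__(self, metrics):
        self.metrics = metrics

    def result(self, timeout=None):
        return self

class TrainingStub:
    """Stand-in for the TuFT training client; the real calls are shown above."""
    def __init__(self):
        self.step = 0

    def forward_backward(self, batch, loss_fn):
        # Real call: runs a forward/backward pass and returns loss metrics.
        return _Future({"loss:sum": 2.0 / (self.step + 1)})

    def optim_step(self, params):
        # Real call: applies an optimizer update with the given hyperparameters.
        self.step += 1
        return _Future({"step:max": float(self.step)})

training = TrainingStub()
batches = [["datum"]] * 3  # placeholder; real batches hold types.Datum objects

losses = []
for batch in batches:
    fwdbwd = training.forward_backward(batch, "cross_entropy").result(timeout=30)
    losses.append(fwdbwd.metrics["loss:sum"])
    training.optim_step({"learning_rate": 1e-4}).result(timeout=30)

print(losses)  # stubbed loss value from each step
```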

3. Save Checkpoint

Save the trained model checkpoint and sampler weights:

# Save checkpoint for training resumption
checkpoint = training.save_state("demo-checkpoint").result(timeout=60)
print("Checkpoint saved to:", checkpoint.path)

# Save sampler weights for inference
sampler_weights = training.save_weights_for_sampler("demo-sampler").result(timeout=60)
print("Sampler weights saved to:", sampler_weights.path)

# Inspect session information
rest = client.create_rest_client()
session_id = client.holder.get_session_id()
session_info = rest.get_session(session_id).result(timeout=30)
print("Session contains training runs:", session_info.training_run_ids)

Example Output:

Checkpoint saved to: tinker://550e8400-e29b-41d4-a716-446655440000/weights/checkpoint-001
Sampler weights saved to: tinker://550e8400-e29b-41d4-a716-446655440000/sampler_weights/sampler-001
Session contains training runs: ['550e8400-e29b-41d4-a716-446655440000']
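The `tinker://` paths in the output encode a session id followed by an artifact kind and name. A standard-library sketch that pulls the parts out, based only on the layout shown above (this helper is illustrative, not part of the Tinker SDK):

```python
from urllib.parse import urlsplit

def parse_tinker_path(path: str):
    """Split tinker://<session_id>/<kind>/<name> into its three parts."""
    parts = urlsplit(path)
    if parts.scheme != "tinker":
        raise ValueError(f"not a tinker:// path: {path}")
    kind, name = parts.path.lstrip("/").split("/", 1)
    return parts.netloc, kind, name

session_id, kind, name = parse_tinker_path(
    "tinker://550e8400-e29b-41d4-a716-446655440000/weights/checkpoint-001"
)
print(session_id)  # 550e8400-e29b-41d4-a716-446655440000
print(kind, name)  # weights checkpoint-001
```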

4. Sampling

Load the saved weights and generate tokens:

# Create a sampling client with saved weights
sampling = client.create_sampling_client(model_path=sampler_weights.path)

# Prepare prompt for sampling
# sample_prompt = tokenizer.encode("Tell me something inspiring.")
sample_prompt = [101, 57, 12, 7, 102]

# Generate tokens
sample = sampling.sample(
    prompt=types.ModelInput.from_ints(sample_prompt),
    num_samples=1,
    sampling_params=types.SamplingParams(max_tokens=5, temperature=0.5),
).result(timeout=30)

if sample.sequences:
    print("Sample tokens:", sample.sequences[0].tokens)
    # Decode tokens to text:
    # sample_text = tokenizer.decode(sample.sequences[0].tokens)
    # print("Generated text:", sample_text)

Example Output:

Sample tokens: [101, 57, 12, 7, 42, 102]

Note: Replace fake token IDs with actual tokenizer calls when you have a tokenizer available locally.

Installation

Tip: For a quick one-command setup, see Quick Install. This section is for users who prefer to manage their own Python environment or need more control over the installation.

We recommend using uv for dependency management.

Install from Source Code

  1. Clone the repository:

    git clone https://github.com/agentscope-ai/TuFT
    
  2. Create a virtual environment:

    cd TuFT
    uv venv --python 3.12
    
  3. Activate environment:

    source .venv/bin/activate
    
  4. Install dependencies:

    # Install minimal dependencies for non-development installs
    uv sync
    
    # If you need to develop or run tests, install dev dependencies
    uv sync --extra dev
    
    # If you want to run the full feature set (e.g., model serving, persistence),
    # please install all dependencies
    uv sync --all-extras
    python scripts/install_flash_attn.py
    # If you face issues with flash-attn installation, you can try installing it manually:
    # uv pip install flash-attn --no-build-isolation
    

Install via PyPI

You can also install TuFT directly from PyPI:

uv pip install tuft

# Install optional dependencies as needed
uv pip install "tuft[dev,backend,persistence]"

Run the server

The CLI starts a FastAPI server:

tuft launch --port 10610 --config /path/to/tuft_config.yaml

The config file tuft_config.yaml specifies server settings including available base models, authentication, persistence, and telemetry. Below is a minimal example.

supported_models:
  - model_name: Qwen/Qwen3-4B
    model_path: Qwen/Qwen3-4B
    max_model_len: 32768
    tensor_parallel_size: 1
  - model_name: Qwen/Qwen3-8B
    model_path: Qwen/Qwen3-8B
    max_model_len: 32768
    tensor_parallel_size: 1

See config/tuft_config.example.yaml for a complete example configuration with all available options.

Use the Pre-built Docker Image

If you face issues with local installation or want to get started quickly, you can use the pre-built Docker image.

  1. Pull the latest image from GitHub Container Registry:

    docker pull ghcr.io/agentscope-ai/tuft:latest
    
  2. Run the Docker container and start the TuFT server on port 10610:

    docker run -it \
        --gpus all \
        --shm-size="128g" \
        --rm \
        -p 10610:10610 \
        -v <host_dir>:/data \
        ghcr.io/agentscope-ai/tuft:latest \
        tuft launch --port 10610 --config /data/tuft_config.yaml
    

    Please replace <host_dir> with a directory on your host machine where you want to store model checkpoints and other data. Suppose you have the following structure on your host machine:

    <host_dir>/
        ├── checkpoints/
        ├── Qwen3-4B/
        ├── Qwen3-8B/
        └── tuft_config.yaml
    

    The tuft_config.yaml file defines the server configuration, for example:

    supported_models:
      - model_name: Qwen/Qwen3-4B
        model_path: /data/Qwen3-4B
        max_model_len: 32768
        tensor_parallel_size: 1
      - model_name: Qwen/Qwen3-8B
        model_path: /data/Qwen3-8B
        max_model_len: 32768
        tensor_parallel_size: 1
    

User Guide

We provide practical examples to demonstrate how to use TuFT for training and sampling. The guides below cover both Supervised Fine-Tuning and Reinforcement Learning workflows, with links to runnable notebooks.

Dataset     Task                            Guide             Example
no_robots   Supervised Fine-Tuning (SFT)    chat_sft.md       chat_sft.ipynb
Countdown   Reinforcement Learning (RL)     countdown_rl.md   countdown_rl.ipynb

Persistence

TuFT supports optional persistence for server state. When enabled, the server can recover sessions, training runs, sampling sessions, and futures after a restart (and then restore runtime model state from checkpoints).

See docs/persistence.md for full details (key layout, restore semantics, and safety checks).

uv pip install "tuft[persistence]"
# tuft_config.yaml
persistence:
  mode: REDIS
  redis_url: "redis://localhost:6379/0"
  namespace: "persistence-tuft-server"

Observability (OpenTelemetry)

TuFT supports optional OpenTelemetry integration for tracing, metrics, and logs. See docs/telemetry.md for details (what TuFT records, correlation keys, Ray context propagation, and collector setup).

# tuft_config.yaml
telemetry:
  enabled: true
  service_name: tuft
  otlp_endpoint: http://localhost:4317  # Your OTLP collector endpoint
  resource_attributes: {}

Alternatively, use environment variables:

export TUFT_OTLP_ENDPOINT=http://localhost:4317
export TUFT_OTEL_DEBUG=1  # Enable console exporter for debugging

Architecture

TuFT provides a unified service API for agentic model training and sampling. The system supports multiple LoRA adapters per base model and checkpoint management.

graph TB
    subgraph Client["Client Layer"]
        SDK[Tinker SDK Client]
    end
    
    subgraph API["TuFT Service API"]
        REST[Service API<br/>REST/HTTP]
        Session[Session Management]
    end
    
    subgraph Backend["Backend Layer"]
        Training[Training Backend<br/>Forward/Backward/Optim Step]
        Sampling[Sampling Backend<br/>Token Generation]
    end
    
    subgraph Models["Model Layer"]
        BaseModel[Base LLM Model]
        LoRA[LoRA Adapters<br/>Multiple per Base Model]
    end
    
    subgraph Storage["Storage"]
        Checkpoint[Model Checkpoints<br/>& LoRA Weights]
    end
    
    SDK --> REST
    REST --> Session
    Session --> Training
    Session --> Sampling
    Training --> BaseModel
    Training --> LoRA
    Sampling --> BaseModel
    Sampling --> LoRA
    Training --> Checkpoint
    Sampling --> Checkpoint

Key Components

  • Service API: RESTful interface for training and sampling operations
  • Training Backend: Handles forward/backward passes and optimizer steps for LoRA fine-tuning
  • Sampling Backend: Generates tokens from trained models
  • Checkpoint Storage: Manages model checkpoints and LoRA weights
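As a rough, purely illustrative sketch of this layering (the class names are invented for the sketch and are not TuFT internals): a session routes work to a training or sampling path, and both paths share one checkpoint store.

```python
class CheckpointStore:
    """Stands in for the storage layer shared by both backends."""
    def __init__(self):
        self._blobs = {}

    def save(self, key, blob):
        self._blobs[key] = blob

    def load(self, key):
        return self._blobs[key]

class Session:
    """Routes requests to a training or sampling path (per the diagram)."""
    def __init__(self, store):
        self.store = store

    def train_step(self, batch):
        # Training path: forward/backward + optimizer step, then checkpoint.
        self.store.save("ckpt", {"step": 1, "batch_size": len(batch)})
        return {"loss:sum": 1.0}

    def sample(self, prompt_tokens):
        # Sampling path: restores weights from the shared store first.
        _ = self.store.load("ckpt")
        return prompt_tokens + [42]  # placeholder "generated" token

store = CheckpointStore()
session = Session(store)
session.train_step([["datum"]])
print(session.sample([101, 57]))  # [101, 57, 42]
```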

Roadmap

Core Focus: Post-Training for Agent Scenarios

We focus on post-training for agentic models. The rollout phase in RL training involves reasoning, multi-turn conversations, and tool use, which tends to be asynchronous relative to the training phase. We aim to improve the throughput and resource efficiency of the overall system, building tools that are easy to use and integrate into existing workflows.

Architecture & Positioning

  • Horizontal platform: Not a vertically integrated fine-tuning solution, but a flexible platform that plugs into different training frameworks and compute infrastructures
  • Code-first API: Connects agentic training workflows with compute infrastructure through programmatic interfaces
  • Layer in AI stack: Sits above the infrastructure layer (Kubernetes, cloud platforms, GPU clusters), integrating with training frameworks (PEFT, FSDP, vLLM, DeepSpeed) as implementation dependencies
  • Integration approach: Works with existing ecosystems rather than replacing them

Near-Term (3 months)

  • Multi-machine, multi-GPU training: Support distributed architectures using PEFT, FSDP, vLLM, DeepSpeed, etc.
  • Cloud-native deployment: Integration with AWS, Alibaba Cloud, GCP, Azure and Kubernetes orchestration
  • Observability: Monitoring system with real-time logs, GPU metrics, training progress, and debugging tools
  • Serverless GPU: Lightweight runtime for diverse deployment scenarios, with multi-user and multi-tenant GPU resource sharing to improve utilization efficiency

Long-Term (6 months)

  • Environment-driven learning loop: Standardized interfaces with WebShop, MiniWob++, BrowserEnv, Voyager and other agent training environments
  • Automated pipeline: Task execution → feedback collection → data generation → model updates
  • Advanced RL paradigms: RLAIF, Error Replay, and environment feedback mechanisms
  • Simulation sandboxes: Lightweight local environments for rapid experimentation

Open Collaboration: We Are Looking for Collaborators

This roadmap is not fixed; it is a starting point for our journey with the open-source community. Every feature design will be worked out through GitHub issue discussions, PRs, and prototype validation. We sincerely welcome real-world use cases, performance bottlenecks, and innovative ideas: these voices will collectively define the future of agent post-training.

We welcome suggestions and contributions from the community!

Development

Setup Development Environment

  1. Install uv if you haven't already:

    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  2. Install dev dependencies:

    uv sync --extra dev
    
  3. Set up pre-commit hooks:

    uv run pre-commit install
    

Running Tests

uv run pytest

To skip integration tests:

uv run pytest -m "not integration"

For detailed testing instructions, including GPU tests, persistence testing, and writing new tests, see the Testing Guide.

Linting and Type Checking

Run the linter:

uv run ruff check .
uv run ruff format .

Run the type checker:

uv run pyright

Notebook Linting

For Jupyter notebooks:

uv run nbqa ruff notebooks/

Secret Detection

Scan and update the secrets baseline:

uv run detect-secrets scan > .secrets.baseline

Audit detected secrets to mark false positives:

uv run detect-secrets audit .secrets.baseline

Contributing

Please ensure all tests pass and pre-commit hooks succeed before creating new PRs.
