SFT data augmentation via conversation truncation
astra-augment
SFT data augmentation via conversation truncation for tool-calling trajectories.
Part of the Astra ecosystem, a data factory for high-quality multi-turn tool-calling conversation trajectories.
Install
pip install astra-augment
Usage
# Expand last 20% of tool calls into separate training samples
astra-augment expand input.jsonl -o output.jsonl --ratio 0.2 --mode tool_call
# Expand last 40% of assistant responses
astra-augment expand input.jsonl -o output.jsonl --ratio 0.4 --mode response
Modes
tool_call: Truncates conversations at assistant tool-call positions (skips those followed by failed responses)
response: Truncates at assistant response positions (excludes the final one to avoid duplicating the original)
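To make the two modes concrete, here is a minimal sketch of how candidate truncation positions could be found. It is not the package's actual implementation: the message schema (`role`, `tool_calls`, and an `error` flag on failed tool responses), the treatment of a "response" as an assistant turn without tool calls, and the helper names `tool_call_positions` and `response_positions` are all assumptions made for illustration.

```python
from typing import Any

Message = dict[str, Any]  # assumed schema: {"role": ..., "content": ..., "tool_calls": [...]}


def tool_call_positions(messages: list[Message]) -> list[int]:
    """Indices of assistant tool-call turns, skipping those followed by a failed tool response."""
    positions = []
    for i, msg in enumerate(messages):
        if msg.get("role") != "assistant" or not msg.get("tool_calls"):
            continue
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        # Assumption: a failed tool response is a "tool" message carrying a truthy "error" key.
        if nxt is not None and nxt.get("role") == "tool" and nxt.get("error"):
            continue
        positions.append(i)
    return positions


def response_positions(messages: list[Message]) -> list[int]:
    """Indices of plain assistant responses, excluding the final one."""
    # Assumption: a "response" is an assistant turn that carries no tool calls.
    indices = [
        i
        for i, msg in enumerate(messages)
        if msg.get("role") == "assistant" and not msg.get("tool_calls")
    ]
    # Truncating at the last response would reproduce the original conversation, so drop it.
    return indices[:-1]
```

Each returned index marks an assistant turn at which a conversation prefix could end, so that the turn becomes the training target of a new sample.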
Parameters
| Parameter | Description |
|---|---|
| --ratio | Fraction of tail positions to expand, in the range (0, 1] |
| --mode | Truncation mode: tool_call or response |
| --format | Dataset format (default: qwen3) |
| -o | Output JSONL path |
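The effect of --ratio can be sketched as follows: given the candidate truncation positions for one conversation, only the last fraction of them is expanded into separate samples. This is an illustrative sketch, not the tool's real code. Rounding up with `math.ceil`, keeping the assistant turn at each position as the end of the sample, and the helper names `tail_positions` and `expand` are assumptions.

```python
import math
from typing import Any

Message = dict[str, Any]


def tail_positions(positions: list[int], ratio: float) -> list[int]:
    """Keep the last `ratio` fraction of candidate positions."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    if not positions:
        return []
    # Assumption: the count is rounded up, so at least one position is always kept.
    keep = math.ceil(len(positions) * ratio)
    return positions[-keep:]


def expand(messages: list[Message], positions: list[int], ratio: float) -> list[list[Message]]:
    """Build one truncated sample per selected position."""
    # Assumption: each sample is the prefix ending at, and including, the assistant turn.
    return [messages[: pos + 1] for pos in tail_positions(positions, ratio)]
```

With four candidate positions and a ratio of 0.5, this sketch would produce two extra samples, one ending at each of the last two positions.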
Development
# Clone and set up
git clone https://github.com/zhangdw156/astra-augment.git
cd astra-augment
uv sync --all-groups
# Run tests
uv run pytest
# Lint
uv run ruff check .
License
MIT