SFT data augmentation via conversation truncation
Project description
astra-augment
SFT data augmentation via conversation truncation for tool-calling trajectories.
Part of the Astra ecosystem — a data factory for high-quality multi-turn tool-calling conversation trajectories.
Install
pip install astra-augment
Usage
# Expand last 20% of tool calls into separate training samples
astra-augment expand input.jsonl -o output.jsonl --ratio 0.2 --mode tool_call
# Expand last 40% of assistant responses
astra-augment expand input.jsonl -o output.jsonl --ratio 0.4 --mode response
Modes
tool_call: Truncates conversations at assistant tool-call positions (skips those followed by failed responses)response: Truncates at assistant response positions (excludes the final one to avoid duplicating the original)
Parameters
| Parameter | Description |
|---|---|
--ratio |
Fraction of tail positions to expand (0, 1] |
--mode |
Truncation mode: tool_call or response |
--format |
Dataset format (default: qwen3) |
-o |
Output JSONL path |
Development
# Clone and set up
git clone https://github.com/zhangdw156/astra-augment.git
cd astra-augment
uv sync --all-groups
# Run tests
uv run pytest
# Lint
uv run ruff check .
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
astra_augment-0.7.0.tar.gz
(21.8 kB
view details)
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file astra_augment-0.7.0.tar.gz.
File metadata
- Download URL: astra_augment-0.7.0.tar.gz
- Upload date:
- Size: 21.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.12 {"installer":{"name":"uv","version":"0.11.12","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
815b6609abecf8b8e3917262f89a2112a156ebc25aadccf5b56889203899bdd2
|
|
| MD5 |
91c5faff102a7224a7f26128a1d5536a
|
|
| BLAKE2b-256 |
c88ec2b454681147bbae0e5fa0b4f63ca25da5c6f27fd899ca4fc44dd858c918
|
File details
Details for the file astra_augment-0.7.0-py3-none-any.whl.
File metadata
- Download URL: astra_augment-0.7.0-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.12 {"installer":{"name":"uv","version":"0.11.12","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
af5cb95142554fc261c691fd9882689aa8c26cb2a8245639b2443faf5b6d0221
|
|
| MD5 |
d8c848c01f2bf3da9c1397f2fe749cc3
|
|
| BLAKE2b-256 |
1468387234a8a887d1087c82316caa4639082d10cb2d661e0a8c551cb12846fa
|