Synthetic data generation engine for building task models
asynth
Generate targeted training data to replace expensive LLM API calls with fast, specialized models.
from asynth import synthesize, SynthesisConfig, LiteLLMInferenceConfig
from asynth.configs import GeneralSynthesisParams
from asynth.configs.params.synthesis_params import GeneratedAttribute, TextMessage
from asynth.types.conversation import Role

results = synthesize(SynthesisConfig(
    num_samples=10,
    inference_config=LiteLLMInferenceConfig(model="openai/gpt-4o-mini"),
    strategy_params=GeneralSynthesisParams(
        generated_attributes=[
            GeneratedAttribute(
                id="qa_pair",
                instruction_messages=[
                    TextMessage(role=Role.SYSTEM, content="You are a trivia question writer."),
                    TextMessage(role=Role.USER, content="Write a trivia Q&A about science."),
                ],
            ),
        ],
    ),
))
[!NOTE] asynth is the data engine behind amortized — a platform for building and deploying task models that replace expensive LLM API calls with fast, cheap, specialized inference.
Why asynth?
Large models are expensive to run on every request. The alternative: generate synthetic training data, fine-tune a small purpose-built model, and amortize the cost over time.
- Build task models: small models that do one thing well, at a fraction of the cost
- Any LLM as teacher: use GPT-4o, Claude, Gemini, or any LiteLLM provider to generate data by changing only the model string
- No heavy dependencies: no torch, no transformers, no CUDA. Installs in seconds
- Production pipeline: attribute sampling, quality checks, conversation planning, and tool-use simulation in a single synthesize() call
Install
pip install asynth
pip install "asynth[hf]"    # HuggingFace dataset loading
pip install "asynth[docs]"  # Document ingestion (PDF, DOCX)
Requires Python >= 3.11.
Features
Data generation
- Attribute-based synthesis — combine sampled, generated, and transformed attributes in a single pipeline
- Multi-turn conversations — LLM-powered conversation planning with configurable turn counts and per-role personas
- Tool-use simulation — generate agentic conversations with tool calls grounded in environment definitions
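To make the idea of attribute-based synthesis concrete, here is a minimal sketch of the sampling step in plain Python. It does not use asynth's API and is not its implementation. The attribute names (`topic`, `difficulty`), the prompt template, and the helpers `sample_attributes` and `build_prompts` are hypothetical, chosen only to show how sampled values can be combined into prompts for a teacher model.

```python
import random

# Hypothetical attribute pools; asynth's own configuration objects may look different.
ATTRIBUTES = {
    "topic": ["physics", "biology", "chemistry"],
    "difficulty": ["easy", "medium", "hard"],
}

# Hypothetical template that consumes one value per attribute.
PROMPT_TEMPLATE = "Write a {difficulty} trivia Q&A about {topic}."


def sample_attributes(rng: random.Random) -> dict[str, str]:
    """Pick one value per attribute, independently and uniformly."""
    return {name: rng.choice(values) for name, values in ATTRIBUTES.items()}


def build_prompts(num_samples: int, seed: int = 0) -> list[str]:
    """Sample attributes num_samples times and render one prompt per sample."""
    rng = random.Random(seed)
    return [PROMPT_TEMPLATE.format(**sample_attributes(rng)) for _ in range(num_samples)]


if __name__ == "__main__":
    for prompt in build_prompts(3):
        print(prompt)
```

Seeding the generator keeps the sampled prompts reproducible, which helps when comparing runs against different teacher models.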
Data sources
- Documents: PDF, DOCX, TXT, Markdown, HTML with token-based segmentation
- Datasets: JSONL, CSV, Parquet, TSV, XLSX, and HuggingFace datasets (hf:org/dataset)
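As an illustration of what token-based segmentation means, the sketch below splits text into fixed-size windows with optional overlap. It is an assumption-laden stand-in: whitespace splitting replaces a real tokenizer, and the function name `segment_by_tokens` and its `max_tokens` and `overlap` parameters are hypothetical rather than taken from asynth.

```python
def segment_by_tokens(text: str, max_tokens: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated tokens.

    Consecutive chunks share `overlap` tokens so context is not cut abruptly.
    Whitespace tokens are an approximation; a model tokenizer would count differently.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    tokens = text.split()
    step = max_tokens - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

For example, `segment_by_tokens("a b c d e", max_tokens=3, overlap=1)` yields two chunks that share the token `c`.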
Quality
- Structural validation: role alternation, empty content, tool-call consistency checks before output
- LLM-as-a-Judge: SimpleJudge and RuleBasedJudge with 15 pre-built evaluation configs (code quality, safety, truthfulness, etc.)
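The structural checks named above can be sketched in a few lines of plain Python. This is an illustrative approximation, not asynth's validator: the `Message` shape, the `validate_conversation` helper, and the rule that a system message may appear only first are assumptions, and tool-call consistency is left out for brevity.

```python
from typing import TypedDict


class Message(TypedDict):
    role: str
    content: str


def validate_conversation(messages: list[Message]) -> list[str]:
    """Return human-readable problems; an empty list means the conversation passed."""
    problems = []
    previous_role = None
    for index, message in enumerate(messages):
        role = message["role"]
        if not message["content"].strip():
            problems.append(f"message {index}: empty content")
        if role == "system":
            if index != 0:
                problems.append(f"message {index}: system message is not first")
            continue  # system messages do not take part in alternation
        if role not in ("user", "assistant"):
            problems.append(f"message {index}: unknown role {role!r}")
            continue
        if role == previous_role:
            problems.append(f"message {index}: consecutive {role!r} messages")
        previous_role = role
    return problems
```

Returning a list of problems rather than raising on the first one makes it easy to log every defect in a generated sample before deciding whether to discard it.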
Infrastructure
- Provider-agnostic — OpenAI, Anthropic, Google, Azure, Together, Fireworks, Ollama, vLLM via LiteLLM
- Concurrent generation — async LLM calls with configurable concurrency limits
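A common way to cap concurrent async calls is an `asyncio.Semaphore`. The sketch below shows that general pattern under stated assumptions: `fake_llm_call` is a hypothetical stand-in that sleeps instead of contacting a provider, and `generate_all` and `max_concurrency` are illustrative names, not asynth's API.

```python
import asyncio


async def fake_llm_call(prompt: str) -> str:
    """Stand-in for a real provider call; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"


async def generate_all(prompts: list[str], max_concurrency: int) -> list[str]:
    """Run one call per prompt, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_call(prompt: str) -> str:
        async with semaphore:  # blocks while max_concurrency calls are already running
            return await fake_llm_call(prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(bounded_call(p) for p in prompts))


if __name__ == "__main__":
    results = asyncio.run(generate_all([f"prompt {i}" for i in range(5)], max_concurrency=2))
    print(results)
```

A lower limit reduces the chance of hitting provider rate limits, at the cost of longer wall-clock time for large batches.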
License
Apache 2.0
File details
Details for the file asynth-0.1.3.tar.gz.
File metadata
- Download URL: asynth-0.1.3.tar.gz
- Upload date:
- Size: 316.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0385973c1615ec37dbfccc027b81f379bb4404e56f7ca2241386d25f85f37462 |
| MD5 | 900f5512b20d70bce93e182d49afd9ab |
| BLAKE2b-256 | 393e3378c6c0c3c5ad8b1def069aecc1eb6d2c70bb6fa63cbd958f50b72f6d28 |
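A published digest is only useful if it is compared against the downloaded file. The sketch below shows one way to do that with Python's standard `hashlib`. The helper names `sha256_of_file` and `matches_expected` are illustrative, and the usage note assumes the archive was saved to the current directory under its published file name.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published value, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For instance, `matches_expected(Path("asynth-0.1.3.tar.gz"), "0385973c1615ec37dbfccc027b81f379bb4404e56f7ca2241386d25f85f37462")` should return `True` for an intact copy of the source distribution.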
Provenance
The following attestation bundles were made for asynth-0.1.3.tar.gz:
Publisher: publish.yml on amortized-ai/asynth
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asynth-0.1.3.tar.gz
- Subject digest: 0385973c1615ec37dbfccc027b81f379bb4404e56f7ca2241386d25f85f37462
- Sigstore transparency entry: 1771281294
- Sigstore integration time:
- Permalink: amortized-ai/asynth@f81342a3e17526d163f6ce3c47d17eaa359d035b
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/amortized-ai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f81342a3e17526d163f6ce3c47d17eaa359d035b
- Trigger Event: release
File details
Details for the file asynth-0.1.3-py3-none-any.whl.
File metadata
- Download URL: asynth-0.1.3-py3-none-any.whl
- Upload date:
- Size: 119.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 99b5c7cb552999a9e5b52c8bbf2bfa05747ef441e63be9593616308b3d9c39dd |
| MD5 | fad21c6951c4b5a4f1fb6d752deca726 |
| BLAKE2b-256 | 200d0b5c2975010a4fdd3a62e166a7d2e973d7b18d1c9df796f0ac4573248561 |
Provenance
The following attestation bundles were made for asynth-0.1.3-py3-none-any.whl:
Publisher: publish.yml on amortized-ai/asynth
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asynth-0.1.3-py3-none-any.whl
- Subject digest: 99b5c7cb552999a9e5b52c8bbf2bfa05747ef441e63be9593616308b3d9c39dd
- Sigstore transparency entry: 1771281414
- Sigstore integration time:
- Permalink: amortized-ai/asynth@f81342a3e17526d163f6ce3c47d17eaa359d035b
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/amortized-ai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f81342a3e17526d163f6ce3c47d17eaa359d035b
- Trigger Event: release