Skip to main content

Synthetic data generation engine for building task models

Project description

asynth

Generate targeted training data to replace expensive LLM API calls with fast, specialized models.

Python 3.11+ License: Apache 2.0

from asynth import synthesize, SynthesisConfig, LiteLLMInferenceConfig
from asynth.configs import GeneralSynthesisParams
from asynth.configs.params.synthesis_params import GeneratedAttribute, TextMessage
from asynth.types.conversation import Role

results = synthesize(SynthesisConfig(
    num_samples=10,
    inference_config=LiteLLMInferenceConfig(model="openai/gpt-4o-mini"),
    strategy_params=GeneralSynthesisParams(
        generated_attributes=[
            GeneratedAttribute(
                id="qa_pair",
                instruction_messages=[
                    TextMessage(role=Role.SYSTEM, content="You are a trivia question writer."),
                    TextMessage(role=Role.USER, content="Write a trivia Q&A about science."),
                ],
            ),
        ],
    ),
))

[!NOTE] asynth is the data engine behind amortized — a platform for building and deploying task models that replace expensive LLM API calls with fast, cheap, specialized inference.

Why asynth?

Large models are expensive to run on every request. The alternative: generate synthetic training data, fine-tune a small purpose-built model, and amortize the cost over time.

  • Build task models — small models that do one thing well, at a fraction of the cost
  • Any LLM as teacher — use GPT-4o, Claude, Gemini, or any LiteLLM provider to generate data — just change the model string
  • No heavy dependencies — no torch, no transformers, no CUDA. Installs in seconds
  • Production pipeline — attribute sampling, quality checks, conversation planning, and tool-use simulation in a single synthesize() call

Install

pip install asynth
pip install asynth[hf]    # HuggingFace dataset loading
pip install asynth[docs]  # Document ingestion (PDF, DOCX)

Requires Python >= 3.11.

Features

Data generation

  • Attribute-based synthesis — combine sampled, generated, and transformed attributes in a single pipeline
  • Multi-turn conversations — LLM-powered conversation planning with configurable turn counts and per-role personas
  • Tool-use simulation — generate agentic conversations with tool calls grounded in environment definitions

Data sources

  • Documents — PDF, DOCX, TXT, Markdown, HTML with token-based segmentation
  • Datasets — JSONL, CSV, Parquet, TSV, XLSX, and HuggingFace datasets (hf:org/dataset)

Quality

  • Structural validation — role alternation, empty content, tool-call consistency checks before output
  • LLM-as-a-JudgeSimpleJudge and RuleBasedJudge with 15 pre-built evaluation configs (code quality, safety, truthfulness, etc.)

Infrastructure

  • Provider-agnostic — OpenAI, Anthropic, Google, Azure, Together, Fireworks, Ollama, vLLM via LiteLLM
  • Concurrent generation — async LLM calls with configurable concurrency limits

License

Apache 2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

asynth-0.1.0.tar.gz (310.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

asynth-0.1.0-py3-none-any.whl (117.8 kB view details)

Uploaded Python 3

File details

Details for the file asynth-0.1.0.tar.gz.

File metadata

  • Download URL: asynth-0.1.0.tar.gz
  • Upload date:
  • Size: 310.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asynth-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3b07848c4c0131fe4b6e318ccf188ea3a9b0cd82373f43274a6dcff58b7ed521
MD5 42f89ffe19d5b5361bf25737c3c9cdf5
BLAKE2b-256 61ea25296821a3a84936ad29c47532e5f24e85ec718f1335830f151585b96d52

See more details on using hashes here.

Provenance

The following attestation bundles were made for asynth-0.1.0.tar.gz:

Publisher: publish.yml on amortized-ai/asynth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file asynth-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: asynth-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 117.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asynth-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 74262c14ab34f9aa77ad685963dbd628fc433e6e5ac72e00ea156080ae589b8e
MD5 77faef04f343d93ba9fc29e6f7ae3115
BLAKE2b-256 858f6a3ca3d0e00c08799636112e9bf659e653f2d9cb1d0b1c1e37256f6d59cf

See more details on using hashes here.

Provenance

The following attestation bundles were made for asynth-0.1.0-py3-none-any.whl:

Publisher: publish.yml on amortized-ai/asynth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page