Skip to main content

Synthetic data generation engine for building task models

Project description

asynth

Generate targeted training data to replace expensive LLM API calls with fast, specialized models.

Python 3.11+ License: Apache 2.0

from asynth import synthesize, SynthesisConfig, LiteLLMInferenceConfig
from asynth.configs import GeneralSynthesisParams
from asynth.configs.params.synthesis_params import GeneratedAttribute, TextMessage
from asynth.types.conversation import Role

results = synthesize(SynthesisConfig(
    num_samples=10,
    inference_config=LiteLLMInferenceConfig(model="openai/gpt-4o-mini"),
    strategy_params=GeneralSynthesisParams(
        generated_attributes=[
            GeneratedAttribute(
                id="qa_pair",
                instruction_messages=[
                    TextMessage(role=Role.SYSTEM, content="You are a trivia question writer."),
                    TextMessage(role=Role.USER, content="Write a trivia Q&A about science."),
                ],
            ),
        ],
    ),
))

[!NOTE] asynth is the data engine behind amortized — a platform for building and deploying task models that replace expensive LLM API calls with fast, cheap, specialized inference.

Why asynth?

Large models are expensive to run on every request. The alternative: generate synthetic training data, fine-tune a small purpose-built model, and amortize the cost over time.

  • Build task models — small models that do one thing well, at a fraction of the cost
  • Any LLM as teacher — use GPT-4o, Claude, Gemini, or any LiteLLM provider to generate data — just change the model string
  • No heavy dependencies — no torch, no transformers, no CUDA. Installs in seconds
  • Production pipeline — attribute sampling, quality checks, conversation planning, and tool-use simulation in a single synthesize() call

Install

pip install asynth
pip install asynth[hf]    # HuggingFace dataset loading
pip install asynth[docs]  # Document ingestion (PDF, DOCX)

Requires Python >= 3.11.

Features

Data generation

  • Attribute-based synthesis — combine sampled, generated, and transformed attributes in a single pipeline
  • Multi-turn conversations — LLM-powered conversation planning with configurable turn counts and per-role personas
  • Tool-use simulation — generate agentic conversations with tool calls grounded in environment definitions

Data sources

  • Documents — PDF, DOCX, TXT, Markdown, HTML with token-based segmentation
  • Datasets — JSONL, CSV, Parquet, TSV, XLSX, and HuggingFace datasets (hf:org/dataset)

Quality

  • Structural validation — role alternation, empty content, tool-call consistency checks before output
  • LLM-as-a-JudgeSimpleJudge and RuleBasedJudge with 15 pre-built evaluation configs (code quality, safety, truthfulness, etc.)

Infrastructure

  • Provider-agnostic — OpenAI, Anthropic, Google, Azure, Together, Fireworks, Ollama, vLLM via LiteLLM
  • Concurrent generation — async LLM calls with configurable concurrency limits

License

Apache 2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

asynth-0.1.4.tar.gz (306.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

asynth-0.1.4-py3-none-any.whl (102.8 kB view details)

Uploaded Python 3

File details

Details for the file asynth-0.1.4.tar.gz.

File metadata

  • Download URL: asynth-0.1.4.tar.gz
  • Upload date:
  • Size: 306.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asynth-0.1.4.tar.gz
Algorithm Hash digest
SHA256 00468de5cc7773276b93865ebb32ce5be652dd279334bd2088b60badcfd0728d
MD5 5784150c577831d3c6e1f8ca5fceb64a
BLAKE2b-256 89369d3259322cdea83e36c594934858ebbc9820606cfc195e3573e8f8cc208b

See more details on using hashes here.

Provenance

The following attestation bundles were made for asynth-0.1.4.tar.gz:

Publisher: publish.yml on amortized-ai/asynth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file asynth-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: asynth-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 102.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asynth-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 76517fde6d36a27b49fcd34b41991b808eb41ec95a6aa2cd9a4531366090820c
MD5 7a52c1ef942f45c4ca749bc1b420ade0
BLAKE2b-256 e28c10eecff91481c0dc11b867b2e193cbe8987e4527bee31f7dd5e0cd4b8b3d

See more details on using hashes here.

Provenance

The following attestation bundles were made for asynth-0.1.4-py3-none-any.whl:

Publisher: publish.yml on amortized-ai/asynth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page