Skip to main content

Synthetic data generation engine for building task models

Project description

asynth

Generate targeted training data to replace expensive LLM API calls with fast, specialized models.

Python 3.11+ License: Apache 2.0

from asynth import synthesize, SynthesisConfig, LiteLLMInferenceConfig
from asynth.configs import GeneralSynthesisParams
from asynth.configs.params.synthesis_params import GeneratedAttribute, TextMessage
from asynth.types.conversation import Role

results = synthesize(SynthesisConfig(
    num_samples=10,
    inference_config=LiteLLMInferenceConfig(model="openai/gpt-4o-mini"),
    strategy_params=GeneralSynthesisParams(
        generated_attributes=[
            GeneratedAttribute(
                id="qa_pair",
                instruction_messages=[
                    TextMessage(role=Role.SYSTEM, content="You are a trivia question writer."),
                    TextMessage(role=Role.USER, content="Write a trivia Q&A about science."),
                ],
            ),
        ],
    ),
))

[!NOTE] asynth is the data engine behind amortized — a platform for building and deploying task models that replace expensive LLM API calls with fast, cheap, specialized inference.

Why asynth?

Large models are expensive to run on every request. The alternative: generate synthetic training data, fine-tune a small purpose-built model, and amortize the cost over time.

  • Build task models — small models that do one thing well, at a fraction of the cost
  • Any LLM as teacher — use GPT-4o, Claude, Gemini, or any LiteLLM provider to generate data — just change the model string
  • No heavy dependencies — no torch, no transformers, no CUDA. Installs in seconds
  • Production pipeline — attribute sampling, quality checks, conversation planning, and tool-use simulation in a single synthesize() call

Install

pip install asynth
pip install asynth[hf]    # HuggingFace dataset loading
pip install asynth[docs]  # Document ingestion (PDF, DOCX)

Requires Python >= 3.11.

Features

Data generation

  • Attribute-based synthesis — combine sampled, generated, and transformed attributes in a single pipeline
  • Multi-turn conversations — LLM-powered conversation planning with configurable turn counts and per-role personas
  • Tool-use simulation — generate agentic conversations with tool calls grounded in environment definitions

Data sources

  • Documents — PDF, DOCX, TXT, Markdown, HTML with token-based segmentation
  • Datasets — JSONL, CSV, Parquet, TSV, XLSX, and HuggingFace datasets (hf:org/dataset)

Quality

  • Structural validation — role alternation, empty content, tool-call consistency checks before output
  • LLM-as-a-JudgeSimpleJudge and RuleBasedJudge with 15 pre-built evaluation configs (code quality, safety, truthfulness, etc.)

Infrastructure

  • Provider-agnostic — OpenAI, Anthropic, Google, Azure, Together, Fireworks, Ollama, vLLM via LiteLLM
  • Concurrent generation — async LLM calls with configurable concurrency limits

License

Apache 2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

asynth-0.1.1.tar.gz (312.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

asynth-0.1.1-py3-none-any.whl (118.6 kB view details)

Uploaded Python 3

File details

Details for the file asynth-0.1.1.tar.gz.

File metadata

  • Download URL: asynth-0.1.1.tar.gz
  • Upload date:
  • Size: 312.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asynth-0.1.1.tar.gz
Algorithm Hash digest
SHA256 0ebc71a7eeb91037f0addecd86cd9d034cf71b7a9423a855d4c2559746de73d5
MD5 c8a4106ae2839c2931483c02e8c4b644
BLAKE2b-256 93f9e241fabcf75b158c96ac86bfb416ea865e136ed57b5977c2007080c8e332

See more details on using hashes here.

Provenance

The following attestation bundles were made for asynth-0.1.1.tar.gz:

Publisher: publish.yml on amortized-ai/asynth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file asynth-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: asynth-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 118.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asynth-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 974c77941620cde9d96ee1629d13decafd8920e4c3258fd4157e7b5dd829b2b3
MD5 fb250410c7431317b85dbd12e4185165
BLAKE2b-256 db5dc98086de4153c76aac6c7d06a16c7568a507fc8d55ec1e1954a7e61b1e6b

See more details on using hashes here.

Provenance

The following attestation bundles were made for asynth-0.1.1-py3-none-any.whl:

Publisher: publish.yml on amortized-ai/asynth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page