Skip to main content

Synthetic data generation engine for building task models

Project description

asynth

Generate targeted training data to replace expensive LLM API calls with fast, specialized models.

Python 3.11+ License: Apache 2.0

from asynth import synthesize, SynthesisConfig, LiteLLMInferenceConfig
from asynth.configs import GeneralSynthesisParams
from asynth.configs.params.synthesis_params import GeneratedAttribute, TextMessage
from asynth.types.conversation import Role

results = synthesize(SynthesisConfig(
    num_samples=10,
    inference_config=LiteLLMInferenceConfig(model="openai/gpt-4o-mini"),
    strategy_params=GeneralSynthesisParams(
        generated_attributes=[
            GeneratedAttribute(
                id="qa_pair",
                instruction_messages=[
                    TextMessage(role=Role.SYSTEM, content="You are a trivia question writer."),
                    TextMessage(role=Role.USER, content="Write a trivia Q&A about science."),
                ],
            ),
        ],
    ),
))

[!NOTE] asynth is the data engine behind amortized — a platform for building and deploying task models that replace expensive LLM API calls with fast, cheap, specialized inference.

Why asynth?

Large models are expensive to run on every request. The alternative: generate synthetic training data, fine-tune a small purpose-built model, and amortize the cost over time.

  • Build task models — small models that do one thing well, at a fraction of the cost
  • Any LLM as teacher — use GPT-4o, Claude, Gemini, or any LiteLLM provider to generate data — just change the model string
  • No heavy dependencies — no torch, no transformers, no CUDA. Installs in seconds
  • Production pipeline — attribute sampling, quality checks, conversation planning, and tool-use simulation in a single synthesize() call

Install

pip install asynth
pip install asynth[hf]    # HuggingFace dataset loading
pip install asynth[docs]  # Document ingestion (PDF, DOCX)

Requires Python >= 3.11.

Features

Data generation

  • Attribute-based synthesis — combine sampled, generated, and transformed attributes in a single pipeline
  • Multi-turn conversations — LLM-powered conversation planning with configurable turn counts and per-role personas
  • Tool-use simulation — generate agentic conversations with tool calls grounded in environment definitions

Data sources

  • Documents — PDF, DOCX, TXT, Markdown, HTML with token-based segmentation
  • Datasets — JSONL, CSV, Parquet, TSV, XLSX, and HuggingFace datasets (hf:org/dataset)

Quality

  • Structural validation — role alternation, empty content, tool-call consistency checks before output
  • LLM-as-a-JudgeSimpleJudge and RuleBasedJudge with 15 pre-built evaluation configs (code quality, safety, truthfulness, etc.)

Infrastructure

  • Provider-agnostic — OpenAI, Anthropic, Google, Azure, Together, Fireworks, Ollama, vLLM via LiteLLM
  • Concurrent generation — async LLM calls with configurable concurrency limits

License

Apache 2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

asynth-0.1.3.tar.gz (316.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

asynth-0.1.3-py3-none-any.whl (119.6 kB view details)

Uploaded Python 3

File details

Details for the file asynth-0.1.3.tar.gz.

File metadata

  • Download URL: asynth-0.1.3.tar.gz
  • Upload date:
  • Size: 316.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asynth-0.1.3.tar.gz
Algorithm Hash digest
SHA256 0385973c1615ec37dbfccc027b81f379bb4404e56f7ca2241386d25f85f37462
MD5 900f5512b20d70bce93e182d49afd9ab
BLAKE2b-256 393e3378c6c0c3c5ad8b1def069aecc1eb6d2c70bb6fa63cbd958f50b72f6d28

See more details on using hashes here.

Provenance

The following attestation bundles were made for asynth-0.1.3.tar.gz:

Publisher: publish.yml on amortized-ai/asynth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file asynth-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: asynth-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 119.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asynth-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 99b5c7cb552999a9e5b52c8bbf2bfa05747ef441e63be9593616308b3d9c39dd
MD5 fad21c6951c4b5a4f1fb6d752deca726
BLAKE2b-256 200d0b5c2975010a4fdd3a62e166a7d2e973d7b18d1c9df796f0ac4573248561

See more details on using hashes here.

Provenance

The following attestation bundles were made for asynth-0.1.3-py3-none-any.whl:

Publisher: publish.yml on amortized-ai/asynth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page