Skip to main content

Synthetic data generation engine for building task models

Project description

asynth

Generate targeted training data to replace expensive LLM API calls with fast, specialized models.

Python 3.11+ License: Apache 2.0

from asynth import synthesize, SynthesisConfig, LiteLLMInferenceConfig
from asynth.configs import GeneralSynthesisParams
from asynth.configs.params.synthesis_params import GeneratedAttribute, TextMessage
from asynth.types.conversation import Role

results = synthesize(SynthesisConfig(
    num_samples=10,
    inference_config=LiteLLMInferenceConfig(model="openai/gpt-4o-mini"),
    strategy_params=GeneralSynthesisParams(
        generated_attributes=[
            GeneratedAttribute(
                id="qa_pair",
                instruction_messages=[
                    TextMessage(role=Role.SYSTEM, content="You are a trivia question writer."),
                    TextMessage(role=Role.USER, content="Write a trivia Q&A about science."),
                ],
            ),
        ],
    ),
))

[!NOTE] asynth is the data engine behind amortized — a platform for building and deploying task models that replace expensive LLM API calls with fast, cheap, specialized inference.

Why asynth?

Large models are expensive to run on every request. The alternative: generate synthetic training data, fine-tune a small purpose-built model, and amortize the cost over time.

  • Build task models — small models that do one thing well, at a fraction of the cost
  • Any LLM as teacher — use GPT-4o, Claude, Gemini, or any LiteLLM provider to generate data — just change the model string
  • No heavy dependencies — no torch, no transformers, no CUDA. Installs in seconds
  • Production pipeline — attribute sampling, quality checks, conversation planning, and tool-use simulation in a single synthesize() call

Install

pip install asynth
pip install asynth[hf]    # HuggingFace dataset loading
pip install asynth[docs]  # Document ingestion (PDF, DOCX)

Requires Python >= 3.11.

Features

Data generation

  • Attribute-based synthesis — combine sampled, generated, and transformed attributes in a single pipeline
  • Multi-turn conversations — LLM-powered conversation planning with configurable turn counts and per-role personas
  • Tool-use simulation — generate agentic conversations with tool calls grounded in environment definitions

Data sources

  • Documents — PDF, DOCX, TXT, Markdown, HTML with token-based segmentation
  • Datasets — JSONL, CSV, Parquet, TSV, XLSX, and HuggingFace datasets (hf:org/dataset)

Quality

  • Structural validation — role alternation, empty content, tool-call consistency checks before output
  • LLM-as-a-JudgeSimpleJudge and RuleBasedJudge with 15 pre-built evaluation configs (code quality, safety, truthfulness, etc.)

Infrastructure

  • Provider-agnostic — OpenAI, Anthropic, Google, Azure, Together, Fireworks, Ollama, vLLM via LiteLLM
  • Concurrent generation — async LLM calls with configurable concurrency limits

License

Apache 2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

asynth-0.1.2.tar.gz (315.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

asynth-0.1.2-py3-none-any.whl (119.8 kB view details)

Uploaded Python 3

File details

Details for the file asynth-0.1.2.tar.gz.

File metadata

  • Download URL: asynth-0.1.2.tar.gz
  • Upload date:
  • Size: 315.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asynth-0.1.2.tar.gz
Algorithm Hash digest
SHA256 685ef2e000bbefe5bbdec1796f12886100471ef7450c3d6c32f9d3a949d39712
MD5 5e64a1c3173efeef899e72ab287c6af5
BLAKE2b-256 cfc28441279fc94d8f76bd4bc4a5b5407f77ccc3cfdcb9ef0039eb4b3aa63bbb

See more details on using hashes here.

Provenance

The following attestation bundles were made for asynth-0.1.2.tar.gz:

Publisher: publish.yml on amortized-ai/asynth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file asynth-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: asynth-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 119.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asynth-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 1123d64194c71d88fc1e5db78a5024b2896295d9088fe55b639fab8b4798ce47
MD5 a2c459d675b60f723195a236c2e2c648
BLAKE2b-256 9616158410118e20efa8a5cfbab28b54e7054319cc0b682268a866064486b004

See more details on using hashes here.

Provenance

The following attestation bundles were made for asynth-0.1.2-py3-none-any.whl:

Publisher: publish.yml on amortized-ai/asynth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page