A new Python project.

These details have not been verified by PyPI

License
- Other/Proprietary License
Operating System
- OS Independent
Programming Language
- Python :: 3.12

Project description

coreason-synthesis

Grounded Synthetic Data Generation (SDG) for the CoReason-AI platform.

Overview

coreason-synthesis is the "Amplifier" of the CoReason platform. It solves the "Cold Start Problem" of evaluation by manufacturing high-quality, domain-specific Benchmark Evaluation Corpora (BEC) from a small set of user-provided examples.

Rather than inventing its context, as unconstrained generative approaches do, this library implements a Grounded Synthesis Pipeline that anchors every test case in real documents:

1. Learns the testing pattern from user-provided Seeds.
2. Forages for real, semantically similar documents via MCP.
3. Extracts verbatim text slices (the "Real Data").
4. Composites synthetic questions around that data (the "Fake Scenario").
5. Appraises and ranks the results by complexity and diversity.
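
The five stages above can be illustrated with a toy, standard-library-only sketch. It is not the library's implementation: every function name here (`learn_pattern`, `forage`, `extract`, `compose`, `rank`) is hypothetical, word overlap stands in for semantic similarity, and word count stands in for a complexity score. The one property it is meant to show is that the context of each case is a verbatim slice of a source document.

```python
from dataclasses import dataclass


@dataclass
class Case:
    question: str      # fabricated question (the "fake scenario")
    context: str       # verbatim slice of a real document (the "real data")
    complexity: float  # toy score used only for ranking


def learn_pattern(seeds: list[str]) -> set[str]:
    """Toy 'pattern': the set of lowercase words that occur in the seeds."""
    return {word for seed in seeds for word in seed.lower().split()}


def forage(pattern: set[str], corpus: list[str]) -> list[str]:
    """Keep documents that share at least one word with the pattern."""
    return [doc for doc in corpus if pattern & set(doc.lower().split())]


def extract(doc: str, max_chars: int = 40) -> str:
    """Take a verbatim slice: the text is copied, never rewritten."""
    return doc[:max_chars]


def compose(slice_: str) -> Case:
    """Wrap a fabricated question around the untouched slice."""
    return Case(
        question=f"What does the following excerpt specify? {slice_!r}",
        context=slice_,
        complexity=float(len(slice_.split())),
    )


def rank(cases: list[Case]) -> list[Case]:
    """Order cases from most to least complex."""
    return sorted(cases, key=lambda c: c.complexity, reverse=True)


seeds = ["What is the standard dose for adults?"]
corpus = [
    "Standard Dose: 50mg. Included: Adults.",
    "Quarterly revenue grew in northern regions.",
    "Dose adjustments apply for renal impairment in adults over 65.",
]

pattern = learn_pattern(seeds)
ranked = rank([compose(extract(doc)) for doc in forage(pattern, corpus)])
for case in ranked:
    print(case.complexity, repr(case.context))
```

In the real pipeline the teacher model, embedding service, and MCP client replace these stand-ins; the sketch only shows how the stages hand data to one another.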

The output is a stratified test suite that validates the agent under test against the variance found in actual enterprise data, not against idealized synthetic text.

Features

- Pattern-Forage-Fabricate-Rank Loop: A complete pipeline for generating high-quality test data.
- Server Mode: Run as a high-concurrency FastAPI microservice.
- Docker Ready: Containerized for production deployment.
- Few-Shot Intent Inference: Infers testing intent from a few examples.
- Verbatim Defense: Uses character-for-character copies of real data (preserving errors and formatting) as context.
- Lineage Transparency: Distinguishes between "Verbatim/Real" and "Adversarial/Perturbed" data.
- Quality Ranking: Appraises and ranks cases by complexity, ambiguity, diversity, and validity.
- Safety & Privacy: Includes PII sanitization filters.
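
To make the lineage and ranking ideas concrete, here is a minimal sketch, assuming each case carries a provenance tag and numeric scores. The `Provenance` enum, the `ScoredCase` fields, the `rank_cases` helper, and the 0.5 validity threshold are all hypothetical and chosen for illustration; the library's own models and scoring may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    VERBATIM = "verbatim"    # context copied unchanged from a real document
    PERTURBED = "perturbed"  # context deliberately altered to be adversarial


@dataclass(frozen=True)
class ScoredCase:
    question: str
    provenance: Provenance
    complexity: float  # assumed scale: higher means harder
    validity: float    # assumed scale: 0.0 (unusable) to 1.0 (sound)


def rank_cases(cases, min_validity=0.5):
    """Drop low-validity cases, then sort by complexity, highest first."""
    kept = [case for case in cases if case.validity >= min_validity]
    return sorted(kept, key=lambda case: case.complexity, reverse=True)


cases = [
    ScoredCase("Simple lookup", Provenance.VERBATIM, complexity=1.0, validity=0.9),
    ScoredCase("Multi-step dose check", Provenance.PERTURBED, complexity=4.0, validity=0.8),
    ScoredCase("Ill-posed question", Provenance.VERBATIM, complexity=5.0, validity=0.2),
]

for case in rank_cases(cases):
    print(case.provenance.value, case.complexity, case.question)
```

Keeping the provenance tag on every case means a downstream report can still separate real-data cases from perturbed ones after ranking.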

For detailed requirements and specifications, see docs/product_requirements.md.

Installation

pip install coreason-synthesis

Running as a Service

You can run coreason-synthesis as a standalone microservice using Docker or directly via Uvicorn.

1. Docker (Recommended)

# Build the image
docker build -t coreason-synthesis .

# Run the container
docker run -p 8000:8000 \
  -e OPENAI_API_KEY="sk-..." \
  -e MCP_BASE_URL="http://mcp-service:8080" \
  coreason-synthesis

2. Manual Execution

# Install dependencies
# Quote the extra so shells such as zsh do not expand the brackets
pip install "coreason-synthesis[server]"

# Export credentials
export OPENAI_API_KEY="sk-..."
export MCP_BASE_URL="http://localhost:8080"

# Start the server
uvicorn coreason_synthesis.server:app --host 0.0.0.0 --port 8000
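
Both run modes above pass `OPENAI_API_KEY` and `MCP_BASE_URL` through the environment. A small pre-flight check such as the following can catch a missing value before the server starts. This is a sketch, not part of the package: the `missing_settings` helper is hypothetical, and treating both variables as mandatory is an assumption based only on the commands shown.

```python
import os
from collections.abc import Mapping

# Variables exported in the run instructions; assumed here to be required.
REQUIRED_VARS = ("OPENAI_API_KEY", "MCP_BASE_URL")


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_settings()
print("Missing settings:", ", ".join(missing) if missing else "none")
```

Passing the environment as a parameter keeps the check easy to test with a plain dictionary.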

Usage (Library Mode)

The example below initializes the pipeline and runs it end to end, using the built-in mocks for demonstration:

import uuid
from coreason_synthesis.pipeline import SynthesisPipeline
from coreason_synthesis.analyzer import PatternAnalyzerImpl
from coreason_synthesis.forager import ForagerImpl
from coreason_synthesis.extractor import ExtractorImpl
from coreason_synthesis.compositor import CompositorImpl
from coreason_synthesis.perturbator import PerturbatorImpl
from coreason_synthesis.appraiser import AppraiserImpl
from coreason_synthesis.models import SeedCase, Document

# Import mocks for demonstration (replace with real implementations in prod)
from coreason_synthesis.mocks.teacher import MockTeacher
from coreason_synthesis.mocks.embedding import DummyEmbeddingService
from coreason_synthesis.mocks.mcp import MockMCPClient

# 1. Initialize Dependencies
teacher = MockTeacher()
embedder = DummyEmbeddingService()
mcp_client = MockMCPClient(
    documents=[
        Document(
            content="Standard Dose: 50mg. Included: Adults.",
            source_urn="doc:1"
        )
    ]
)

# 2. Initialize Components
analyzer = PatternAnalyzerImpl(teacher, embedder)
forager = ForagerImpl(mcp_client, embedder)
extractor = ExtractorImpl()
compositor = CompositorImpl(teacher)
perturbator = PerturbatorImpl()
appraiser = AppraiserImpl(teacher, embedder)

# 3. Assemble Pipeline
pipeline = SynthesisPipeline(
    analyzer=analyzer,
    forager=forager,
    extractor=extractor,
    compositor=compositor,
    perturbator=perturbator,
    appraiser=appraiser
)

# 4. Define Seeds
seeds = [
    SeedCase(
        id=uuid.uuid4(),
        question="Calculate BSA for 180cm, 80kg patient",
        expected_output={"bsa": 2.0},
        context="Formula: sqrt((height*weight)/3600)"
    )
]

# 5. Run Synthesis
config = {
    "target_count": 5,
    "perturbation_rate": 0.5,
    "sort_by": "complexity_desc"
}
user_context = {"user_id": "demo-user"}

results = pipeline.run(seeds, config, user_context)

# 6. Use Results
for case in results:
    print(f"Generated Case ({case.provenance}): {case.synthetic_question}")
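
Once the pipeline returns, a quick tally by provenance shows how the requested `perturbation_rate` played out in practice. The sketch below relies only on the `provenance` and `synthetic_question` attributes used in the loop above; the `summarize` helper is hypothetical, and `SimpleNamespace` objects with made-up provenance labels stand in for real results so the snippet runs on its own.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(results):
    """Count generated cases per provenance label."""
    return Counter(str(case.provenance) for case in results)


# Stand-in objects; in real use, pass the list returned by pipeline.run(...).
demo_results = [
    SimpleNamespace(provenance="VERBATIM", synthetic_question="Q1"),
    SimpleNamespace(provenance="PERTURBED", synthetic_question="Q2"),
    SimpleNamespace(provenance="VERBATIM", synthetic_question="Q3"),
]

for label, count in summarize(demo_results).most_common():
    print(f"{label}: {count}")
```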

Release history

- 0.3.0 (this version): Jan 30, 2026
- 0.2.1: Jan 28, 2026
- 0.2.0: Jan 28, 2026
- 0.1.0: Jan 20, 2026

Download files

Source Distribution

coreason_synthesis-0.3.0.tar.gz (29.5 kB), uploaded Jan 30, 2026, Source

Built Distribution

coreason_synthesis-0.3.0-py3-none-any.whl (43.8 kB), uploaded Jan 30, 2026, Python 3

File details

Details for the file coreason_synthesis-0.3.0.tar.gz.

File metadata

Download URL: coreason_synthesis-0.3.0.tar.gz
Upload date: Jan 30, 2026
Size: 29.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for coreason_synthesis-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`8326b948f7c15a7b7545f5d8a2de8c43af3f15c5939b3b5453640826bee7237f`
MD5	`309a3e7bb7cb9783694d2f8abb86316b`
BLAKE2b-256	`e403756e7d07452eca672ecd9ab2c8a8a153ce691d2d2f5037c1cc706d99ace4`
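
A downloaded archive can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is illustrative: the helper names `sha256_of` and `matches_digest` are hypothetical, and the file is assumed to have been downloaded to the path you pass in.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare the file's SHA256 with the expected hex digest, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Example call, assuming the sdist sits in the current directory:
# matches_digest(
#     "coreason_synthesis-0.3.0.tar.gz",
#     "8326b948f7c15a7b7545f5d8a2de8c43af3f15c5939b3b5453640826bee7237f",
# )
```

pip can enforce the same check at install time through hash-pinned requirements files.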

Provenance

The following attestation bundles were made for coreason_synthesis-0.3.0.tar.gz:

Publisher: publish.yml on CoReason-AI/coreason-synthesis

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: coreason_synthesis-0.3.0.tar.gz
- Subject digest: 8326b948f7c15a7b7545f5d8a2de8c43af3f15c5939b3b5453640826bee7237f
- Sigstore transparency entry: 872224756
- Sigstore integration time: Jan 30, 2026
Source repository:
- Permalink: CoReason-AI/coreason-synthesis@6a304079c4a37030ae60dd7fb930ea1dabbdff44
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/CoReason-AI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6a304079c4a37030ae60dd7fb930ea1dabbdff44
- Trigger Event: release

File details

Details for the file coreason_synthesis-0.3.0-py3-none-any.whl.

File metadata

Download URL: coreason_synthesis-0.3.0-py3-none-any.whl
Upload date: Jan 30, 2026
Size: 43.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for coreason_synthesis-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3b299766dbd12f3b278c641556ba4e6fee6877acf3fb8c74d760c2d8d5321f7c`
MD5	`3c3b38e4012e7f8cb60ac036df9dfc63`
BLAKE2b-256	`db98b8cb99b85cc09640dda0480d0ee9585761c4219d25c7c689cd7748cfc1a5`

Provenance

The following attestation bundles were made for coreason_synthesis-0.3.0-py3-none-any.whl:

Publisher: publish.yml on CoReason-AI/coreason-synthesis

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: coreason_synthesis-0.3.0-py3-none-any.whl
- Subject digest: 3b299766dbd12f3b278c641556ba4e6fee6877acf3fb8c74d760c2d8d5321f7c
- Sigstore transparency entry: 872224767
- Sigstore integration time: Jan 30, 2026
Source repository:
- Permalink: CoReason-AI/coreason-synthesis@6a304079c4a37030ae60dd7fb930ea1dabbdff44
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/CoReason-AI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6a304079c4a37030ae60dd7fb930ea1dabbdff44
- Trigger Event: release
