coreason-synthesis
Grounded Synthetic Data Generation (SDG) for the CoReason-AI platform.
Overview
coreason-synthesis is the "Amplifier" of the CoReason platform. It solves the "Cold Start Problem" of evaluation by manufacturing high-quality, domain-specific Benchmark Evaluation Corpora (BEC) from a small set of user-provided examples.
Unlike GenAI approaches that generate test data purely from a model's own output, and so risk hallucinated content, this library implements a Grounded Synthesis Pipeline:
- Learns the testing pattern from user-provided Seeds.
- Forages for real, semantically similar documents via MCP.
- Extracts verbatim text slices (The "Real Data").
- Composites synthetic questions around that data (The "Fake Scenario").
- Appraises and ranks the results by complexity and diversity.
The output is a rigorous, stratified test suite that validates the agent against actual enterprise data variances, not idealized synthetic text.
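The five steps above can be sketched in plain Python. This is only an illustration of the control flow, not the library's implementation: every name here (`Seed`, `Case`, `forage`, `extract`, `composite`, `synthesize`) is hypothetical, word overlap stands in for embedding-based retrieval via MCP, a fixed template stands in for the teacher model, and word count stands in for a real complexity score.

```python
from dataclasses import dataclass


@dataclass
class Seed:
    question: str
    context: str


@dataclass
class Case:
    verbatim_context: str    # real text, copied unchanged from a document
    synthetic_question: str  # generated scenario wrapped around that text
    complexity: float


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def forage(seed: Seed, corpus: list[str], top_k: int) -> list[str]:
    """Rank documents by Jaccard word overlap with the seed context."""
    def score(doc: str) -> float:
        a, b = tokens(seed.context), tokens(doc)
        return len(a & b) / len(a | b) if a | b else 0.0
    return sorted(corpus, key=score, reverse=True)[:top_k]


def extract(doc: str) -> list[str]:
    """Cut a document into slices; each slice is an unmodified substring."""
    return [s for s in doc.split(". ") if s]


def composite(seed: Seed, slice_: str) -> Case:
    """Wrap a real slice in a templated question (placeholder for an LLM call)."""
    question = f"Using only the excerpt, answer in the style of: {seed.question}"
    return Case(slice_, question, complexity=float(len(slice_.split())))


def synthesize(seed: Seed, corpus: list[str], top_k: int = 2) -> list[Case]:
    cases = [
        composite(seed, s)
        for doc in forage(seed, corpus, top_k)
        for s in extract(doc)
    ]
    # Appraise: most "complex" (here: longest) cases first.
    return sorted(cases, key=lambda c: c.complexity, reverse=True)


if __name__ == "__main__":
    seed = Seed("What is the standard dose?", "standard dose for adults")
    corpus = [
        "Standard dose: 50 mg. Included: adults.",
        "Quarterly revenue rose. Costs fell.",
    ]
    for case in synthesize(seed, corpus, top_k=1):
        print(case.complexity, repr(case.verbatim_context))
```

The point of the sketch is the separation of concerns: the context is always a slice of a real document, and only the question is generated.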
Features
- Pattern-Forage-Fabricate-Rank Loop: A complete pipeline for generating high-quality test data.
- Few-Shot Intent Inference: Infers testing intent from a few examples.
- Verbatim Defense: Uses exact, character-for-character copies of real data (preserving errors and formatting) as context.
- Lineage Transparency: Distinguishes between "Verbatim/Real" and "Adversarial/Perturbed" data.
- Quality Ranking: Appraises and ranks cases by complexity, ambiguity, diversity, and validity.
- Safety & Privacy: Includes PII Sanitization filters.
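As a rough illustration of how Verbatim Defense and Lineage Transparency could fit together, the sketch below tags each text slice with its origin and checks that a "verbatim" slice really is an unmodified substring of its source. The names `Provenance`, `Slice`, `take_verbatim`, `perturb`, and `is_verbatim` are assumptions made for this example and are not taken from the library's API.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    VERBATIM = "verbatim"    # copied unchanged from a real document
    PERTURBED = "perturbed"  # deliberately altered for adversarial testing


@dataclass(frozen=True)
class Slice:
    text: str
    source_urn: str
    provenance: Provenance


def take_verbatim(document: str, start: int, end: int, source_urn: str) -> Slice:
    """Copy document[start:end] exactly, keeping typos and spacing as-is."""
    return Slice(document[start:end], source_urn, Provenance.VERBATIM)


def perturb(original: Slice, old: str, new: str) -> Slice:
    """Return an altered copy that is explicitly labeled as perturbed."""
    if old not in original.text:
        raise ValueError(f"{old!r} not found in slice")
    altered = original.text.replace(old, new, 1)
    return Slice(altered, original.source_urn, Provenance.PERTURBED)


def is_verbatim(candidate: Slice, document: str) -> bool:
    """A slice is verbatim only if it is labeled so AND appears in the source."""
    return candidate.provenance is Provenance.VERBATIM and candidate.text in document


if __name__ == "__main__":
    doc = "Standard Dose: 50mg. Included: Adults."
    real = take_verbatim(doc, 0, 20, "doc:1")
    fake = perturb(real, "50mg", "500mg")
    print(real.provenance.value, repr(real.text), is_verbatim(real, doc))
    print(fake.provenance.value, repr(fake.text), is_verbatim(fake, doc))
```

Keeping the label on the slice itself means a downstream consumer never has to guess whether a context passage was altered.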
For detailed requirements and specifications, see docs/product_requirements.md.
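The Safety & Privacy feature mentions PII sanitization filters without describing them. A minimal sketch of what such a filter might look like is shown here, assuming simple regex-based redaction of email addresses and one common phone-number layout. The patterns, placeholder tokens, and the `sanitize` function are illustrative assumptions; the library's real filters may work quite differently.

```python
import re

# Deliberately simple patterns for illustration; production filters
# usually need broader coverage (names, IDs, international formats).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def sanitize(text: str) -> str:
    """Replace email addresses and NNN-NNN-NNNN style phone numbers."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(sanitize("Contact jane.doe@example.com or 555-123-4567."))
```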
Installation
pip install coreason-synthesis
Usage
Here is a concise example of how to initialize and use the library (using built-in mocks for demonstration):
import uuid
from coreason_synthesis.pipeline import SynthesisPipeline
from coreason_synthesis.analyzer import PatternAnalyzerImpl
from coreason_synthesis.forager import ForagerImpl
from coreason_synthesis.extractor import ExtractorImpl
from coreason_synthesis.compositor import CompositorImpl
from coreason_synthesis.perturbator import PerturbatorImpl
from coreason_synthesis.appraiser import AppraiserImpl
from coreason_synthesis.models import SeedCase, Document
# Import mocks for demonstration (replace with real implementations in prod)
from coreason_synthesis.mocks.teacher import MockTeacher
from coreason_synthesis.mocks.embedding import DummyEmbeddingService
from coreason_synthesis.mocks.mcp import MockMCPClient
# 1. Initialize Dependencies
teacher = MockTeacher()
embedder = DummyEmbeddingService()
mcp_client = MockMCPClient(
    documents=[
        Document(
            content="Standard Dose: 50mg. Included: Adults.",
            source_urn="doc:1"
        )
    ]
)
# 2. Initialize Components
analyzer = PatternAnalyzerImpl(teacher, embedder)
forager = ForagerImpl(mcp_client, embedder)
extractor = ExtractorImpl()
compositor = CompositorImpl(teacher)
perturbator = PerturbatorImpl()
appraiser = AppraiserImpl(teacher, embedder)
# 3. Assemble Pipeline
pipeline = SynthesisPipeline(
    analyzer=analyzer,
    forager=forager,
    extractor=extractor,
    compositor=compositor,
    perturbator=perturbator,
    appraiser=appraiser
)
# 4. Define Seeds
seeds = [
    SeedCase(
        id=uuid.uuid4(),
        question="Calculate BSA for 180cm, 80kg patient",
        expected_output={"bsa": 2.0},
        context="Formula: sqrt((height*weight)/3600)"
    )
]
# 5. Run Synthesis
config = {
    "target_count": 5,
    "perturbation_rate": 0.5,
    "sort_by": "complexity_desc"
}
user_context = {"user_id": "demo-user"}
results = pipeline.run(seeds, config, user_context)
# 6. Use Results
for case in results:
    print(f"Generated Case ({case.provenance}): {case.synthetic_question}")
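The seed in step 4 pairs a question with an `expected_output` of `{"bsa": 2.0}`. That value can be checked directly against the formula given in the seed's `context` (the Mosteller body surface area formula, with height in cm and weight in kg). The helper name `bsa_mosteller` below is introduced only for this check and is not part of the library.

```python
import math


def bsa_mosteller(height_cm: float, weight_kg: float) -> float:
    """Body surface area in m^2: sqrt((height * weight) / 3600)."""
    return math.sqrt((height_cm * weight_kg) / 3600)


# 180 * 80 = 14400; 14400 / 3600 = 4; sqrt(4) = 2.0
print(bsa_mosteller(180, 80))  # 2.0
```

Verifying seed answers this way before running synthesis is cheap insurance, since every generated case inherits its testing pattern from the seeds.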