Generate realistic multi-agent workflow traces with LLM-enriched content, semantic validation, and PM4Py compatibility

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

juliensimon

These details have not been verified by PyPI

Project links

Dataset

Project description

ocelgen

Generate realistic multi-agent workflow traces on demand. Any domain, any pattern, any LLM. Validated against OCEL 2.0 and PM4Py.

pip install open-agent-traces

ocelgen generate --pattern sequential --runs 50 --noise 0.2 --seed 42

1,500+ events in under 2 seconds. No API key needed for structural traces.

run-0000: "My order arrived damaged, what are my options?"
├── run_started                                              08:00:00.007
├── agent_invoked          researcher    gpt-4o              08:00:00.052
│   ├── llm_request_sent   "Search for refund policy..."     08:00:00.067
│   ├── llm_response       "The refund policy states..."     08:00:00.749
│   ├── tool_called        web_search    → policy found      08:00:01.705
│   └── tool_called        file_reader   → order history     08:00:01.898
├── agent_invoked          analyst       gpt-4o              08:00:02.281
│   ├── llm_request_sent   "Analyze refund eligibility..."   08:00:02.334
│   ├── llm_response       "Customer is eligible for..."     08:00:06.747
│   └── tool_called        calculator    → refund amount     08:00:08.819
├── agent_invoked          summarizer    claude-3.5-sonnet   08:00:09.680
│   ├── llm_request_sent   "Draft resolution response..."    08:00:09.717
│   └── llm_response       "Dear customer, we apologize..."  08:00:10.363
└── run_completed                                            08:00:10.369
    cost: $0.038 | 3,950 input + 2,516 output tokens | 5 LLM calls | 3 tool calls

What it generates

Each trace includes LLM prompts and completions, tool call inputs and outputs, agent reasoning chains, inter-agent messages, calibrated token counts, realistic timestamps, and cost estimates — the same data you'd see in LangSmith, Arize, or Braintrust.

3 workflow patterns:

Sequential:    Research → Analyze → Summarize
Supervisor:    Supervisor → [Worker A, Worker B, Worker C] → Aggregate
Parallel:      Split → [Worker A ‖ Worker B ‖ Worker C] → Aggregate

10 deviation types with ground-truth labels for anomaly detection: skipped steps, wrong tools, swapped order, timeouts, missing handoffs, extra LLM calls, wrong routing, repeated activities, inserted activities, wrong resources.

10 built-in enterprise domains — or define your own in YAML:

Domain	Pattern	What it simulates
`customer-support-triage`	sequential	Classify ticket, research KB, draft response
`code-review-pipeline`	supervisor	Delegate to linter, security reviewer, style checker
`incident-response`	supervisor	Route to diagnostics, mitigation, communications
`data-pipeline-debugging`	supervisor	Log analyzer, schema checker, fix proposer
`market-research`	parallel	Competitor analyst, trend researcher, report writer
`content-generation`	parallel	Researcher, writer, editor working concurrently
`academic-paper-review`	parallel	Methodology, novelty, writing reviewers
`legal-document-analysis`	sequential	Extract clauses, check compliance, summarize risks
`financial-analysis`	sequential	Gather filings, compute ratios, write investment memo
`ecommerce-product-enrichment`	sequential	Scrape specs, normalize attributes, generate descriptions

Enrich with any LLM

Plug in any OpenAI-compatible endpoint to fill traces with realistic content:

# Cloud (OpenRouter — default)
export OPENAI_API_KEY="your-key"
ocelgen enrich output.jsonocel --domain customer-support-triage

# Local (llama.cpp, Ollama, vLLM — no API key needed)
ocelgen enrich output.jsonocel -d customer-support-triage \
  --model local-model --base-url http://localhost:8080/v1

# Full pipeline: generate + enrich + upload to Hugging Face
ocelgen pipeline --domain customer-support-triage --namespace your-hf-username

Enrichment chains context across agent steps, reflects deviations in the generated content, recalibrates token counts and timestamps, and expands seed queries via LLM for diversity across runs.

Validated, not just generated

Every trace is checked by 5 validation layers — tested across all 10 domains, all 3 patterns, and the live HF dataset:

Validator	What it checks
JSON Schema	OCEL 2.0 structural compliance
Referential integrity	Every relationship points to an existing object
Type attributes	Every attribute matches its declared type schema
Temporal ordering	Causal pairs in order, run boundaries correct
Workflow conformance	Conformant runs follow the template (parallel-aware)

from ocelgen.generation.engine import generate
from ocelgen.validation import (
    validate_referential_integrity,
    validate_workflow_conformance,
)

result = generate("sequential", num_runs=50, noise_rate=0.3, seed=42)
assert validate_referential_integrity(result.log) == []
assert validate_workflow_conformance(result.log, result.template) == []

Traces load directly in PM4Py — the reference OCEL 2.0 process mining library:

pip install open-agent-traces[conformance]

import pm4py
ocel = pm4py.read.read_ocel2_json("output.jsonocel")

Define your own domains

Create custom domains in YAML — they merge with the 10 built-ins:

domains:
  - name: "hr-onboarding"
    description: "HR onboarding: collect docs, run checks, provision access"
    pattern: "sequential"
    runs: 30
    noise: 0.15
    seed: 50001
    user_queries:
      - "New hire starting March 15 as Senior Engineer"
    agent_personas:
      researcher: "You are an HR coordinator collecting new hire documentation"
      analyst: "You are a compliance officer verifying background checks"
      summarizer: "You are an IT provisioner setting up accounts and access"
    tool_descriptions:
      web_search: "Search HR knowledge base for onboarding checklists"
      file_reader: "Read employee records and compliance documents"

ocelgen pipeline --domain hr-onboarding --config domains.yaml --namespace your-hf-username

Pre-built dataset

Don't want to generate? Load 17,000+ events directly from Hugging Face:

from datasets import load_dataset

ds = load_dataset("juliensimon/open-agent-traces", "incident-response")

for event in ds["train"]:
    if event["run_id"] == "run-0000":
        print(f"{event['event_type']:25s} | {event['agent_role']:12s} | {event['reasoning'][:60] if event['reasoning'] else ''}")

Who is this for?

Agent observability teams — build and test monitoring dashboards with realistic trace data
ML researchers — train anomaly detectors on labeled conformant vs deviant traces
Process mining researchers — apply OCEL 2.0 conformance checking to multi-agent systems
Agent framework developers — test LangGraph, CrewAI, AutoGen, Smolagents pipelines
Evaluation teams — benchmark agent reasoning quality across domains and architectures

Examples

Script	What it shows
`basic_generation.py`	Generate logs via Python API, inspect results, write files
`validate_traces.py`	Run all 5 semantic validators across all 3 patterns
`inspect_run.py`	Walk a single run's event timeline, LLM calls, tools, costs
`explore_with_pm4py.py`	Download from HF, query with pm4py and datasets library
`conformance_demo.py`	Generate and load with pm4py

Documentation

Quick Start — first dataset in 5 minutes
User Guide — CLI reference, patterns, domains, custom YAML, validation, PM4Py
Dataset on HF — 17,000+ events across 10 domains

Development

git clone https://github.com/juliensimon/ocel-generator.git && cd ocel-generator
uv sync --extra dev
uv run pre-commit install   # ruff + mypy + pytest on every commit
uv run pytest               # 265 tests, 98% coverage

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

juliensimon

These details have not been verified by PyPI

Project links

Dataset

Release history Release notifications | RSS feed

This version

0.2.0

Apr 7, 2026

0.1.0

Apr 4, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

open_agent_traces-0.2.0.tar.gz (490.9 kB view details)

Uploaded Apr 7, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

open_agent_traces-0.2.0-py3-none-any.whl (68.9 kB view details)

Uploaded Apr 7, 2026 Python 3

File details

Details for the file open_agent_traces-0.2.0.tar.gz.

File metadata

Download URL: open_agent_traces-0.2.0.tar.gz
Upload date: Apr 7, 2026
Size: 490.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for open_agent_traces-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`7820625b7628b10979f3b6bed099fb3ccf241e07651b9ab305686a7fdd417cc4`
MD5	`032f7a3792e56960130167d8367ec0b1`
BLAKE2b-256	`11c0a3c88108ced2fc81bb0464ee8608102899ae942cc475e0095ebe79239ee6`

See more details on using hashes here.

Provenance

The following attestation bundles were made for open_agent_traces-0.2.0.tar.gz:

Publisher: publish.yml on juliensimon/ocel-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: open_agent_traces-0.2.0.tar.gz
- Subject digest: 7820625b7628b10979f3b6bed099fb3ccf241e07651b9ab305686a7fdd417cc4
- Sigstore transparency entry: 1248459600
- Sigstore integration time: Apr 7, 2026
Source repository:
- Permalink: juliensimon/ocel-generator@ce6a0a58a0c5816eca0aeb2a6f076b8b4b79fe2a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/juliensimon
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ce6a0a58a0c5816eca0aeb2a6f076b8b4b79fe2a
- Trigger Event: push

File details

Details for the file open_agent_traces-0.2.0-py3-none-any.whl.

File metadata

Download URL: open_agent_traces-0.2.0-py3-none-any.whl
Upload date: Apr 7, 2026
Size: 68.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for open_agent_traces-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1fdb087ab3327c6fa3492e3b69fe9770a5f44c6b4e1a5a107ddade79ccf724f7`
MD5	`60c91356545344df0bf6eb95279a99ee`
BLAKE2b-256	`24cd6185aa01a2a00280a688b25fdba698ba5418519a4c31474ab36671c476b8`

See more details on using hashes here.

Provenance

The following attestation bundles were made for open_agent_traces-0.2.0-py3-none-any.whl:

Publisher: publish.yml on juliensimon/ocel-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: open_agent_traces-0.2.0-py3-none-any.whl
- Subject digest: 1fdb087ab3327c6fa3492e3b69fe9770a5f44c6b4e1a5a107ddade79ccf724f7
- Sigstore transparency entry: 1248459650
- Sigstore integration time: Apr 7, 2026
Source repository:
- Permalink: juliensimon/ocel-generator@ce6a0a58a0c5816eca0aeb2a6f076b8b4b79fe2a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/juliensimon
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ce6a0a58a0c5816eca0aeb2a6f076b8b4b79fe2a
- Trigger Event: push

open-agent-traces 0.2.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

ocelgen

What it generates

Enrich with any LLM

Validated, not just generated

Define your own domains

Pre-built dataset

Who is this for?

Examples

Documentation

Development

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance