versifai

Agentic AI framework for autonomous data engineering, science, and storytelling

Versifai provides specialized AI agents that automate the complete data lifecycle, from raw file discovery and schema design, through statistical analysis and modeling, to compelling narrative reports. Each agent operates autonomously using a ReAct (Reason-Act-Observe) loop, with human-in-the-loop oversight at every stage.

Built on LiteLLM for multi-provider LLM support (Anthropic, OpenAI, Azure, and 100+ more).

Features
Agent Families
Installation
Quick Start
Usage Examples
Architecture
Building Custom Agents
Configuration
Contributing
License

Features

Autonomous agent loop: ReAct-based agents that reason, act, and observe iteratively until a task is complete
Multi-provider LLM: Swap between Claude, GPT-4, Azure, Gemini, or any LiteLLM-supported provider with a single parameter
Modular tool system: Plug-and-play tools with a shared registry; add your own in minutes
Smart resume: Agents persist state to disk and resume from where they left off after interruption
Run isolation: Each run gets its own directory with metadata, progress logs, and artifacts
Human-in-the-loop: Built-in ask_human tool lets agents pause and request guidance
Databricks native: First-class support for Notebooks, Unity Catalog, Delta tables, and Volumes
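
The reason-act-observe cycle named in the first feature can be pictured with a toy loop. The sketch below is illustrative only and is not Versifai's implementation: `TOOLS`, `scripted_policy`, and `run_react_loop` are hypothetical names, and a scripted function stands in for the LLM so the example runs offline.

```python
from typing import Callable

# Hypothetical tool table: tool name -> plain Python callable.
TOOLS: dict[str, Callable[..., object]] = {
    "add": lambda a, b: a + b,
}


def scripted_policy(observations: list) -> dict:
    """Stand-in for the LLM: pick the next step from what has been observed."""
    if not observations:
        # Reason: nothing is known yet, so act by calling a tool.
        return {"action": "add", "args": {"a": 2, "b": 3}}
    # Reason: a result is available, so finish with an answer.
    return {"action": "finish", "answer": f"The sum is {observations[-1]}"}


def run_react_loop(policy, max_turns: int = 5) -> str:
    observations: list = []
    for _ in range(max_turns):
        step = policy(observations)                      # reason
        if step["action"] == "finish":
            return step["answer"]
        result = TOOLS[step["action"]](**step["args"])   # act
        observations.append(result)                      # observe
    raise RuntimeError("turn limit reached before the task finished")


print(run_react_loop(scripted_policy))
```

The turn cap plays the same role as a maximum-iterations setting: the loop ends even if the policy never decides to finish.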

See It In Action

Read a full research report produced end-to-end by Versifai's agent pipeline, from raw CMS data ingestion through statistical analysis to narrative output:

CMS Stars Adjustment: An Autonomous Policy Research Report

Agent Families

Family	Agents	What It Does
`versifai.data_agents`	`DataEngineerAgent`, `DataAnalystAgent`	Discover raw files, profile data, design schemas, transform and load into structured tables. The analyst validates quality.
`versifai.science_agents`	`DataScientistAgent`	Autonomous research: builds analytical datasets, runs hypothesis tests, fits models, produces charts and findings.
`versifai.story_agents`	`StoryTellerAgent`	Transforms research findings into evidence-grounded narrative reports with citations, visual references, and editorial review.

Installation

From PyPI

# Install with all runtime dependencies
pip install versifai

# With development tools (ruff, mypy, pytest, pre-commit)
pip install "versifai[dev]"

From Source (development)

git clone https://github.com/jweinberg-a2a/versifai-data-agents.git
cd versifai-data-agents
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Quick Start

1. Set your LLM API key

# Anthropic (default)
export ANTHROPIC_API_KEY="sk-ant-..."

# Or OpenAI
export OPENAI_API_KEY="sk-..."

2. Run a data engineering agent

from versifai.data_agents import DataEngineerAgent, ProjectConfig

cfg = ProjectConfig(
    name="Sales Pipeline",
    catalog="analytics",
    schema="sales",
    volume_path="/Volumes/analytics/sales/raw_data",
)

# dbutils is the Databricks utilities object that notebooks expose as a global
agent = DataEngineerAgent(cfg=cfg, dbutils=dbutils)
result = agent.run()
print(f"Processed {result['sources_completed']} sources")

3. Run a data science agent

from versifai.science_agents import DataScientistAgent, ResearchConfig

cfg = ResearchConfig(
    name="Customer Churn Analysis",
    catalog="analytics",
    schema="churn",
    results_path="/tmp/results/churn",
    themes=[...],  # Define research themes
)

agent = DataScientistAgent(cfg=cfg, dbutils=dbutils)
result = agent.run()

4. Generate a narrative report

from versifai.story_agents import StoryTellerAgent, StorytellerConfig

cfg = StorytellerConfig(
    name="Churn Analysis Report",
    thesis="Customer churn is driven primarily by...",
    research_results_path="/tmp/results/churn",
    narrative_output_path="/tmp/narrative/churn",
    narrative_sections=[...],  # Define report sections
)

agent = StoryTellerAgent(cfg=cfg, dbutils=dbutils)
result = agent.run()
print(f"Wrote {result['sections_written']} sections")

Usage Examples

Multi-Provider LLM Support

Versifai uses LiteLLM under the hood. Switch providers with a single parameter:

from versifai.core import LLMClient
from versifai.data_agents import DataEngineerAgent

# Anthropic Claude (default)
llm = LLMClient(model="claude-sonnet-4-6")

# OpenAI GPT-4o
llm = LLMClient(model="gpt-4o")

# Azure OpenAI
llm = LLMClient(
    model="azure/gpt-4o",
    api_base="https://my-endpoint.openai.azure.com",
)

# Google Gemini
llm = LLMClient(model="gemini/gemini-1.5-pro")

# Pass the LLM to any agent
agent = DataEngineerAgent(cfg=cfg, dbutils=dbutils)
agent._llm = llm  # Override the default
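
Some of the model strings above carry a provider prefix before a slash (`azure/`, `gemini/`), while others are bare model names. As a minimal sketch, assuming you want to record which provider a run used, the hypothetical helper below splits such a string. It is not part of Versifai or LiteLLM.

```python
from typing import Optional, Tuple


def split_model_string(model: str) -> Tuple[Optional[str], str]:
    """Split 'provider/model' into (provider, model); provider is None if absent."""
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model
    return provider, name


for m in ["claude-sonnet-4-6", "gpt-4o", "azure/gpt-4o", "gemini/gemini-1.5-pro"]:
    print(m, "->", split_model_string(m))
```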

Smart Resume

All agents support resuming from interruption:

# First run: interrupted at source 3 of 10
agent = DataEngineerAgent(cfg=cfg, dbutils=dbutils)
agent.run()  # Ctrl+C after source 3

# Re-run: automatically picks up from source 4
agent = DataEngineerAgent(cfg=cfg, dbutils=dbutils)
agent.run()  # Skips sources 1-3, continues from 4
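
One common way to get this behavior is to write a small state file after each completed unit of work and consult it on startup. The sketch below shows that general pattern under stated assumptions: `process_sources`, `Interrupted`, and the JSON file layout are hypothetical and do not describe how Versifai stores its state.

```python
import json
import tempfile
from pathlib import Path


class Interrupted(Exception):
    """Simulates a Ctrl+C partway through a run."""


def process_sources(sources, state_file: Path, interrupt_at=None):
    """Process sources in order, recording each completed one on disk."""
    done = json.loads(state_file.read_text()) if state_file.exists() else []
    for source in sources:
        if source in done:
            continue  # finished in an earlier run, so skip it
        if source == interrupt_at:
            raise Interrupted(source)
        done.append(source)
        state_file.write_text(json.dumps(done))  # persist after every source
    return done


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "state.json"
    sources = ["a.csv", "b.csv", "c.csv"]
    try:
        process_sources(sources, state, interrupt_at="c.csv")  # first run stops early
    except Interrupted:
        pass
    saved_after_interrupt = json.loads(state.read_text())
    final = process_sources(sources, state)  # second run resumes at c.csv
    print(saved_after_interrupt, final)
```

Persisting after every source, rather than once at the end, is what lets an interrupted run lose at most the unit it was working on.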

Running Specific Sections

Both science and story agents support targeted re-runs:

# Re-run only themes 0 and 3
scientist = DataScientistAgent(cfg=cfg, dbutils=dbutils)
scientist.run_themes(themes=[0, 3])

# Re-run only sections 1 and 2 of the narrative
storyteller = StoryTellerAgent(cfg=cfg, dbutils=dbutils)
storyteller.run_sections(sections=[1, 2])

Editorial Review (Human-in-the-Loop)

The storyteller agent has a dedicated editor mode:

agent = StoryTellerAgent(cfg=cfg, dbutils=dbutils)

# Guided review
agent.run_editor(
    instructions="Simplify the methodology section for a policymaker audience."
)

# Open-ended review
agent.run_editor()

Complete Workflow Example

See examples/ for full end-to-end configurations.

from versifai.data_agents import DataEngineerAgent
from versifai.science_agents import DataScientistAgent
from versifai.story_agents import StoryTellerAgent

# Step 1: Engineer ingests raw data
engineer = DataEngineerAgent(cfg=engineer_cfg, dbutils=dbutils)
engineer.run()

# Step 2: Scientist analyzes the data
scientist = DataScientistAgent(cfg=science_cfg, dbutils=dbutils)
scientist.run()

# Step 3: Storyteller writes the report
storyteller = StoryTellerAgent(cfg=story_cfg, dbutils=dbutils)
storyteller.run()

Architecture

src/versifai/
├── core/                  # Shared agentic framework
│   ├── agent.py           # BaseAgent: ReAct loop engine
│   ├── llm.py             # LLMClient: multi-provider via LiteLLM
│   ├── memory.py          # AgentMemory: conversation + carryover context
│   ├── display.py         # AgentDisplay: rich progress output
│   ├── config.py          # CatalogConfig, AgentSettings
│   ├── run_manager.py     # Run isolation + state persistence
│   └── tools/             # Shared tools (BaseTool, ToolRegistry, etc.)
│
├── data_agents/           # Data engineering & analysis
│   ├── engineer/          # DataEngineerAgent + planning + tools
│   ├── analyst/           # DataAnalystAgent (quality validation)
│   └── models/            # FileInfo, TargetSchema, AgentState
│
├── science_agents/        # Data science & research
│   └── scientist/         # DataScientistAgent + analysis tools
│
├── story_agents/          # Narrative & storytelling
│   └── storyteller/       # StoryTellerAgent + narrative tools
│
└── _utils/                # Internal utilities (naming, FIPS codes)

Key Design Patterns

BaseAgent: All agents subclass BaseAgent, which provides the ReAct loop, error recovery, and tool dispatch
ToolRegistry: Tools are registered at construction time; the agent's loop automatically matches LLM tool calls to registered tools
BaseTool: Every tool implements name, description, parameters_schema, and execute(). Drop-in replaceable.
AgentMemory: Manages conversation history with automatic summarization for long-running tasks
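
To make the registry-and-dispatch idea concrete, here is a self-contained sketch of how a tool call might be matched to a registered tool by name. `UppercaseTool`, `MiniRegistry`, and the dictionary-shaped call and result are hypothetical stand-ins; the real `BaseTool`, `ToolRegistry`, and `ToolResult` may differ in their signatures and behavior.

```python
class UppercaseTool:
    """Stand-in tool exposing the four members listed above."""

    name = "uppercase"
    description = "Convert text to upper case"
    parameters_schema = {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    }

    def execute(self, text: str) -> dict:
        return {"success": True, "data": text.upper()}


class MiniRegistry:
    """Hypothetical registry mapping tool names to tool instances."""

    def __init__(self):
        self._tools = {}

    def register(self, tool) -> None:
        self._tools[tool.name] = tool

    def dispatch(self, call: dict) -> dict:
        """Route a tool call shaped like {'name': ..., 'arguments': {...}}."""
        tool = self._tools.get(call["name"])
        if tool is None:
            return {"success": False, "error": f"unknown tool: {call['name']}"}
        required = tool.parameters_schema.get("required", [])
        missing = [key for key in required if key not in call["arguments"]]
        if missing:
            return {"success": False, "error": f"missing arguments: {missing}"}
        return tool.execute(**call["arguments"])


registry = MiniRegistry()
registry.register(UppercaseTool())
ok = registry.dispatch({"name": "uppercase", "arguments": {"text": "hello"}})
bad = registry.dispatch({"name": "nope", "arguments": {}})
print(ok, bad)
```

Returning an error result instead of raising keeps a failed call visible to the loop as an observation, which is one way an agent can recover from a bad tool call.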

Building Custom Agents

Create a Custom Tool

from versifai.core import BaseTool, ToolResult

class FetchWeatherTool(BaseTool):
    name = "fetch_weather"
    description = "Fetch current weather for a city"
    parameters_schema = {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    }

    def execute(self, city: str) -> ToolResult:
        # Your implementation here; call_weather_api is a placeholder
        # for your own client function.
        data = call_weather_api(city)
        return ToolResult(success=True, data=data)

Create a Custom Agent

from versifai.core import (
    BaseAgent, LLMClient, AgentMemory, AgentDisplay, ToolRegistry,
)

class WeatherAgent(BaseAgent):
    def __init__(self):
        registry = ToolRegistry()
        registry.register(FetchWeatherTool())

        super().__init__(
            display=AgentDisplay(),
            memory=AgentMemory(),
            llm=LLMClient(model="gpt-4o"),
            registry=registry,
        )
        self._system_prompt = "You are a helpful weather assistant."

    def ask(self, question: str) -> str:
        return self._run_phase(prompt=question, max_turns=10)

# Use it
agent = WeatherAgent()
answer = agent.ask("What's the weather in San Francisco?")

Where to Put Your Code

What you're adding	Where it goes
A tool used by multiple agent families	`src/versifai/core/tools/`
A tool specific to one agent	`src/versifai/<family>/<agent>/tools/`
A new agent in an existing family	`src/versifai/<family>/<new_agent>/`
A new agent family	`src/versifai/<new_family>/`
Shared config or data models	`src/versifai/core/config.py` or `src/versifai/<family>/models/`
Internal helpers	`src/versifai/_utils/`

Configuration

CatalogConfig (shared)

All agents that interact with Databricks Unity Catalog use CatalogConfig:

from versifai.core import CatalogConfig

catalog = CatalogConfig(
    catalog="my_catalog",
    schema="my_schema",
    volume_path="/Volumes/my_catalog/my_schema/data",
    staging_path="/Volumes/my_catalog/my_schema/staging",
)
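
The catalog and schema values map onto Unity Catalog's naming scheme: tables are addressed as `catalog.schema.table`, and volume files live under `/Volumes/<catalog>/<schema>/<volume>/`. The sketch below builds both forms. `CatalogRef` and its methods are hypothetical helpers for illustration, not part of `CatalogConfig`, and the `orders` names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRef:
    """Hypothetical stand-in holding the same catalog and schema values as above."""

    catalog: str
    schema: str

    def table(self, name: str) -> str:
        # Unity Catalog addresses tables with a three-level name.
        return f"{self.catalog}.{self.schema}.{name}"

    def volume(self, volume: str, *parts: str) -> str:
        # Volume files are exposed under /Volumes/<catalog>/<schema>/<volume>/...
        return "/".join(["/Volumes", self.catalog, self.schema, volume, *parts])


ref = CatalogRef(catalog="my_catalog", schema="my_schema")
print(ref.table("orders"))
print(ref.volume("data", "2024", "orders.csv"))
```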

AgentSettings (shared)

Tune agent behavior globally:

from versifai.core import AgentSettings

settings = AgentSettings(
    max_agent_turns=200,        # Max ReAct iterations per run
    max_turns_per_source=120,   # Max turns per data source
    max_acceptance_iterations=3, # Validation retry limit
    sample_rows=10,             # Rows shown in profiling previews
)
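
A limit such as `max_acceptance_iterations` typically bounds a check-and-retry loop. The source does not describe how Versifai applies it, so the following is only a sketch of the general pattern, with a hypothetical `accept_with_retries` helper.

```python
from typing import Callable, Tuple


def accept_with_retries(
    check: Callable[[int], bool],
    max_acceptance_iterations: int = 3,
) -> Tuple[bool, int]:
    """Call check(attempt) until it passes or the retry limit is reached."""
    for attempt in range(1, max_acceptance_iterations + 1):
        if check(attempt):
            return True, attempt
    return False, max_acceptance_iterations


# A check that only passes on the second attempt.
passed, attempts = accept_with_retries(lambda attempt: attempt >= 2)
# A check that never passes stops at the limit instead of looping forever.
failed, used = accept_with_retries(lambda attempt: False)
print(passed, attempts, failed, used)
```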

Environment Variables

Variable	Purpose	Required
`ANTHROPIC_API_KEY`	Anthropic Claude API key	If using Claude
`OPENAI_API_KEY`	OpenAI API key	If using GPT models
`DATABRICKS_HOST`	Databricks workspace URL	For catalog operations
`DATABRICKS_TOKEN`	Databricks PAT	For catalog operations
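
Since which variables are needed depends on the provider and on whether catalog operations are involved, a quick preflight check can catch a missing key before a long run starts. This is a minimal sketch; `missing_env` is a hypothetical helper, not something Versifai ships.

```python
import os
from typing import List, Mapping, Optional


def missing_env(required: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example: a Claude-backed run that also touches the catalog.
needed = ["ANTHROPIC_API_KEY", "DATABRICKS_HOST", "DATABRICKS_TOKEN"]
fake_env = {"ANTHROPIC_API_KEY": "sk-ant-...", "DATABRICKS_HOST": ""}
print(missing_env(needed, env=fake_env))
```

Passing a plain dictionary as `env` keeps the check testable without touching the real environment; omit it to inspect `os.environ`.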

Contributing

We welcome contributions! See CONTRIBUTING.md for the full guide.

Quick Start for Contributors

git clone https://github.com/jweinberg-a2a/versifai-data-agents.git
cd versifai-data-agents
python -m venv .venv && source .venv/bin/activate
make install-dev   # installs with all deps + pre-commit hooks
make test          # run tests
make lint          # check code style
make format        # auto-format code

Where to Contribute

New tools: The easiest way to contribute. Subclass BaseTool, implement execute(), and submit a PR. See Building Custom Agents for the pattern.
New agents: Add a new agent type to an existing family or propose a new family.
LLM provider support: We use LiteLLM, so most providers work out of the box. If you find one that doesn't, help us fix it.
Documentation and examples: Add example configs in examples/ for your domain.
Bug fixes and tests: Always appreciated.

License

Business Source License 1.1. Free to use, modify, and extend for non-commercial purposes. See LICENSE for full terms.
