OpenDsStar

A tool-based DS-Star agent implementation using LangGraph

OpenDsStar is an open-source implementation of the DS-Star agent (Nam et al., 2025), with several deliberate design enhancements that improve modularity, extensibility, and execution efficiency.

The original DS-Star agent is primarily built around file-based artifacts: reasoning, planning, and execution revolve around reading, writing, and modifying files that represent intermediate and final results. OpenDsStar preserves the core planning-and-coding philosophy of DS-Star, but redefines the execution model around a tool-centric abstraction.

OpenDsStar is a Programmatic Tool Calling (PTC) agent. Rather than reasoning directly over files, it plans and executes workflows by composing explicit tool invocations, drawing inspiration from the ReAct and CodeAct paradigms. Tools can encapsulate file access, database queries, API calls, external services, computation engines, or arbitrary custom functions. This decouples the agent’s reasoning logic from the underlying execution environment and storage format.

This design generalizes the DS-Star approach beyond data-science-specific workflows. Any task that can be expressed as a sequence of tool calls—whether it involves data processing, information retrieval, programmatic reasoning, or system interaction—can be handled without changing the agent’s core structure.
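The idea of a plan as a sequence of tool calls can be sketched in a few lines. This is an illustrative toy only: the names (`TOOLS`, `run_plan`, the two example tools) are hypothetical and are not OpenDsStar's actual API.

```python
# Illustrative sketch, NOT OpenDsStar's real API: a plan is a sequence of
# explicit (tool_name, args) invocations rather than file manipulations.

def word_count(text: str) -> int:
    """A tool wrapping a simple computation."""
    return len(text.split())

def shout(text: str) -> str:
    """A tool wrapping a transformation."""
    return text.upper()

TOOLS = {"word_count": word_count, "shout": shout}

def run_plan(plan):
    """Execute a plan expressed as (tool_name, args) pairs."""
    return [TOOLS[name](*args) for name, args in plan]

plan = [("shout", ("hello world",)), ("word_count", ("hello world",))]
print(run_plan(plan))  # ['HELLO WORLD', 2]
```

Because the agent only sees tool names and arguments, swapping a file-backed tool for a database- or API-backed one leaves the planning logic untouched.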

OpenDsStar also introduces more flexible execution control than the original DS-Star.

In the original design, the agent typically re-executes the entire planned workflow from the beginning whenever the plan is revised, even if earlier steps have already completed successfully. This can be wasteful when early steps involve expensive computation, slow external calls, or large-scale data processing.

OpenDsStar explicitly separates planning from execution and supports incremental, stepwise execution. In this mode, completed steps produce persistent intermediate results and are not re-run. When the plan is extended or refined, only the newly introduced step is executed, while previous outputs are reused. This significantly reduces redundant computation and makes the agent more practical for workflows in which individual steps are costly, long-running, or stateful.
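The reuse behavior described above can be illustrated with a minimal step cache. This sketch is not OpenDsStar's internal implementation; it only demonstrates why extending a plan re-executes just the new step.

```python
# Illustrative sketch (not OpenDsStar's internals): completed steps persist
# their results, so re-walking an extended plan re-runs only the new step.

cache = {}        # step index -> persisted intermediate result
executions = []   # records which steps actually executed (cache misses)

def run_step(i, step):
    if i in cache:             # completed earlier: reuse, do not re-run
        return cache[i]
    executions.append(i)
    cache[i] = step()
    return cache[i]

plan = [lambda: 2 + 2, lambda: 10 * 3]
for i, step in enumerate(plan):
    run_step(i, step)

# The plan is refined with one new step that consumes earlier outputs.
plan.append(lambda: cache[0] + cache[1])
for i, step in enumerate(plan):
    run_step(i, step)

print(executions)  # [0, 1, 2] — steps 0 and 1 ran once despite two passes
```

In a real workflow, steps 0 and 1 might be expensive data-ingestion or API calls, which is exactly where skipping re-execution pays off.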

Summary of Design Enhancements

| Aspect | Original DS-Star | OpenDsStar |
|---|---|---|
| Core abstraction | Files | Tools |
| Planning representation | File/code actions | Tool-call sequences |
| Scope | Data-science focused | General purpose |
| Execution strategy | Re-run full plan | Incremental execution |
| Intermediate results | Recomputed | Persisted and reused |
| Planning vs. execution | Coupled | Explicitly separated |
| Extensibility | File-centric | Tool-based |

Features

  • Programmatic Tool Calling (PTC): Plans are represented as sequences of tool invocations rather than direct file manipulation
  • Explicit multi-step reasoning: Complex tasks are decomposed into structured, inspectable plans
  • Code generation and execution: Generates and runs code when needed
  • Stepwise execution mode: Executes plans incrementally while reusing intermediate outputs
  • Full execution mode: Runs the entire plan end-to-end, mirroring the original DS-Star behavior
  • Error handling and recovery: Failed steps are debugged and retried automatically
  • Result verification: Outputs are validated before returning final answers
  • LLM-agnostic: Works with OpenAI, Anthropic, Azure, WatsonX, Ollama, and more through LiteLLM

Execution Modes

OpenDsStar supports two execution modes:

  • Full mode: Plans and executes the entire workflow end-to-end, closely matching the original DS-Star execution model.

  • Stepwise mode: Produces plans incrementally and executes only the newest step, reusing outputs from previous steps. This mode is more efficient when individual steps are expensive or long-running.

Real-World Analysis Examples

OpenDsStar autonomously plans, codes, executes, and interprets complex data analysis — from trend detection and survival analysis to cross-dataset joins — without being told how.

Given 80 CSV files from DataBench, OpenDsStar (powered by Claude Opus 4.6) answers questions like these:

Q: Is there a trend in tornado severity over time?

The agent autonomously analyzed 67,558 tornado records spanning 1950-2021, ran linear regression, and found a statistically significant decreasing trend (r = -0.89, p ≈ 4.5 × 10⁻²⁶) — while critically noting that improved detection of weak tornadoes largely drives this apparent decline:

| Decade | Mean Magnitude | Tornado Count |
|---|---|---|
| 1950s | 1.31 | 4,793 |
| 1970s | 1.09 | 8,579 |
| 1990s | 0.56 | 12,137 |
| 2010s | 0.61 | 11,629 |

Q: What is the average age of Forbes billionaires from countries that also have at least 5 FIFA players with an overall rating above 85?

This requires joining two completely separate datasets (Forbes billionaires and FIFA players) on a shared dimension (country) — without being told which files to use or how to combine them. OpenDsStar identified the 6 qualifying countries (Argentina, Brazil, England, France, Germany, Spain), matched them across datasets, and computed the answer: ~66.4 years across 231 billionaires.
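The kind of code the agent generates for this question boils down to a filter-then-join. The sketch below uses made-up toy rows (not DataBench data) and a threshold of 2 players instead of the question's 5, purely to keep the example self-contained.

```python
# Hedged sketch of the join logic; the rows and the N=2 threshold are
# invented for illustration (the real question uses N=5, rating > 85).
from collections import Counter

billionaires = [
    {"name": "A", "country": "France", "age": 70},
    {"name": "B", "country": "France", "age": 60},
    {"name": "C", "country": "Norway", "age": 50},
]
players = [
    {"country": "France", "overall": 90},
    {"country": "France", "overall": 88},
    {"country": "Norway", "overall": 80},
]

# Step 1: countries with at least N players rated above the threshold.
counts = Counter(p["country"] for p in players if p["overall"] > 85)
qualifying = {c for c, n in counts.items() if n >= 2}

# Step 2: average billionaire age restricted to qualifying countries.
ages = [b["age"] for b in billionaires if b["country"] in qualifying]
print(sum(ages) / len(ages))  # 65.0
```

The hard part for the agent is not this arithmetic but discovering, unprompted, that "country" is the shared dimension linking the two datasets.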

See Real-World Analysis Examples for additional case studies, including survival analysis and health analytics.

Installation

From PyPI

pip install opendsstar

From Source

git clone https://github.com/IBM/OpenDsStar.git
cd OpenDsStar
uv sync

Configuration

Create a .env file with your API keys (only include keys for providers you'll use):

OPENAI_API_KEY=your_key_here
# Add other provider keys as needed

See Installation Guide for detailed setup instructions, environment variables, and troubleshooting.

Quick Start

OpenDsStarAgent (DS-Star Implementation)

from dotenv import load_dotenv
from agents import OpenDsStarAgent

load_dotenv()

agent = OpenDsStarAgent(model="gpt-4o-mini")

result = agent.invoke("What is 15 * 23 + 42?")
print(result["answer"])

For detailed usage, see DS-Star Agent Documentation.

ReactAgent (LangChain ReAct Agent)

from dotenv import load_dotenv
from agents import ReactAgent

load_dotenv()

agent = ReactAgent()

result = agent.invoke("What is the capital of France?")
print(result["answer"])

See ReactAgent Documentation for more details.

Running Experiments

Quick Start - DataBench with 5 Questions

.venv/bin/python -m src.experiments.benchmarks.databench.databench_main \
  --question-limit 5 \
  --agent-type ds_star \
  --model-agent gpt-4o-mini

Other Benchmarks

# HotpotQA
.venv/bin/python -m src.experiments.benchmarks.hotpotqa.hotpotqa_main \
  --question-limit 20 --agent-type ds_star --model gpt-4o-mini

# KramaBench
.venv/bin/python -m src.experiments.benchmarks.kramabench.kramabench_main \
  --agent-type ds_star --model-agent gpt-4o-mini

See Installation Guide for detailed command options, model aliases, and parameter explanations.

See EXPERIMENTS.md for comprehensive experiment documentation.

Agent Implementations

OpenDsStar includes multiple agent implementations for comparison and benchmarking:

  • OpenDsStarAgent: Main DS-Star implementation — a Programmatic Tool Calling (PTC) agent with planning, coding, execution, debugging, and verification (Documentation)
  • ReactAgentLangchain: Lightweight wrapper around the LangChain ReAct agent (Documentation)
  • ReactAgentSmolagents: Smolagents-based ReAct implementation
  • CodeActAgentSmolagents: Smolagents-based CodeAct implementation

All agents share a common interface, making it easy to compare different agent paradigms on the same tasks.
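The shared interface (the `invoke(question) -> {"answer": ...}` shape shown in the Quick Start) is what makes side-by-side comparison cheap. The sketch below uses a stand-in `DummyAgent` rather than the real agent classes, so the benchmarking loop itself is the only thing being illustrated.

```python
# Illustrative only: DummyAgent stands in for OpenDsStarAgent / ReactAgent,
# which share the same invoke() shape per the Quick Start examples.

class DummyAgent:
    def invoke(self, question):
        return {"answer": f"stub answer to: {question}"}

def benchmark(agents, questions):
    """Run every question through every agent via the common interface."""
    return {
        type(a).__name__: [a.invoke(q)["answer"] for q in questions]
        for a in agents
    }

print(benchmark([DummyAgent()], ["What is 2 + 2?"]))
```

Swapping in any of the four real agent implementations requires no change to such a loop, which is the point of the common interface.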

Experiments Framework

OpenDsStar includes a comprehensive experiments framework for reproducible benchmarking and evaluation. It provides modular experiment design, automatic caching, multi-agent support, and built-in evaluation.

The framework includes:

  • Modular experiment design: Each experiment is self-contained, with its own data reader, tools builder, agent configuration, and evaluators
  • Easy extensibility: New experiments can be added by implementing a small set of simple interfaces, without modifying core framework code
  • Automatic caching: Intermediate results are cached to avoid redundant computation
  • Reproducibility: Experiment parameters are automatically saved, enabling exact reruns
  • Multiple agent support: The same experiment can be run with different agents (e.g., DS-Star, ReAct, CodeAct) for direct comparison
  • Built-in evaluation: Integrated evaluation metrics and result tracking

See EXPERIMENTS.md for more details.

Benchmark Results

KramaBench Dataset Evaluation

Comparison of DS-Star and CodeAct agents on the KramaBench dataset (31 questions) across multiple LLM providers:

| Agent | Model | Total Tokens | LLM Calls | LLM Judge Score |
|---|---|---|---|---|
| CodeAct | Llama Maverick | 3.3M | 271 | 0.224 |
| DS-Star | Llama Maverick | 2.5M | 548 | 0.248 |
| CodeAct | Gemini 2.5 Flash | 4.0M | 275 | 0.297 |
| DS-Star | Gemini 2.5 Flash | 6.1M | 664 | 0.303 |
| CodeAct | Gemini 2.5 Pro | 1.6M | 235 | 0.312 |
| DS-Star | Gemini 2.5 Pro | 1.1M | 701 | 0.387 |

Experimental setup

  • Both agents use identical tools and data access methods
  • Both use the same data-ingestion pipeline and file descriptions
  • File descriptions were generated using WatsonX Llama Maverick for all configurations
  • Observed performance differences are therefore attributable primarily to the agent architecture and reasoning strategy

Key findings

  • DS-Star consistently outperforms CodeAct across all tested models in answer quality
  • DS-Star achieves better results with fewer total tokens on Llama Maverick and Gemini 2.5 Pro
  • The planning, debugging, and verification cycle improves answer accuracy, even when it requires more LLM calls
  • Best overall result: DS-Star with Gemini 2.5 Pro (0.387 judge score, with the lowest token usage among the Gemini Pro runs)

Project Structure

src/
├── agents/
├── tools/
└── experiments/

License

Apache License 2.0
