AI Agent with dynamic planning and persistent Jupyter kernel execution for data analysis

Maintainers

nmlemus

Project description

DSAgent

An AI-powered autonomous agent for data analysis with dynamic planning and persistent Jupyter kernel execution.

Features

Dynamic Planning: Agent creates and follows plans with [x]/[ ] step tracking
Persistent Execution: Code runs in a Jupyter kernel with variable persistence
Multi-Provider LLM: Supports OpenAI, Anthropic, Google, Ollama via LiteLLM
Notebook Generation: Automatically generates clean, runnable Jupyter notebooks
Event Streaming: Real-time events for UI integration
Comprehensive Logging: Full execution logs for debugging and ML retraining
Human-in-the-Loop: Configurable checkpoints for human approval and feedback
MCP Tools Support: Connect to external tools via Model Context Protocol (web search, databases, etc.)
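
To make the [x]/[ ] step tracking concrete, here is a minimal sketch of how checkbox-style plan text could be parsed into steps. The `PlanStep` class, the `parse_plan` function, and the exact line format are assumptions made for illustration; they are not DSAgent's own parser.

```python
import re
from dataclasses import dataclass

# Matches lines such as "1. [x] Load the data" or "2. [ ] Train a model".
STEP_PATTERN = re.compile(r"^\s*(\d+)\.\s*\[( |x|X)\]\s*(.+?)\s*$")


@dataclass
class PlanStep:
    number: int
    description: str
    done: bool


def parse_plan(text: str) -> list[PlanStep]:
    """Extract numbered checkbox steps from plan text, ignoring other lines."""
    steps = []
    for line in text.splitlines():
        match = STEP_PATTERN.match(line)
        if match:
            number, mark, description = match.groups()
            steps.append(PlanStep(int(number), description, mark.lower() == "x"))
    return steps


plan_text = """
1. [x] Load the dataset
2. [x] Clean missing values
3. [ ] Train a baseline model
"""
steps = parse_plan(plan_text)
remaining = [step.description for step in steps if not step.done]
print(remaining)
```

A plan counts as complete once no step is left unchecked, which is one plausible way to decide whether a final answer should be accepted.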

Installation

Using pip:

pip install datascience-agent

With FastAPI support:

pip install "datascience-agent[api]"

With MCP tools support:

pip install "datascience-agent[mcp]"

Using uv (recommended):

uv pip install datascience-agent
uv pip install "datascience-agent[api]"  # with FastAPI

For development:

git clone https://github.com/nmlemus/dsagent
cd dsagent
uv sync --all-extras

Configuration

API Keys

DSAgent requires an API key for your chosen LLM provider. Set it via an environment variable or a .env file:

Option 1: Environment variable

# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic (Claude)
export ANTHROPIC_API_KEY="sk-ant-..."

# Google (Gemini)
export GOOGLE_API_KEY="..."

Option 2: .env file

Copy the example and fill in your values:

cp .env.example .env
# Edit .env with your API keys

The .env file is searched in this order:

Current working directory
Project root
~/.dsagent/.env
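
The search order above can be sketched as a first-match lookup. The helper name `find_env_file` and its explicit directory parameters are hypothetical, and how DSAgent determines the "project root" is not stated here, so it is passed in by the caller.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_env_file(cwd: Path, project_root: Path, home: Path) -> Optional[Path]:
    """Return the first existing .env file in the documented search order."""
    candidates = [
        cwd / ".env",                # 1. current working directory
        project_root / ".env",       # 2. project root
        home / ".dsagent" / ".env",  # 3. ~/.dsagent/.env
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


# Demonstrate the order with throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    cwd, root, home = base / "cwd", base / "root", base / "home"
    for directory in (cwd, root, home / ".dsagent"):
        directory.mkdir(parents=True)
    (home / ".dsagent" / ".env").write_text("OPENAI_API_KEY=sk-...\n")
    found_home_only = find_env_file(cwd, root, home)
    (root / ".env").write_text("OPENAI_API_KEY=sk-...\n")
    found_with_root = find_env_file(cwd, root, home)
    picked = (found_home_only.parent.name, found_with_root.parent.name)
print(picked)
```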

Priority order: CLI arguments > Environment variables > .env file > defaults
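
As an illustration of this priority order, the sketch below resolves a setting by checking each source in turn. The function `resolve_setting` and the plain dictionaries standing in for each source are assumptions for the example, not DSAgent internals.

```python
from typing import Any, Mapping


def resolve_setting(
    name: str,
    cli_args: Mapping[str, Any],
    environ: Mapping[str, Any],
    dotenv: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Any:
    """Return the first non-None value: CLI > environment > .env > defaults."""
    for source in (cli_args, environ, dotenv, defaults):
        value = source.get(name)
        if value is not None:
            return value
    return None


cli = {"model": None}  # flag not passed on the command line
env = {"model": "claude-3-sonnet-20240229"}
dotenv = {"model": "gpt-4-turbo", "workspace": "./out"}
defaults = {"model": "gpt-4o", "workspace": "./workspace", "max_rounds": 30}

model = resolve_setting("model", cli, env, dotenv, defaults)
workspace = resolve_setting("workspace", cli, env, dotenv, defaults)
max_rounds = resolve_setting("max_rounds", cli, env, dotenv, defaults)
print(model, workspace, max_rounds)
```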

See .env.example for all available configuration options.

Quick Start

Basic Usage

from dsagent import PlannerAgent

# Basic usage - task only
with PlannerAgent(model="gpt-4o") as agent:
    result = agent.run("Write a function to calculate fibonacci numbers")
    print(result.answer)

# With data file - automatically copied to workspace/data/
with PlannerAgent(model="gpt-4o", data="./sales_data.csv") as agent:
    result = agent.run("Analyze this dataset and identify top performing products")
    print(result.answer)
    print(f"Notebook: {result.notebook_path}")

With Streaming

from dsagent import PlannerAgent, EventType

agent = PlannerAgent(model="claude-3-sonnet-20240229")
agent.start()

for event in agent.run_stream("Build a predictive model for customer churn"):
    if event.type == EventType.PLAN_UPDATED:
        print(f"Plan: {event.plan.raw_text if event.plan else ''}")
    elif event.type == EventType.CODE_SUCCESS:
        print("Code executed successfully")
    elif event.type == EventType.CODE_FAILED:
        print("Code execution failed")
    elif event.type == EventType.ANSWER_ACCEPTED:
        print(f"Answer: {event.message}")

# Get result with notebook after streaming
result = agent.get_result()
print(f"Notebook: {result.notebook_path}")

agent.shutdown()

FastAPI Integration

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from uuid import uuid4
from dsagent import PlannerAgent, EventType

app = FastAPI()

@app.post("/analyze")
async def analyze(task: str):
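    # Sync generator: Starlette iterates it in a threadpool, so the blocking
    # run_stream() loop does not stall the event loop.
    def event_stream():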
        agent = PlannerAgent(
            model="gpt-4o",
            session_id=str(uuid4()),
        )
        agent.start()

        try:
            for event in agent.run_stream(task):
                yield f"data: {event.to_sse()}\n\n"
        finally:
            agent.shutdown()

    return StreamingResponse(event_stream(), media_type="text/event-stream")
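
The endpoint above emits Server-Sent Events frames of the form `data: ...` followed by a blank line. The sketch below shows that framing and a matching client-side parser. The helper names and the JSON payload shape are hypothetical; what `event.to_sse()` actually returns is not specified here.

```python
import json


def sse_frame(payload: dict) -> str:
    """Format a payload as one Server-Sent Events message."""
    # json.dumps emits a single line, so one "data:" field is enough.
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse_stream(raw: str) -> list[dict]:
    """Decode the JSON payload of every 'data:' line in a raw SSE stream."""
    events = []
    for block in raw.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(json.loads(line[len("data: "):]))
    return events


stream = sse_frame({"type": "plan_updated"}) + sse_frame({"type": "code_success"})
decoded = parse_sse_stream(stream)
print(decoded)
```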

Command Line Interface

The package includes a CLI for quick analysis from the terminal:

# With data file
dsagent "Analyze this dataset and create visualizations" --data ./my_data.csv

# Without data (code generation, research, etc.)
dsagent "Write a Python script to scrape weather data" --model claude-3-5-sonnet-20241022

CLI Options

Option	Short	Description
`--data`	`-d`	Path to data file or directory (optional)
`--model`	`-m`	LLM model to use (default: gpt-4o)
`--workspace`	`-w`	Output directory (default: ./workspace)
`--run-id`		Custom run ID for this execution
`--max-rounds`	`-r`	Max iterations (default: 30)
`--quiet`	`-q`	Suppress verbose output
`--no-stream`		Disable streaming output
`--hitl`		HITL mode: none, plan_only, on_error, plan_and_answer, full
`--mcp-config`		Path to MCP servers YAML configuration file
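
For readers who want to wrap or mirror the CLI, the options table maps naturally onto an `argparse` parser. This is a sketch built only from the table above; it is not DSAgent's actual parser, and the `none` default for `--hitl` is an assumption based on the HITL modes table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented dsagent options."""
    parser = argparse.ArgumentParser(prog="dsagent")
    parser.add_argument("task", help="Task description for the agent")
    parser.add_argument("-d", "--data", help="Path to data file or directory")
    parser.add_argument("-m", "--model", default="gpt-4o", help="LLM model to use")
    parser.add_argument("-w", "--workspace", default="./workspace", help="Output directory")
    parser.add_argument("--run-id", help="Custom run ID for this execution")
    parser.add_argument("-r", "--max-rounds", type=int, default=30, help="Max iterations")
    parser.add_argument("-q", "--quiet", action="store_true", help="Suppress verbose output")
    parser.add_argument("--no-stream", action="store_true", help="Disable streaming output")
    parser.add_argument(
        "--hitl",
        choices=["none", "plan_only", "on_error", "plan_and_answer", "full"],
        default="none",  # assumed default
        help="HITL mode",
    )
    parser.add_argument("--mcp-config", help="Path to MCP servers YAML configuration file")
    return parser


args = build_parser().parse_args(
    ["Find trends and patterns", "-d", "./sales.csv", "--hitl", "plan_only"]
)
print(args.model, args.max_rounds, args.hitl)
```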

CLI Examples

# Basic analysis with data
dsagent "Find trends and patterns" -d ./sales.csv

# Code generation (no data needed)
dsagent "Write a REST API client for GitHub" --model gpt-4o

# With specific model
dsagent "Build ML model" -d ./dataset -m claude-3-sonnet-20240229

# Custom output directory
dsagent "Create charts" -d ./data -w ./output

# With MCP tools (no data)
dsagent "Search for Python best practices and summarize" --mcp-config ~/.dsagent/mcp.yaml

# Quiet mode
dsagent "Analyze" -d ./data -q

Output Structure

Each run creates an isolated workspace:

workspace/
└── runs/
    └── {run_id}/
        ├── data/          # Input data (copied)
        ├── notebooks/     # Generated notebooks
        ├── artifacts/     # Images, charts, outputs
        └── logs/
            ├── run.log        # Human-readable log
            └── events.jsonl   # Structured events for ML
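
Because events.jsonl is a JSON Lines file (one JSON object per line), it can be post-processed with the standard library alone. The sketch below counts events by type; the `type` field name and the sample records are assumptions, since the event schema is not documented here.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_event_types(path: Path) -> Counter:
    """Count records per 'type' field in a JSON Lines file."""
    counts = Counter()
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            counts[record.get("type", "unknown")] += 1
    return counts


# Write a small sample log so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "events.jsonl"
    sample = [
        {"type": "code_success", "round": 1},
        {"type": "code_failed", "round": 2},
        {"type": "code_success", "round": 3},
    ]
    log_path.write_text(
        "\n".join(json.dumps(record) for record in sample) + "\n", encoding="utf-8"
    )
    counts = count_event_types(log_path)
print(counts.most_common())
```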

Agent Configuration

from dsagent import PlannerAgent, RunContext

# Simple usage
agent = PlannerAgent(
    model="gpt-4o",           # Any LiteLLM-supported model
    data="./my_data.csv",     # Optional: data file or directory
    workspace="./workspace",  # Working directory
    max_rounds=30,            # Max agent iterations
    max_tokens=4096,          # Max tokens per response
    temperature=0.2,          # LLM temperature
    timeout=300,              # Code execution timeout (seconds)
    verbose=True,             # Print to console
    event_callback=None,      # Callback for events
)

# With run isolation (for multi-user scenarios)
context = RunContext(workspace="./workspace")
context.copy_data("./dataset")  # Copy data to run's data folder
agent = PlannerAgent(model="gpt-4o", context=context)

Workspace Structure

When running, DSAgent creates this structure:

workspace/
├── data/          # Input data (read from here)
├── artifacts/     # Outputs: images, models, CSVs, reports
├── notebooks/     # Generated Jupyter notebooks
└── logs/          # Execution logs

With RunContext, each run gets isolated storage under workspace/runs/{run_id}/.
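
The per-run layout can be reproduced with `pathlib` for testing or for tooling that reads run outputs. The helper `create_run_dirs` below is hypothetical and only mirrors the directory names documented above; it does not use RunContext, and the random run ID format is an assumption.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Optional

RUN_SUBDIRS = ("data", "notebooks", "artifacts", "logs")


def create_run_dirs(workspace: Path, run_id: Optional[str] = None) -> Path:
    """Create workspace/runs/{run_id}/ with the documented subdirectories."""
    run_id = run_id or uuid.uuid4().hex[:8]  # assumed ID format
    run_dir = workspace / "runs" / run_id
    for name in RUN_SUBDIRS:
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir


with tempfile.TemporaryDirectory() as tmp:
    run_dir = create_run_dirs(Path(tmp) / "workspace", run_id="demo")
    created = sorted(path.name for path in run_dir.iterdir())
    relative = run_dir.relative_to(tmp).as_posix()
print(relative, created)
```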

Human-in-the-Loop (HITL)

Control agent autonomy with configurable HITL modes:

from dsagent import PlannerAgent, HITLMode, EventType

# Create agent with HITL enabled
agent = PlannerAgent(
    model="gpt-4o",
    hitl=HITLMode.PLAN_ONLY,  # Pause for plan approval
)
agent.start()

# Run with streaming to handle HITL events
for event in agent.run_stream("Analyze sales data"):
    if event.type == EventType.HITL_AWAITING_PLAN_APPROVAL:
        print(f"Plan proposed:\n{event.plan.raw_text}")
        # Approve the plan
        agent.approve()
        # Or reject: agent.reject("Bad plan")
        # Or modify: agent.modify_plan("1. [ ] Better step")

    elif event.type == EventType.ANSWER_ACCEPTED:
        print(f"Answer: {event.message}")

agent.shutdown()

HITL Modes

Mode	Description
`HITLMode.NONE`	Fully autonomous (default)
`HITLMode.PLAN_ONLY`	Pause after plan generation for approval
`HITLMode.ON_ERROR`	Pause when code execution fails
`HITLMode.PLAN_AND_ANSWER`	Pause on plan + before final answer
`HITLMode.FULL`	Pause before every code execution
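
The table can be read as a mapping from mode to the checkpoints at which the agent pauses. The sketch below encodes that reading with hypothetical `Mode` and `Checkpoint` enums (not the library's `HITLMode`). It follows the table literally, so `FULL` pauses only before code execution; whether the real `FULL` mode also pauses at other checkpoints is not stated here.

```python
from enum import Enum


class Mode(Enum):
    NONE = "none"
    PLAN_ONLY = "plan_only"
    ON_ERROR = "on_error"
    PLAN_AND_ANSWER = "plan_and_answer"
    FULL = "full"


class Checkpoint(Enum):
    PLAN = "plan"      # after plan generation
    CODE = "code"      # before code execution
    ERROR = "error"    # after a failed execution
    ANSWER = "answer"  # before the final answer


# Literal reading of the modes table.
PAUSE_POINTS = {
    Mode.NONE: frozenset(),
    Mode.PLAN_ONLY: frozenset({Checkpoint.PLAN}),
    Mode.ON_ERROR: frozenset({Checkpoint.ERROR}),
    Mode.PLAN_AND_ANSWER: frozenset({Checkpoint.PLAN, Checkpoint.ANSWER}),
    Mode.FULL: frozenset({Checkpoint.CODE}),
}


def should_pause(mode: Mode, checkpoint: Checkpoint) -> bool:
    """Return True if the agent should wait for a human at this checkpoint."""
    return checkpoint in PAUSE_POINTS[mode]


print(should_pause(Mode.PLAN_AND_ANSWER, Checkpoint.ANSWER))
print(should_pause(Mode.NONE, Checkpoint.PLAN))
```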

HITL Actions

# Approve current pending item
agent.approve("Looks good!")

# Reject and abort
agent.reject("This approach won't work")

# Modify the plan
agent.modify_plan("1. [ ] New step\n2. [ ] Another step")

# Modify code before execution (FULL mode)
agent.modify_code("import pandas as pd\ndf = pd.read_csv('data.csv')")

# Skip current step
agent.skip()

# Send feedback to guide the agent
agent.send_feedback("Try using a different algorithm")

HITL Events

EventType.HITL_AWAITING_PLAN_APPROVAL    # Waiting for plan approval
EventType.HITL_AWAITING_CODE_APPROVAL    # Waiting for code approval (FULL mode)
EventType.HITL_AWAITING_ERROR_GUIDANCE   # Waiting for error guidance
EventType.HITL_AWAITING_ANSWER_APPROVAL  # Waiting for answer approval
EventType.HITL_FEEDBACK_RECEIVED         # Human feedback was received
EventType.HITL_PLAN_APPROVED             # Plan was approved
EventType.HITL_PLAN_MODIFIED             # Plan was modified
EventType.HITL_PLAN_REJECTED             # Plan was rejected
EventType.HITL_EXECUTION_ABORTED         # Execution was aborted

MCP Tools Support

DSAgent supports the Model Context Protocol (MCP) to connect to external tool servers, enabling capabilities like web search, database queries, and more.

Installation

pip install "datascience-agent[mcp]"

Configuration

Create a YAML configuration file (e.g., ~/.dsagent/mcp.yaml):

servers:
  # Brave Search - web search capability
  - name: brave_search
    transport: stdio
    command: ["npx", "-y", "@modelcontextprotocol/server-brave-search"]
    env:
      BRAVE_API_KEY: "${BRAVE_API_KEY}"

  # Filesystem access
  - name: filesystem
    transport: stdio
    command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]

  # HTTP-based MCP server
  - name: custom_server
    transport: http
    url: "http://localhost:8080/mcp"
    enabled: false  # Disable without removing

Usage

Python API

from dsagent import PlannerAgent, EventType

agent = PlannerAgent(
    model="gpt-4o",
    mcp_config="~/.dsagent/mcp.yaml",  # Path to config
)
agent.start()

# Agent can now use web search and other MCP tools
for event in agent.run_stream("Search for latest AI trends and analyze them"):
    if event.type == EventType.ANSWER_ACCEPTED:
        print(event.message)

agent.shutdown()

CLI

# Set API keys
export BRAVE_API_KEY="your-brave-api-key"

# Run with MCP tools (no data needed for web search)
dsagent "Search for Python best practices and summarize" \
  --mcp-config ~/.dsagent/mcp.yaml

# With data
dsagent "Search for similar datasets online and compare with mine" \
  --data ./my_data.csv \
  --mcp-config ~/.dsagent/mcp.yaml

Environment Variables

Use ${VAR_NAME} syntax in YAML to reference environment variables:

env:
  API_KEY: "${MY_API_KEY}"      # Resolved from environment
  STATIC_VALUE: "hardcoded"     # Static value
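
The ${VAR_NAME} substitution can be sketched with a regular expression. The function `expand_env` is hypothetical, and leaving unset variables unchanged is an assumption; DSAgent may instead raise an error or substitute an empty string.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} with its value; unknown names are left as-is."""
    source = os.environ if environ is None else environ
    return ENV_REF.sub(lambda m: source.get(m.group(1), m.group(0)), value)


fake_env = {"MY_API_KEY": "secret-123"}
resolved = expand_env("${MY_API_KEY}", fake_env)
static = expand_env("hardcoded", fake_env)
missing = expand_env("${NOT_SET}", fake_env)
print(resolved, static, missing)
```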

Available MCP Servers

Some popular MCP servers you can use:

Server	Package	Description
Brave Search	`@modelcontextprotocol/server-brave-search`	Web search via Brave API
Filesystem	`@modelcontextprotocol/server-filesystem`	File system access
PostgreSQL	`@modelcontextprotocol/server-postgres`	PostgreSQL database queries
Puppeteer	`@modelcontextprotocol/server-puppeteer`	Browser automation

See MCP Servers Directory for more options.

Supported Models

Any model supported by LiteLLM:

OpenAI: gpt-4o, gpt-4-turbo, gpt-3.5-turbo
Anthropic: claude-3-opus-20240229, claude-3-sonnet-20240229
Google: gemini-pro, gemini-1.5-pro
Ollama: ollama/llama3, ollama/codellama
And many more...

Event Types

from dsagent import EventType

EventType.AGENT_STARTED       # Agent started processing
EventType.AGENT_FINISHED      # Agent finished
EventType.AGENT_ERROR         # Error occurred
EventType.ROUND_STARTED       # New iteration round
EventType.ROUND_FINISHED      # Round completed
EventType.LLM_CALL_STARTED    # LLM call started
EventType.LLM_CALL_FINISHED   # LLM response received
EventType.PLAN_CREATED        # Plan was created
EventType.PLAN_UPDATED        # Plan was updated
EventType.CODE_EXECUTING      # Code execution started
EventType.CODE_SUCCESS        # Code execution succeeded
EventType.CODE_FAILED         # Code execution failed
EventType.ANSWER_ACCEPTED     # Final answer generated
EventType.ANSWER_REJECTED     # Answer rejected (plan incomplete)

Architecture

dsagent/
├── agents/
│   └── base.py          # PlannerAgent - main user interface
├── core/
│   ├── context.py       # RunContext - workspace management
│   ├── engine.py        # AgentEngine - main loop
│   ├── executor.py      # JupyterExecutor - code execution
│   ├── hitl.py          # HITLGateway - human-in-the-loop
│   └── planner.py       # PlanParser - response parsing
├── tools/
│   ├── config.py        # MCP configuration models
│   └── mcp_manager.py   # MCPManager - MCP server connections
├── schema/
│   └── models.py        # Pydantic models
└── utils/
    ├── logger.py        # AgentLogger - console logging
    ├── run_logger.py    # RunLogger - comprehensive logging
    └── notebook.py      # NotebookBuilder - notebook generation

License

MIT

Release history

Version	Release date
0.9.1	Feb 19, 2026
0.9.0	Feb 19, 2026
0.8.4	Feb 5, 2026
0.8.3	Jan 29, 2026
0.8.2	Jan 27, 2026
0.8.1	Jan 21, 2026
0.8.0	Jan 20, 2026
0.7.0	Jan 11, 2026
0.6.2	Jan 11, 2026
0.6.1	Jan 9, 2026
0.5.1 (this version)	Jan 2, 2026
0.5.0	Dec 31, 2025
0.4.0	Dec 31, 2025
0.3.0	Dec 31, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datascience_agent-0.5.1.tar.gz (367.2 kB view details)

Uploaded Jan 2, 2026 Source

Built Distribution

datascience_agent-0.5.1-py3-none-any.whl (55.8 kB view details)

Uploaded Jan 2, 2026 Python 3

File details

Details for the file datascience_agent-0.5.1.tar.gz.

File metadata

Download URL: datascience_agent-0.5.1.tar.gz
Upload date: Jan 2, 2026
Size: 367.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datascience_agent-0.5.1.tar.gz
Algorithm	Hash digest
SHA256	`ee40054a5ba4abf013955f1c5af6bd0e15bc5a901f2ea88cf881c0b17e626e05`
MD5	`3ab4a77fcbdc4df4a488e111eeab9be6`
BLAKE2b-256	`f52c0159f4e550f914307c725be2535724cf38bf5d913b6531256a4135ac90a6`
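
A downloaded file can be checked against the SHA256 digest above with `hashlib`. The sketch below hashes a file in chunks; the helper name `sha256_of` is hypothetical, and the demo uses a throwaway file so it runs without the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    actual = sha256_of(sample)

# For a real download, compare against the digest published in the table.
expected = hashlib.sha256(b"hello").hexdigest()
print(actual == expected)
```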

See more details on using hashes here.

Provenance

The following attestation bundles were made for datascience_agent-0.5.1.tar.gz:

Publisher: python-publish.yml on nmlemus/dsagent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datascience_agent-0.5.1.tar.gz
- Subject digest: ee40054a5ba4abf013955f1c5af6bd0e15bc5a901f2ea88cf881c0b17e626e05
- Sigstore transparency entry: 787782114
- Sigstore integration time: Jan 2, 2026
Source repository:
- Permalink: nmlemus/dsagent@4c20da1a4dbb1c77856c0920f8dda9acd546d1a5
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/nmlemus
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@4c20da1a4dbb1c77856c0920f8dda9acd546d1a5
- Trigger Event: release

File details

Details for the file datascience_agent-0.5.1-py3-none-any.whl.

File metadata

Download URL: datascience_agent-0.5.1-py3-none-any.whl
Upload date: Jan 2, 2026
Size: 55.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datascience_agent-0.5.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`951f30dba8f53fa2d6d61e88da6423b45ea3d03700bfe3b8641ddaf5d11d4f42`
MD5	`29b25d9b34b67f3bddba14a400d26dc9`
BLAKE2b-256	`3e363320b699bdd57a9050376d52ce8672de940db3111f9253825a43282d4929`

See more details on using hashes here.

Provenance

The following attestation bundles were made for datascience_agent-0.5.1-py3-none-any.whl:

Publisher: python-publish.yml on nmlemus/dsagent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datascience_agent-0.5.1-py3-none-any.whl
- Subject digest: 951f30dba8f53fa2d6d61e88da6423b45ea3d03700bfe3b8641ddaf5d11d4f42
- Sigstore transparency entry: 787782115
- Sigstore integration time: Jan 2, 2026
Source repository:
- Permalink: nmlemus/dsagent@4c20da1a4dbb1c77856c0920f8dda9acd546d1a5
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/nmlemus
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@4c20da1a4dbb1c77856c0920f8dda9acd546d1a5
- Trigger Event: release
