AI Agent with dynamic planning and persistent Jupyter kernel execution for data analysis
DSAgent
An AI-powered autonomous agent for data analysis with dynamic planning and persistent Jupyter kernel execution.
Features
- Dynamic Planning: Agent creates and follows plans with [x]/[ ] step tracking
- Persistent Execution: Code runs in a Jupyter kernel with variable persistence
- Multi-Provider LLM: Supports OpenAI, Anthropic, Google, and Ollama models via LiteLLM
- Notebook Generation: Automatically generates clean, runnable Jupyter notebooks
- Event Streaming: Real-time events for UI integration
- Comprehensive Logging: Full execution logs for debugging and ML retraining
- Session Management: State persistence for multi-user scenarios
- Human-in-the-Loop: Configurable checkpoints for human approval and feedback
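To make the [x]/[ ] step tracking concrete, here is a minimal sketch of how a checkbox-style plan could be parsed and its progress computed. The names PlanStep, parse_plan, and progress are hypothetical and exist only for this illustration; they are not DSAgent's own parser, and the numbered-step format is assumed from the plan strings shown in the HITL examples of this README.

```python
import re
from dataclasses import dataclass

# Hypothetical helper type; DSAgent's internal plan model may differ.
@dataclass
class PlanStep:
    number: int
    description: str
    done: bool

# Matches lines such as "1. [x] Load data" or "2. [ ] Train model".
STEP_RE = re.compile(r"^\s*(\d+)\.\s*\[( |x|X)\]\s*(.+?)\s*$")

def parse_plan(text: str) -> list[PlanStep]:
    """Extract numbered checkbox steps from plan text, skipping other lines."""
    steps = []
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if match:
            number, mark, description = match.groups()
            steps.append(PlanStep(int(number), description, mark.lower() == "x"))
    return steps

def progress(steps: list[PlanStep]) -> float:
    """Return the fraction of completed steps (0.0 for an empty plan)."""
    return sum(s.done for s in steps) / len(steps) if steps else 0.0

plan = (
    "1. [x] Load sales_data.csv\n"
    "2. [x] Clean missing values\n"
    "3. [ ] Rank products\n"
    "4. [ ] Summarize"
)
steps = parse_plan(plan)
print(f"{progress(steps):.0%} complete")  # prints "50% complete"
```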
Installation
Using pip:
pip install datascience-agent
With FastAPI support:
pip install "datascience-agent[api]"
Using uv (recommended):
uv pip install datascience-agent
uv pip install "datascience-agent[api]" # with FastAPI
For development:
git clone https://github.com/nmlemus/dsagent
cd dsagent
uv sync --all-extras
Quick Start
Basic Usage
from dsagent import PlannerAgent

# Create the agent as a context manager and run a task
with PlannerAgent(model="gpt-4o", workspace="./workspace") as agent:
    result = agent.run("Analyze sales_data.csv and identify top performing products")
    print(result.answer)
    print(f"Notebook: {result.notebook_path}")
With Streaming
from dsagent import PlannerAgent, EventType

agent = PlannerAgent(model="claude-3-sonnet-20240229")
agent.start()

for event in agent.run_stream("Build a predictive model for customer churn"):
    if event.type == EventType.PLAN_UPDATED:
        print(f"Plan: {event.plan.raw_text if event.plan else ''}")
    elif event.type == EventType.CODE_SUCCESS:
        print("Code executed successfully")
    elif event.type == EventType.CODE_FAILED:
        print("Code execution failed")
    elif event.type == EventType.ANSWER_ACCEPTED:
        print(f"Answer: {event.message}")

# Get result with notebook after streaming
result = agent.get_result()
print(f"Notebook: {result.notebook_path}")
agent.shutdown()
FastAPI Integration
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from uuid import uuid4
from dsagent import PlannerAgent

app = FastAPI()

@app.post("/analyze")
async def analyze(task: str):
    # A plain (sync) generator: Starlette iterates it in a threadpool, so the
    # blocking run_stream() loop does not stall the event loop.
    def event_stream():
        agent = PlannerAgent(
            model="gpt-4o",
            session_id=str(uuid4()),
        )
        agent.start()
        try:
            for event in agent.run_stream(task):
                yield f"data: {event.to_sse()}\n\n"
        finally:
            # Always release the kernel, even if the client disconnects
            agent.shutdown()

    return StreamingResponse(event_stream(), media_type="text/event-stream")
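On the consuming side, the endpoint above emits frames of the form `data: ...` separated by blank lines. The following is a minimal sketch of how a client could split such a stream into payloads. The function name iter_sse_data is hypothetical, and the payloads are treated as opaque strings because this README does not specify what `to_sse()` returns.

```python
from typing import Iterable, Iterator

def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in a line stream."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored here.

stream = [
    "data: first\n",
    "\n",
    ": keep-alive\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]
payloads = list(iter_sse_data(stream))
print(payloads)  # prints ['first', 'line one\nline two']
```

In practice the lines would come from an HTTP client that streams the response body line by line; an event that is not yet terminated by a blank line is deliberately not yielded.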
Command Line Interface
The package includes a CLI for quick analysis from the terminal:
dsagent "Analyze this dataset and create visualizations" --data ./my_data.csv
CLI Options
| Option | Short | Description |
|---|---|---|
| --data | -d | Path to data file or directory (required) |
| --model | -m | LLM model to use (default: gpt-4o) |
| --workspace | -w | Output directory (default: ./workspace) |
| --run-id | | Custom run ID for this execution |
| --max-rounds | -r | Max iterations (default: 30) |
| --quiet | -q | Suppress verbose output |
| --no-stream | | Disable streaming output |
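For readers who want to wrap or script the CLI, the option table can be mirrored with argparse. This is only a sketch built from the table above: build_parser is a hypothetical name, the positional argument is called task here for illustration, and the package's real parser may be implemented differently.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="dsagent")
    parser.add_argument("task", help="Natural-language description of the analysis")
    parser.add_argument("-d", "--data", required=True, help="Path to data file or directory")
    parser.add_argument("-m", "--model", default="gpt-4o", help="LLM model to use")
    parser.add_argument("-w", "--workspace", default="./workspace", help="Output directory")
    parser.add_argument("--run-id", default=None, help="Custom run ID for this execution")
    parser.add_argument("-r", "--max-rounds", type=int, default=30, help="Max iterations")
    parser.add_argument("-q", "--quiet", action="store_true", help="Suppress verbose output")
    parser.add_argument("--no-stream", action="store_true", help="Disable streaming output")
    return parser

args = build_parser().parse_args(["Find trends and patterns", "-d", "./sales.csv", "-r", "10"])
print(args.model, args.max_rounds, args.no_stream)  # prints "gpt-4o 10 False"
```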
CLI Examples
# Basic analysis
dsagent "Find trends and patterns" -d ./sales.csv
# With specific model
dsagent "Build ML model" -d ./dataset -m claude-3-sonnet-20240229
# Custom output directory
dsagent "Create charts" -d ./data -w ./output
# With custom run ID
dsagent "Analyze" -d ./data --run-id my-analysis-001
# Quiet mode
dsagent "Analyze" -d ./data -q
Output Structure
Each run creates an isolated workspace:
workspace/
└── runs/
    └── {run_id}/
        ├── data/            # Input data (copied)
        ├── notebooks/       # Generated notebooks
        ├── artifacts/       # Images, charts, outputs
        └── logs/
            ├── run.log        # Human-readable log
            └── events.jsonl   # Structured events for ML
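Because events.jsonl is a JSON Lines file, it can be summarized with a few lines of standard-library Python. The sketch below counts events per type. It assumes each line is a JSON object with a "type" field, which this README does not confirm, so inspect a real log before relying on it. The function name count_event_types and the sample records are hypothetical.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

def count_event_types(events_path: Path) -> Counter:
    """Count events per type in a JSON Lines file, skipping blank lines."""
    counts = Counter()
    with events_path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                record = json.loads(line)
                # "type" is an assumed field name; check a real events.jsonl first.
                counts[record.get("type", "unknown")] += 1
    return counts

# Demo with a synthetic log laid out like workspace/runs/{run_id}/logs/events.jsonl.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "runs" / "demo-run" / "logs" / "events.jsonl"
    log_file.parent.mkdir(parents=True)
    sample = [{"type": "code_success"}, {"type": "code_failed"}, {"type": "code_success"}]
    log_file.write_text("\n".join(json.dumps(e) for e in sample) + "\n", encoding="utf-8")
    counts = count_event_types(log_file)

print(counts.most_common())  # prints [('code_success', 2), ('code_failed', 1)]
```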
Configuration
from dsagent import PlannerAgent, RunContext

# With automatic run isolation
context = RunContext(workspace="./workspace")

agent = PlannerAgent(
    model="gpt-4o",        # Any LiteLLM-supported model
    context=context,       # Run context for isolation
    max_rounds=30,         # Max agent iterations
    max_tokens=4096,       # Max tokens per response
    temperature=0.2,       # LLM temperature
    timeout=300,           # Code execution timeout (seconds)
    verbose=True,          # Print to console
    event_callback=None,   # Callback for events
)
Human-in-the-Loop (HITL)
Control agent autonomy with configurable HITL modes:
from dsagent import PlannerAgent, HITLMode, EventType

# Create agent with HITL enabled
agent = PlannerAgent(
    model="gpt-4o",
    hitl=HITLMode.PLAN_ONLY,  # Pause for plan approval
)
agent.start()

# Run with streaming to handle HITL events
for event in agent.run_stream("Analyze sales data"):
    if event.type == EventType.HITL_AWAITING_PLAN_APPROVAL:
        print(f"Plan proposed:\n{event.plan.raw_text}")
        # Approve the plan
        agent.approve()
        # Or reject: agent.reject("Bad plan")
        # Or modify: agent.modify_plan("1. [ ] Better step")
    elif event.type == EventType.ANSWER_ACCEPTED:
        print(f"Answer: {event.message}")

agent.shutdown()
HITL Modes
| Mode | Description |
|---|---|
| HITLMode.NONE | Fully autonomous (default) |
| HITLMode.PLAN_ONLY | Pause after plan generation for approval |
| HITLMode.ON_ERROR | Pause when code execution fails |
| HITLMode.PLAN_AND_ANSWER | Pause on plan + before final answer |
| HITLMode.FULL | Pause before every code execution |
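One way to reason about the modes is as a lookup from mode to the checkpoints where the agent waits for a human. The sketch below encodes only what the table states, using stand-in enums (Mode, Checkpoint) and a hypothetical should_pause helper. It is not DSAgent's implementation, and the library may pause at additional points (for example, FULL mode may also cover plan approval).

```python
from enum import Enum

# Stand-in enums for illustration; import the real HITLMode from dsagent in practice.
class Mode(Enum):
    NONE = "none"
    PLAN_ONLY = "plan_only"
    ON_ERROR = "on_error"
    PLAN_AND_ANSWER = "plan_and_answer"
    FULL = "full"

class Checkpoint(Enum):
    PLAN = "plan"
    CODE = "code"
    ERROR = "error"
    ANSWER = "answer"

# Pause points exactly as the table words them; the library may pause at more.
PAUSES = {
    Mode.NONE: frozenset(),
    Mode.PLAN_ONLY: frozenset({Checkpoint.PLAN}),
    Mode.ON_ERROR: frozenset({Checkpoint.ERROR}),
    Mode.PLAN_AND_ANSWER: frozenset({Checkpoint.PLAN, Checkpoint.ANSWER}),
    Mode.FULL: frozenset({Checkpoint.CODE}),
}

def should_pause(mode: Mode, checkpoint: Checkpoint) -> bool:
    """Return True if the given mode asks for human input at this checkpoint."""
    return checkpoint in PAUSES[mode]

print(should_pause(Mode.PLAN_AND_ANSWER, Checkpoint.ANSWER))  # prints True
print(should_pause(Mode.PLAN_ONLY, Checkpoint.CODE))          # prints False
```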
HITL Actions
# Approve current pending item
agent.approve("Looks good!")
# Reject and abort
agent.reject("This approach won't work")
# Modify the plan
agent.modify_plan("1. [ ] New step\n2. [ ] Another step")
# Modify code before execution (FULL mode)
agent.modify_code("import pandas as pd\ndf = pd.read_csv('data.csv')")
# Skip current step
agent.skip()
# Send feedback to guide the agent
agent.send_feedback("Try using a different algorithm")
HITL Events
EventType.HITL_AWAITING_PLAN_APPROVAL # Waiting for plan approval
EventType.HITL_AWAITING_CODE_APPROVAL # Waiting for code approval (FULL mode)
EventType.HITL_AWAITING_ERROR_GUIDANCE # Waiting for error guidance
EventType.HITL_AWAITING_ANSWER_APPROVAL # Waiting for answer approval
EventType.HITL_FEEDBACK_RECEIVED # Human feedback was received
EventType.HITL_PLAN_APPROVED # Plan was approved
EventType.HITL_PLAN_MODIFIED # Plan was modified
EventType.HITL_PLAN_REJECTED # Plan was rejected
EventType.HITL_EXECUTION_ABORTED # Execution was aborted
Supported Models
Any model supported by LiteLLM:
- OpenAI: gpt-4o, gpt-4-turbo, gpt-3.5-turbo
- Anthropic: claude-3-opus-20240229, claude-3-sonnet-20240229
- Google: gemini-pro, gemini-1.5-pro
- Ollama: ollama/llama3, ollama/codellama
- And many more...
Event Types
from dsagent import EventType
EventType.AGENT_STARTED # Agent started processing
EventType.AGENT_FINISHED # Agent finished
EventType.AGENT_ERROR # Error occurred
EventType.ROUND_STARTED # New iteration round
EventType.ROUND_FINISHED # Round completed
EventType.LLM_CALL_STARTED # LLM call started
EventType.LLM_CALL_FINISHED # LLM response received
EventType.PLAN_CREATED # Plan was created
EventType.PLAN_UPDATED # Plan was updated
EventType.CODE_EXECUTING # Code execution started
EventType.CODE_SUCCESS # Code execution succeeded
EventType.CODE_FAILED # Code execution failed
EventType.ANSWER_ACCEPTED # Final answer generated
EventType.ANSWER_REJECTED # Answer rejected (plan incomplete)
Architecture
dsagent/
├── agents/
│   └── base.py          # PlannerAgent - main user interface
├── core/
│   ├── context.py       # RunContext - workspace management
│   ├── engine.py        # AgentEngine - main loop
│   ├── executor.py      # JupyterExecutor - code execution
│   ├── hitl.py          # HITLGateway - human-in-the-loop
│   └── planner.py       # PlanParser - response parsing
├── schema/
│   └── models.py        # Pydantic models
└── utils/
    ├── logger.py        # AgentLogger - console logging
    ├── run_logger.py    # RunLogger - comprehensive logging
    └── notebook.py      # NotebookBuilder - notebook generation
License
MIT