AI-powered data science co-pilot using Claude — explore data, design projects, generate code, and brainstorm from anywhere Python runs.

These details have not been verified by PyPI

Project links

Project description

DataSpark — AI-Powered Data Science Co-Pilot

A Python library that brings Claude's data science expertise into your local workflow — Jupyter notebooks, scripts, terminal, anywhere Python runs.

No browser needed. No logging in. Just import and go.

Quick Start

1. Install

pip install dataspark-ai              # from PyPI
# or
pip install dataspark-ai[full]        # includes sklearn, matplotlib, seaborn, plotly, scipy
# or from source
git clone https://github.com/KTG0409/dataspark.git
cd dataspark && pip install -e ".[dev]"

2. Set your API key

export ANTHROPIC_API_KEY="sk-ant-api03-..."

Get a key at console.anthropic.com/settings/keys

3. Use It

from dataspark import Spark

spark = Spark()

# Explore a dataset — get instant analysis, quality checks, and recommendations
spark.explore("sales_data.csv")

# Ask any data science question
spark.ask("Should I use one-hot encoding or target encoding for high-cardinality categoricals?")

# Design a complete project
spark.project("Build a customer churn prediction model for our SaaS platform")

# Brainstorm creative analysis ideas
spark.brainstorm("I have 3 years of e-commerce transaction data with 2M rows")

# Generate production code
spark.code("Build a feature engineering pipeline for time series with lag features and rolling stats")

# Get best practices guidance
spark.best_practices("Cross-validation strategies for time series data")

# Interactive session (like chatting with Claude)
spark.chat()

Core Features

`spark.explore(source)` — Dataset Analysis

Pass a CSV, Excel file, DataFrame, or URL. DataSpark will:

Profile every column (types, distributions, outliers, correlations)
Flag data quality issues
Recommend specific analyses based on what it sees
Ask you clarifying questions about your goals
Provide ready-to-run code snippets

spark.explore("customers.csv")
spark.explore("https://data.example.com/dataset.csv")
spark.explore(my_dataframe, name="revenue")

# Focus on something specific
spark.explore("data.csv", focus="I need to predict the 'churned' column")

`spark.project(description)` — Project Design

Describe what you want to build. DataSpark designs the full pipeline:

spark.project("Forecast demand for 500 SKUs across 12 warehouses, daily granularity")
spark.project("Build a recommendation engine for our content platform")
spark.project("Anomaly detection for network traffic logs, ~10M events/day")

`spark.brainstorm(context)` — Idea Generation

Get creative, ranked ideas from quick wins to big bets:

spark.brainstorm("We have clickstream data, purchase history, and customer support tickets")
spark.brainstorm("Our marketing team wants to understand campaign attribution")

`spark.code(request)` — Code Generation

Get complete, production-quality Python code:

spark.code("XGBoost pipeline with Optuna hyperparameter tuning")
spark.code("Automated EDA function that generates a PDF report")
spark.code("FastAPI endpoint that serves predictions from a pickled model")

`spark.ask(question)` — Ask Anything

Maintains conversation history so you can have a back-and-forth:

spark.ask("What's the best way to handle class imbalance?")
spark.ask("Show me how to implement SMOTE with that approach")
spark.ask("Now how do I evaluate it properly?")

`spark.chat()` — Interactive Terminal Session

Full interactive mode with slash commands:

/explore data.csv    — Load and analyze a dataset
/project <desc>      — Design a project
/brainstorm <ctx>    — Generate ideas
/code <request>      — Generate code
/model sonnet        — Switch models
/save conversation.md — Save chat history
/clear               — Reset context
/help                — Show commands
/quit                — Exit

Configuration

# Model selection (default: Claude Sonnet)
spark = Spark(model="opus")      # Most capable
spark = Spark(model="sonnet")    # Balanced (default)
spark = Spark(model="haiku")     # Fastest / cheapest

# Longer responses
spark = Spark(max_tokens=8192)

# Debug mode
spark = Spark(verbose=True)

Command-Line Usage

# Interactive chat
dataspark

# Explore a dataset
dataspark explore data.csv
dataspark explore data.csv -f "focus on the target variable"

# Quick question
dataspark ask "When should I use Ridge vs Lasso?"

# Project design
dataspark project "Build a fraud detection system"

# Use a specific model
dataspark -m opus explore big_dataset.parquet

Jupyter Notebook Tips

from dataspark import Spark
spark = Spark()

# Load data through spark — it profiles automatically
df = spark.load("data.csv")

# Now all questions are context-aware
spark.ask("What feature engineering should I do?")
spark.ask("Write the code for that")

# You can also explore at any point
spark.explore(focus="relationships between price and demand")

Architecture

dataspark/
├── __init__.py      # Clean exports
├── core.py          # Spark class — main interface & API calls
├── explorer.py      # DataExplorer — load & profile datasets
├── profiles.py      # DataProfile — statistical profiling
├── prompts.py       # System prompts for each mode
└── cli.py           # Command-line interface

The library works by:

Profiling your data locally (pandas — nothing leaves your machine except the summary)
Building rich context from the profile (statistics, distributions, quality issues)
Sending that context + your question to Claude via the API
Maintaining conversation history so follow-ups are contextual

Your raw data never leaves your machine. Only statistical summaries and column metadata are sent to the API.

Extending DataSpark

Custom System Prompts

from dataspark import Spark

spark = Spark()
spark._current_system = """You are a financial data science expert.
Focus on: regulatory compliance, risk modeling, backtesting.
Always consider: data leakage, survivorship bias, look-ahead bias."""

spark.ask("How should I backtest this trading strategy?")

Adding Data Context Manually

spark._data_context = """
We have a PostgreSQL database with:
- transactions (50M rows, 3 years)
- customers (2M rows)
- products (10K SKUs)
Business: B2B SaaS, $50M ARR, 15% annual churn
"""
spark.ask("What analyses would have the most business impact?")

Cost Awareness

API costs per ~1000 tokens (approximate):

Model	Input	Output
Haiku	$0.001	$0.005
Sonnet	$0.003	$0.015
Opus	$0.015	$0.075

A typical explore() call uses ~2-4K tokens. An interactive session might use 10-50K tokens total. Use spark = Spark(model="haiku") for cost-sensitive workloads.

Privacy & Security

Your raw data stays local. Only statistical summaries (means, distributions, column names, 3 sample rows) are sent to the API.
API key is yours. Use a personal key — it's billed to your Anthropic account, not tied to any employer.
No logging by default. Conversations are in-memory only unless you /save them.
Review what's sent. Call spark.explorer.context_for_llm() to see exactly what goes to the API.

License

MIT — use it however you want.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.1

Feb 14, 2026

0.1.0

Feb 14, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataspark_ai-0.1.1.tar.gz (21.0 kB view details)

Uploaded Feb 14, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dataspark_ai-0.1.1-py3-none-any.whl (18.6 kB view details)

Uploaded Feb 14, 2026 Python 3

File details

Details for the file dataspark_ai-0.1.1.tar.gz.

File metadata

Download URL: dataspark_ai-0.1.1.tar.gz
Upload date: Feb 14, 2026
Size: 21.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dataspark_ai-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`c6d99e5e1324332c6c949a11f7f18f23c7be0d2fbfb4e6291a2a8cd64b63c66e`
MD5	`13548fd3a2879f354511aaa3b1db9f26`
BLAKE2b-256	`42a0e4283449798febd4fbd3b02298c798b35e569c8ac218ebf228bf48c55bd3`

See more details on using hashes here.

File details

Details for the file dataspark_ai-0.1.1-py3-none-any.whl.

File metadata

Download URL: dataspark_ai-0.1.1-py3-none-any.whl
Upload date: Feb 14, 2026
Size: 18.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dataspark_ai-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0fbf72dc5aa4af36376e5da9db000727620cbe6d14d580a4f0a4045e84d82d14`
MD5	`b9bd60e114ce9e3d44eafb800236b1f7`
BLAKE2b-256	`17027995c06c8c4a6b09adb3b9223154d7186d23d5711e5162d1755761468ecd`

See more details on using hashes here.

dataspark-ai 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

DataSpark — AI-Powered Data Science Co-Pilot

Quick Start

1. Install

2. Set your API key

3. Use It

Core Features

spark.explore(source) — Dataset Analysis

spark.project(description) — Project Design

spark.brainstorm(context) — Idea Generation

spark.code(request) — Code Generation

spark.ask(question) — Ask Anything

spark.chat() — Interactive Terminal Session

Configuration

Command-Line Usage

Jupyter Notebook Tips

Architecture

Extending DataSpark

Custom System Prompts

Adding Data Context Manually

Cost Awareness

Privacy & Security

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`spark.explore(source)` — Dataset Analysis

`spark.project(description)` — Project Design

`spark.brainstorm(context)` — Idea Generation

`spark.code(request)` — Code Generation

`spark.ask(question)` — Ask Anything

`spark.chat()` — Interactive Terminal Session