PyAnalytica

A Python analytics workbench for teaching data science

Interactive data exploration, visualization, statistical analysis, and machine learning — with a "Show Code" button that reveals the pandas & sklearn code behind every operation.

Feature Highlights

Category	Capabilities
Data	Load CSV/Excel/bundled datasets, profile columns, view/filter, transform (rename, retype, compute, filter, fill missing, sample), combine (merge/concat), export
Explore	Group-by summarize with percent-of-total, pivot tables, cross-tabulation with chi-squared
Visualize	Histograms, density, box/violin, scatter, line, bar, heatmap correlation, timeline
Analyze	Independent & paired t-tests, one-way ANOVA, proportion z-tests, chi-squared, Pearson/Spearman correlation
Model	Linear & logistic regression, k-NN/SVM/tree/random-forest classification, k-means/hierarchical clustering, PCA, model evaluation, saved-model prediction
Homework	YAML-based assignments with hash-checked answers, automatic grading, submission export
Report	Export analyses as HTML reports, Python scripts, or Jupyter notebooks
AI	Rule-based + optional LLM interpretation, next-step suggestions, challenge questions, natural-language data queries
Workflow	Procedure builder to record, replay, annotate, and export multi-step analysis pipelines

Screenshots

Screenshots coming soon. The app features a modern gradient + glassmorphism UI with:

Indigo-to-purple gradient navbar

Glassmorphism panels with frosted-glass effect

Clean data grids with gradient headers

Dark-themed "Show Code" panels

Polished form controls with accent focus rings

Quick Start

Launch the interactive workbench

pyanalytica                # CLI entry point (after pip install)
python -m pyanalytica      # or run as a module

Use as a Python library

Every analytics function returns a (result, CodeSnippet) tuple. The CodeSnippet contains the equivalent pandas/sklearn code so students can see what runs under the hood.

from pyanalytica.data.load import load_bundled
from pyanalytica.data.profile import profile_dataframe
from pyanalytica.visualize.distribute import histogram
from pyanalytica.visualize.relate import scatter
from pyanalytica.explore.summarize import group_summarize

# Load a bundled dataset
df, code = load_bundled("tips")

# Profile the dataframe — column types, missing values, summary stats
profile = profile_dataframe(df)

# Visualize
fig, code = histogram(df, "total_bill", bins=20)
fig, code = scatter(df, x="total_bill", y="tip", color_by="smoker")

# Summarize — group_cols, value_cols, agg_funcs are all lists
result, code = group_summarize(
    df,
    group_cols=["day"],
    value_cols=["tip"],
    agg_funcs=["mean"],
)

The CodeSnippet Pattern

Every analytics function in PyAnalytica returns a tuple of (result, CodeSnippet). The CodeSnippet dataclass holds the equivalent pandas/sklearn code so students can learn what happens behind the UI:

from pyanalytica.core.codegen import CodeSnippet

# CodeSnippet(code="df.groupby(['day'])['tip'].mean()", imports=["import pandas as pd"])

# In the Shiny UI, the "Show Code" button renders this as a copyable code block.
# The emitted code uses real pandas/sklearn calls — never wrapper functions.
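
The pattern described above can be illustrated without PyAnalytica installed. The following is a minimal sketch, not the library's implementation: `SketchSnippet`, its `render` method, and `mean_by_group` are hypothetical names, and only the `code` and `imports` fields are taken from the comment above.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class SketchSnippet:
    """Hypothetical stand-in for CodeSnippet: generated code plus its imports."""
    code: str
    imports: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Join imports and code into one copyable block, as a code panel might.
        return "\n".join([*self.imports, "", self.code])


def mean_by_group(rows: list[dict], group_col: str, value_col: str):
    """Compute a grouped mean and return a (result, snippet) tuple."""
    groups: dict[str, list[float]] = {}
    for row in rows:
        groups.setdefault(row[group_col], []).append(row[value_col])
    result = {key: mean(values) for key, values in groups.items()}

    # The snippet shows the plain pandas equivalent, not a call to this helper.
    snippet = SketchSnippet(
        code=f"df.groupby([{group_col!r}])[{value_col!r}].mean()",
        imports=["import pandas as pd"],
    )
    return result, snippet


rows = [
    {"day": "Sat", "tip": 2.0},
    {"day": "Sat", "tip": 4.0},
    {"day": "Sun", "tip": 5.0},
]
result, snippet = mean_by_group(rows, "day", "tip")
print(result)
print(snippet.render())
```

Returning the snippet next to the result keeps the computation and its explanation in one place, so a UI layer only has to display what it receives.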

Installation

# Core package (Shiny UI + all analytics)
pip install pyanalytica

# With AI integration (Anthropic Claude)
pip install "pyanalytica[ai]"

# With Jupyter notebook export
pip install "pyanalytica[report]"

# Everything (recommended)
pip install "pyanalytica[all]"

To update to the latest version:

pip install --upgrade pyanalytica

Switching from a GitHub install? Run pip uninstall pyanalytica first, then install from PyPI as shown above.

Install from source (for development)

git clone https://github.com/social-engineer-ai/PyAnalytica.git
cd PyAnalytica
pip install -e ".[dev,all]"

Bundled Datasets

Name	Rows	Columns	Description
`tips`	244	7	Restaurant tipping data (total_bill, tip, sex, smoker, day, time, size)
`diamonds`	53,940	10	Prices and attributes of round-cut diamonds
`candidates`	5,000	12	JobMatch simulation — job candidates with skills and experience
`jobs`	500	10	JobMatch simulation — job postings
`companies`	200	8	JobMatch simulation — companies
`events`	15,000	6	JobMatch simulation — recruiting events (applications, interviews, offers)

from pyanalytica.datasets import list_datasets, load_dataset

list_datasets()          # ['candidates', 'companies', 'diamonds', 'events', 'jobs', 'tips']
df = load_dataset("diamonds")

To regenerate bundled datasets:

PYTHONPATH=src python -m pyanalytica.datasets.generate

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                    Shiny for Python UI                   │
│  ┌──────────────────────────────────────────────────┐   │
│  │  Modules: mod_load, mod_profile, mod_view, ...   │   │
│  └──────────────┬───────────────────────────────────┘   │
│                 │                                        │
│  ┌──────────────▼───────────────────────────────────┐   │
│  │  Components: dataset_selector, code_panel,       │   │
│  │  decimals_control, chat_panel, download_result   │   │
│  └──────────────┬───────────────────────────────────┘   │
├─────────────────┼───────────────────────────────────────┤
│                 │      Analytics Packages                │
│  ┌──────────────▼───────────────────────────────────┐   │
│  │  data/   explore/   visualize/   analyze/        │   │
│  │  model/  homework/  report/      ai/             │   │
│  └──────────────┬───────────────────────────────────┘   │
│                 │                                        │
│  ┌──────────────▼───────────────────────────────────┐   │
│  │  Core: codegen, state, config, theme, profile,   │   │
│  │  model_store, procedure, session, column_utils   │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

The architecture follows a package-first design:

Core provides shared utilities (CodeSnippet generation, state management, configuration)
Analytics packages (data/, explore/, visualize/, analyze/, model/) contain pure functions that work independently of any UI
UI modules in ui/modules/ call analytics functions and handle Shiny reactivity
WorkbenchState is a simple data store; the Shiny reactive graph manages the current selection

Configuration

User Profile

PyAnalytica reads user preferences from ~/.pyanalytica/profile.yaml (auto-created on first use):

# ~/.pyanalytica/profile.yaml
api_key: ""          # Anthropic API key for AI features
decimals: 3          # Default decimal places for numeric output
theme: default       # UI theme

# Instructor fields (optional)
instructor_name: ""
institution: ""
course: ""

Precedence: Environment variable > profile.yaml > built-in default

Setting	Env Variable	Default
API key	`ANTHROPIC_API_KEY`	(none)
Decimals	`PYANALYTICA_DECIMALS`	3
Theme	`PYANALYTICA_THEME`	default
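
The precedence rule above can be sketched as a small lookup. This is an illustration under assumptions, not PyAnalytica's code: `resolve_setting` is a hypothetical helper, the profile is a plain dict standing in for the parsed YAML, and treating an empty string as "not set" is an assumption.

```python
import os


def resolve_setting(env_var, profile_key, profile, default, environ=None):
    """Return the first value found: environment, then profile, then default."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_var)
    if value not in (None, ""):
        return value  # Environment values are strings; type conversion is omitted.
    value = profile.get(profile_key)
    if value not in (None, ""):
        return value
    return default


# Hypothetical profile contents, mirroring the YAML keys shown above.
profile = {"api_key": "", "decimals": 4, "theme": "default"}

# The environment wins; an unset variable falls through to the profile.
from_env = resolve_setting("PYANALYTICA_DECIMALS", "decimals", profile, 3,
                           environ={"PYANALYTICA_DECIMALS": "2"})
from_profile = resolve_setting("PYANALYTICA_DECIMALS", "decimals", profile, 3,
                               environ={})
from_default = resolve_setting("ANTHROPIC_API_KEY", "api_key", profile, None,
                               environ={})
print(from_env, from_profile, from_default)
```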

Course Configuration

Instructors can place a pyanalytica.yaml in the working directory to control which menu items are visible (with optional date-gating):

menus:
  - name: Data
    visible: true
  - name: Model
    visible: true
    after: "2025-02-15"   # Only show after this date
  - name: Homework
    visible: true
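
Date-gating of this kind can be sketched as a filter over the menu entries. The function below is hypothetical and works on an already-parsed list of dicts; it assumes a menu becomes visible on the `after` date itself and that a missing `visible` key means visible, neither of which is confirmed by the configuration above.

```python
from datetime import date


def visible_menus(menus, today):
    """Return the names of menus that are visible on the given day."""
    names = []
    for menu in menus:
        if not menu.get("visible", True):
            continue
        after = menu.get("after")
        # Assumption: the menu appears on the `after` date and stays visible.
        if after is not None and today < date.fromisoformat(after):
            continue
        names.append(menu["name"])
    return names


menus = [
    {"name": "Data", "visible": True},
    {"name": "Model", "visible": True, "after": "2025-02-15"},
    {"name": "Homework", "visible": True},
]
before = visible_menus(menus, date(2025, 2, 1))
later = visible_menus(menus, date(2025, 3, 1))
print(before)
print(later)
```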

For Instructors

Homework Framework

Create YAML-based assignments with hash-checked answers:

# homework1.yaml
title: "Homework 1: Exploratory Data Analysis"
dataset: tips
due_date: "2025-03-01"
questions:
  - id: q1
    type: numeric
    prompt: "What is the mean total bill?"
    answer_hash: "sha256:..."    # Hash of the correct answer
    tolerance: 0.01
  - id: q2
    type: multiple_choice
    prompt: "Which day has the highest average tip?"
    choices: ["Thur", "Fri", "Sat", "Sun"]
    answer_hash: "sha256:..."
  - id: q3
    type: dataframe
    prompt: "Create a summary table of mean tip by day"
    answer_hash: "sha256:..."

Question types: numeric, multiple_choice, text, dataframe

Generate answer hashes:

from pyanalytica.homework.schema import hash_answer
hash_answer(19.7859)    # 'sha256:...'
hash_answer("Sun")      # 'sha256:...'
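
To show why hashed answers allow grading without storing the answer in the assignment file, here is a minimal sketch. It does not reproduce `hash_answer`: `sketch_hash_answer` and `is_correct` are hypothetical, and the normalization (fixed-decimal formatting for numbers, whitespace stripping for text) is an assumption about how such a scheme could work.

```python
import hashlib


def sketch_hash_answer(value, decimals=2):
    """Hypothetical scheme: normalize the answer, then SHA-256 it."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Fixed-precision formatting lets nearby numeric answers share a hash.
        normalized = f"{value:.{decimals}f}"
    else:
        normalized = str(value).strip()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


def is_correct(submitted, answer_hash, decimals=2):
    """Grade by comparing hashes rather than plain-text answers."""
    return sketch_hash_answer(submitted, decimals) == answer_hash


expected = sketch_hash_answer(19.7859)
close_enough = is_correct(19.79, expected)
too_far = is_correct(19.5, expected)
text_match = is_correct("Sun", sketch_hash_answer("Sun"))
print(close_enough, too_far, text_match)
```

Rounding before hashing is only an approximation of a tolerance check: two values on opposite sides of a rounding boundary hash differently even when they are very close. How the `tolerance` field above interacts with hashing is not specified here.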

Students complete assignments in the Homework tab and export submissions as JSON files for grading.

AI Features

PyAnalytica includes four AI-powered modules that work in rule-based mode by default and can be enhanced with an Anthropic API key:

Module	Rule-based	LLM-enhanced
Interpret	Template-based statistical interpretation of results	Claude provides nuanced, context-aware explanations
Suggest	Heuristic next-step recommendations based on data types	Claude suggests analyses tailored to the specific dataset
Challenge	Pre-written critical thinking questions	Claude generates Socratic questions about the analysis
Query	Keyword-based column/operation matching	Claude translates natural language to pandas code
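
As an illustration of what keyword-based matching in the Query row might look like, the sketch below maps a question to a pandas expression using a lookup table only. The function name, the keyword table, and the output format are all hypothetical and are not taken from PyAnalytica.

```python
def keyword_query(question, columns):
    """Map a plain-English question to a pandas expression using keyword rules."""
    # Hypothetical keyword table: trigger word -> pandas method name.
    operations = {"average": "mean", "mean": "mean", "total": "sum",
                  "sum": "sum", "highest": "max", "lowest": "min"}
    words = question.lower().replace("?", " ").split()

    column = next((c for c in columns if c.lower() in words), None)
    method = next((operations[w] for w in words if w in operations), None)
    if column is None or method is None:
        return None  # No rule matched; a caller could fall back to an LLM here.
    return f"df[{column!r}].{method}()"


columns = ["total_bill", "tip", "day"]
code = keyword_query("What is the average tip?", columns)
no_match = keyword_query("How are you?", columns)
print(code)
```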

Set your API key via environment variable or user profile:

export ANTHROPIC_API_KEY="sk-ant-..."

Procedure Builder & Reports

Recording workflows

The Procedure Builder records every analytics operation as a reproducible step:

Click Start Recording in the Report > Procedure tab
Perform your analysis (load data, transform, visualize, model, etc.)
Each step is captured with its code snippet and can be annotated with comments
Stop Recording when done
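
The recording steps above can be sketched as a small recorder that collects code snippets while recording is on and then emits a script. This is a hypothetical illustration; `SketchStep` and `SketchRecorder` are not the classes in pyanalytica.core.procedure, and their methods are assumptions.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SketchStep:
    """One recorded operation: the code that ran plus an optional comment."""
    code: str
    imports: list[str] = field(default_factory=list)
    comment: str = ""


@dataclass
class SketchRecorder:
    """Hypothetical recorder; it only collects steps while recording is on."""
    steps: list[SketchStep] = field(default_factory=list)
    recording: bool = False

    def start(self):
        self.recording = True

    def stop(self):
        self.recording = False

    def record(self, code, imports=(), comment=""):
        if self.recording:
            self.steps.append(SketchStep(code, list(imports), comment))

    def to_python(self):
        # Deduplicate imports in first-seen order, then emit each step.
        imports = list(dict.fromkeys(i for s in self.steps for i in s.imports))
        body = []
        for step in self.steps:
            if step.comment:
                body.append(f"# {step.comment}")
            body.append(step.code)
        return "\n".join(imports + [""] + body)

    def to_json(self):
        return json.dumps([asdict(s) for s in self.steps])


rec = SketchRecorder()
rec.record("df.head()")  # Ignored: recording has not started.
rec.start()
rec.record("df = pd.read_csv('tips.csv')", ["import pandas as pd"], "Load the data")
rec.record("df.groupby(['day'])['tip'].mean()", ["import pandas as pd"])
rec.stop()
script = rec.to_python()
print(script)
```

Because each step already carries real pandas code, exporting a standalone script is mostly a matter of collecting imports once and concatenating the steps in order.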

Export formats

Format	Description
JSON	Full roundtrip format — reload procedures later
Python script	Standalone `.py` file with all imports and code
Jupyter notebook	`.ipynb` with markdown headers and code cells
HTML report	Rendered HTML with results and visualizations

from pyanalytica.core.procedure import Procedure

proc = Procedure.from_json("my_analysis.json")
proc.to_python("my_analysis.py")
proc.to_notebook("my_analysis.ipynb")

Development

Setup

git clone https://github.com/social-engineer-ai/PyAnalytica.git
cd PyAnalytica
pip install -e ".[dev,all]"

# Generate bundled datasets
PYTHONPATH=src python -m pyanalytica.datasets.generate

Run tests

PYTHONPATH=src python -m pytest tests/ -v

Build

pip install build
python -m build

Project structure

PyAnalytica/
├── src/pyanalytica/
│   ├── __init__.py              # Package version
│   ├── __main__.py              # python -m pyanalytica entry
│   ├── core/                    # Shared utilities
│   │   ├── codegen.py           # CodeSnippet + on_record hook
│   │   ├── column_utils.py      # ColumnType classification
│   │   ├── config.py            # CourseConfig + menu visibility
│   │   ├── model_store.py       # ModelArtifact + ModelStore
│   │   ├── procedure.py         # ProcedureStep / Procedure / Recorder
│   │   ├── profile.py           # UserProfile + get_api_key()
│   │   ├── session.py           # Session save / load / list
│   │   ├── state.py             # WorkbenchState
│   │   └── theme.py             # Theme management
│   ├── data/                    # Load, profile, transform, combine, export
│   ├── explore/                 # Summarize, pivot, crosstab
│   ├── visualize/               # Distribute, relate, compare, correlate, timeline
│   ├── analyze/                 # Means, proportions, correlation
│   ├── model/                   # Regression, classify, cluster, reduce, evaluate, predict
│   ├── homework/                # Schema, loader, grader, submission
│   ├── report/                  # Notebook + export
│   ├── ai/                      # Interpret, suggest, challenge, query
│   ├── datasets/                # Bundled CSV data + generator
│   └── ui/                      # Shiny application
│       ├── app.py               # Main app entry point
│       ├── www/style.css        # Glassmorphism CSS theme
│       ├── components/          # Reusable UI components
│       └── modules/             # Feature modules (data/, explore/, visualize/, ...)
├── tests/                       # 274 tests across 42 test files
├── pyproject.toml               # Build config (hatchling)
├── CHANGELOG.md                 # Version history
└── LICENSE                      # MIT License

Contributing

Fork the repository
Create a branch for your feature (git checkout -b feature/my-feature)
Write tests for new functionality
Run the test suite to ensure all tests pass
Submit a pull request with a clear description

Code style

All analytics functions return (result, CodeSnippet) tuples
CodeSnippets emit real pandas/sklearn code, never wrapper calls
Use ColumnType for column classification instead of ad-hoc dtype checks
Keep UI modules thin — business logic belongs in analytics packages
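
The guideline about column classification can be illustrated with a small enum. The sketch below is hypothetical: the member names and the classification rules are assumptions, and the real ColumnType in core/column_utils.py may look different.

```python
from enum import Enum


class SketchColumnType(Enum):
    """Hypothetical categories; the real ColumnType members may differ."""
    NUMERIC = "numeric"
    BOOLEAN = "boolean"
    TEXT = "text"


def classify_column(values):
    """Classify a column once so callers branch on the enum, not on raw types."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, bool) for v in present):
        return SketchColumnType.BOOLEAN
    if present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    ):
        return SketchColumnType.NUMERIC
    return SketchColumnType.TEXT


numeric = classify_column([1, 2.5, None])
boolean = classify_column([True, False])
text = classify_column(["Sat", "Sun"])
print(numeric, boolean, text)
```

Centralizing the decision in one function means every module treats the same column the same way, which is the point of avoiding ad-hoc dtype checks.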

License

Released under the MIT License. See LICENSE for details.

Acknowledgements

PyAnalytica is inspired by Radiant by Vincent Nijs (UC San Diego) — a comprehensive R/Shiny analytics platform for business education.

Built with Shiny for Python, pandas, scikit-learn, matplotlib, seaborn, and SciPy.

AI features powered by Anthropic Claude.

Developed for teaching at the University of Illinois at Urbana-Champaign.

Release history

Version	Release date
0.4.6 (this version)	Feb 19, 2026
0.4.5	Feb 18, 2026
0.4.4	Feb 18, 2026
0.4.3	Feb 18, 2026
0.4.2	Feb 18, 2026
0.4.1	Feb 18, 2026
0.4.0	Feb 18, 2026
0.3.0	Feb 12, 2026
0.2.0	Feb 11, 2026
0.1.0	Feb 10, 2026

Download files

Source Distribution

pyanalytica-0.4.6.tar.gz (937.4 kB)

Uploaded Feb 19, 2026 Source

Built Distribution

pyanalytica-0.4.6-py3-none-any.whl (992.2 kB)

Uploaded Feb 19, 2026 Python 3

File details

Details for the file pyanalytica-0.4.6.tar.gz.

File metadata

Download URL: pyanalytica-0.4.6.tar.gz
Upload date: Feb 19, 2026
Size: 937.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for pyanalytica-0.4.6.tar.gz
Algorithm	Hash digest
SHA256	`fdad420d1bdfa65a69d43481095665c24a388b6090ffa498b936cc11fdced136`
MD5	`175b6155f2d178a1a87fbc504ace5a53`
BLAKE2b-256	`630cf46871f3f7714bd2f0dfba0b5194212c33e3f56f2eea51810af08fce1cc9`

File details

Details for the file pyanalytica-0.4.6-py3-none-any.whl.

File metadata

Download URL: pyanalytica-0.4.6-py3-none-any.whl
Upload date: Feb 19, 2026
Size: 992.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for pyanalytica-0.4.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`83071457e40e7c739b2d792e866453ed1b8aff8cb1ee0b8c44e6af84c1c8eb6c`
MD5	`41ebceb9cbe05a370290cd34da222619`
BLAKE2b-256	`cb72df190081ddb73597335454366707a0adadee684b5f0e616c04be9a0dc415`
