Python SDK for FeatureCanvas — the no-code feature engineering studio
⌁ FeatureCanvas
No-code feature engineering studio with live impact scoring, leakage guardrails, and AI-powered suggestions.
What is FeatureCanvas?
FeatureCanvas is a visual, drag-and-drop feature engineering studio. Think KNIME or RapidMiner, plus several capabilities that those tools do not offer out of the box:
| Feature | KNIME / RapidMiner | FeatureCanvas |
|---|---|---|
| Drag-and-drop transform canvas | ✅ | ✅ |
| Live predictive impact score per node | ❌ | ✅ |
| Leakage guardrails (rule-based) | ❌ | ✅ |
| AI Copilot with stat-grounded suggestions | ❌ | ✅ |
| No lock-in — plain pandas/sklearn export | ❌ | ✅ |
| Real DAG branching (siblings isolated) | Partial | ✅ |
| Path-scoped leakage (per branch) | ❌ | ✅ |
Features
🎯 Live Predictive Impact Score
Every applied transform node shows mutual information and correlation against your target column in real time — directly on the node face, not buried in a separate view.
🛡️ Leakage Guardrails
Rule-based detection fires on the patterns that silently destroy model performance in production:
- Fit-before-split — scaling/encoding fitted on the full dataset before a train/test split
- Groupby self-inclusion — target encoding where each row can see its own label
- Target-touched transforms — derived features that are direct functions of the target
- Target binned as feature — the target column itself discretised into the feature set
All findings are path-scoped — a violation on Branch A never appears in Branch B's leakage panel.
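The groupby self-inclusion pattern is easiest to see with a tiny target-encoding example. The sketch below uses hypothetical helper names and made-up data to contrast a naive group mean (each row sees its own label) with a leave-one-out variant. It illustrates the pattern only and makes no claim about how FeatureCanvas detects it.

```python
from collections import defaultdict


def _group_totals(groups, labels):
    """Sum and count of labels per group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, y in zip(groups, labels):
        sums[g] += y
        counts[g] += 1
    return sums, counts


def naive_target_encode(groups, labels):
    """Leaky: the group mean includes the row's own label."""
    sums, counts = _group_totals(groups, labels)
    return [sums[g] / counts[g] for g in groups]


def leave_one_out_encode(groups, labels):
    """Safer: exclude the row's own label; fall back to the global mean."""
    sums, counts = _group_totals(groups, labels)
    prior = sum(labels) / len(labels)
    encoded = []
    for g, y in zip(groups, labels):
        others = counts[g] - 1
        encoded.append((sums[g] - y) / others if others > 0 else prior)
    return encoded


city = ["NYC", "NYC", "LA", "SF"]   # made-up data
churned = [1, 0, 1, 0]
naive = naive_target_encode(city, churned)
loo = leave_one_out_encode(city, churned)
```

For the single-row groups (`LA`, `SF`) the naive encoding equals the label itself, which is exactly the leak; the leave-one-out version falls back to the global mean instead.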
🤖 AI Feature Copilot (Claude-powered)
Click the Suggest transforms button and Claude profiles your dataset's actual column stats, then returns ranked, explainable suggestions constrained to real executable transform keys. No hallucinated column names, and no free text that the user has to translate.
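One way to enforce that constraint is to validate every model suggestion against an allow-list before it reaches the canvas. The sketch below is an assumption about how such a filter could look: the suggestion shape (`transform`, `column`, `reason`) and the function name `filter_suggestions` are hypothetical, and only a subset of the transform keys listed in this document is used.

```python
# Subset of the transform keys listed in this document.
ALLOWED_TRANSFORMS = {"log", "sqrt", "standard_scale", "onehot", "fillna"}


def filter_suggestions(suggestions, dataset_columns):
    """Keep only suggestions that name a real transform key and a real column."""
    columns = set(dataset_columns)
    return [
        s for s in suggestions
        if s.get("transform") in ALLOWED_TRANSFORMS and s.get("column") in columns
    ]


# Hypothetical raw model output: one valid entry, two that must be rejected.
raw = [
    {"transform": "log", "column": "monthly_income", "reason": "right-skewed"},
    {"transform": "magic_boost", "column": "age", "reason": "unknown key"},
    {"transform": "sqrt", "column": "salary_usd", "reason": "column not in data"},
]
valid = filter_suggestions(raw, ["monthly_income", "age", "churned"])
```

Rejecting anything outside the allow-list means a suggestion can always be applied as-is, without the user interpreting free text.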
🔓 No Lock-in Code Export
Every transform has a matching codegen() function. Select any node → click Code → get a standalone Python script with plain pandas/numpy/sklearn that runs anywhere with zero FeatureCanvas dependency.
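A minimal sketch of pairing each transform with a code generator, assuming a hypothetical registry (`CODEGEN`) and a hypothetical `export_script` helper. The emitted pandas/numpy lines are illustrative guesses, not the script FeatureCanvas actually produces.

```python
def codegen_log(column):
    # Assumed form of a log transform; the real emitted code may differ.
    return f'df["{column}_log"] = np.log1p(df["{column}"])'


def codegen_drop_column(column):
    return f'df = df.drop(columns=["{column}"])'


# Hypothetical registry: transform key -> function that emits one line of code.
CODEGEN = {"log": codegen_log, "drop_column": codegen_drop_column}


def export_script(csv_path, steps):
    """Build a standalone script from (transform_key, column) steps."""
    lines = [
        "import numpy as np",
        "import pandas as pd",
        "",
        f'df = pd.read_csv("{csv_path}")',
    ]
    lines += [CODEGEN[key](column) for key, column in steps]
    return "\n".join(lines) + "\n"


script = export_script(
    "train.csv",
    [("log", "monthly_income"), ("drop_column", "customer_id")],
)
print(script)
```

Because each generator returns ordinary source text, the exported script imports only pandas and numpy and carries no reference to the studio itself.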
🌿 Real DAG Branching
Two branches off the same parent are genuinely independent — their dataframes are resolved by walking the real ancestor chain, not by replaying a global ordered list. Sibling nodes never contaminate each other's column dropdowns, impact scores, or leakage findings.
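The ancestor-chain idea can be sketched with one parent pointer per node. Everything here (the `Node` class, `resolve_columns`, and representing a dataframe as a plain list of column names) is a simplification assumed for illustration, not the engine in graph.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    adds_column: Optional[str] = None  # column this step contributes, if any
    parent: Optional["Node"] = None


def ancestor_chain(node):
    """Return nodes from the root down to `node`, following parent links only."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return list(reversed(chain))


def resolve_columns(node, source_columns):
    """Replay only the steps on this node's own path from the root."""
    columns = list(source_columns)
    for step in ancestor_chain(node):
        if step.adds_column is not None:
            columns.append(step.adds_column)
    return columns


source = Node("source")
log_node = Node("log", adds_column="monthly_income_log", parent=source)
scaled = Node("standard_scale", adds_column="monthly_income_log_scaled", parent=log_node)
# Sibling branch; the output column name "age_sqrt" is made up for this sketch.
sqrt_node = Node("sqrt", adds_column="age_sqrt", parent=source)

scaled_cols = resolve_columns(scaled, ["monthly_income", "age"])
sqrt_cols = resolve_columns(sqrt_node, ["monthly_income", "age"])
```

Since `sqrt_node` never walks through `log_node`, the log branch's columns cannot appear in its result, which is the isolation property described above.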
💾 Session Persistence
Sessions survive backend restarts via Upstash Redis (DataFrames are stored as Parquet bytes, node graphs as JSON). Canvas layout and sparklines are restored from localStorage on page reload.
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 18 + Vite + react-flow + Tailwind |
| Backend | FastAPI + Python 3.11 |
| ML Engine | pandas, scikit-learn, scipy |
| AI | Anthropic Claude API |
| Session Store | Upstash Redis (Parquet + JSON) |
| Deploy | Vercel (frontend) + Render (backend) |
| SDK | Pure Python, pip install featurecanvas |
Transforms (21)
Numeric
log · sqrt · standard_scale · minmax_scale · robust_scale · power_transform · clip_outliers · abs_value · binning · column_ratio · column_diff
Categorical
onehot · label_encode · frequency_encode · rare_group
Datetime
datetime_decompose
Relational
groupby_agg
Cleaning
fillna · drop_column · rename_column
Python SDK
pip install featurecanvas
from featurecanvas import FeatureCanvas
fc = FeatureCanvas("https://featurecanvas.onrender.com")
# Upload a CSV and set the target
session = fc.upload("train.csv")
session.set_target("churned")
# Build a pipeline
log_node = session.apply("log", column="monthly_income")
scaled = log_node.apply("standard_scale", column="monthly_income_log")
# Branch off an earlier node
sqrt_node = session.apply("sqrt", column="age") # sibling of log_node
# Inspect
print(scaled.columns()) # ['monthly_income_log_scaled', ...]
print(scaled.leakage()) # leakage findings scoped to this branch
print(scaled.scores()) # MI scores vs target
# Check for leakage before shipping
risky = session.apply(
    "groupby_agg",
    group_column="city", agg_column="churned", agg_func="mean",
)
if risky.has_leakage("high"):
    print("HIGH leakage detected:", risky.leakage())
# Export clean Python — no FeatureCanvas dependency
print(scaled.code())
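Building on the `has_leakage`, `leakage`, and `code` calls shown above, a pipeline could refuse to export code while a branch still has high-severity findings. The helper below is a hypothetical sketch: `export_if_clean` is not part of the SDK, the finding dictionaries are an assumed shape, and `FakeNode` is a stand-in so the example runs without a server.

```python
class LeakageError(RuntimeError):
    """Raised when a node still has findings at the blocking severity."""


def export_if_clean(node, severity="high"):
    """Return node.code() unless node.has_leakage(severity) is true."""
    if node.has_leakage(severity):
        raise LeakageError(f"{severity} leakage: {node.leakage()}")
    return node.code()


class FakeNode:
    """Stand-in exposing the three node methods used in the example above."""

    def __init__(self, findings, script):
        self._findings = findings  # assumed shape: list of dicts with "severity"
        self._script = script

    def has_leakage(self, severity):
        return any(f["severity"] == severity for f in self._findings)

    def leakage(self):
        return self._findings

    def code(self):
        return self._script


clean = FakeNode([], "df['x_log'] = ...")
risky = FakeNode([{"severity": "high", "rule": "groupby_self_inclusion"}], "")
```

Calling `export_if_clean(clean)` returns the script text, while `export_if_clean(risky)` raises `LeakageError`, which makes the check easy to wire into a CI step.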
Local Development
Prerequisites
- Python 3.11+
- Node.js 18+
- Upstash Redis account (free tier works)
- Anthropic API key (for Copilot)
Backend
cd backend
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # Mac/Linux
pip install -r requirements.txt
Create backend/.env:
ANTHROPIC_API_KEY=sk-ant-...
UPSTASH_REDIS_URL=rediss://default:...@....upstash.io:6379
uvicorn app.main:app --reload --port 8000
Frontend
cd frontend
npm install
npm run dev
Open http://localhost:5173
Tests
# Backend (44 tests)
cd backend && python -m pytest tests/ -v
# Frontend integration (requires backend running)
cd frontend && npx vitest run --testTimeout=20000
Deployment
| Service | Config file | Notes |
|---|---|---|
| Render (backend) | backend/render.yaml | Set ANTHROPIC_API_KEY, UPSTASH_REDIS_URL, FRONTEND_URL |
| Vercel (frontend) | frontend/vercel.json | Set VITE_API_BASE_URL=https://your-render-url/api |
Project Structure
featurecanvas/
├── backend/
│ ├── app/
│ │ ├── engine/ # Core ML engine
│ │ │ ├── transforms.py # 21 transforms with codegen
│ │ │ ├── leakage.py # Rule-based leakage detection
│ │ │ ├── impact_scoring.py # MI / correlation scoring
│ │ │ ├── graph.py # DAG resolution engine
│ │ │ ├── session_store.py # Redis-backed persistence
│ │ │ ├── copilot.py # Claude AI suggestions
│ │ │ ├── codegen.py # Python script export
│ │ │ └── profiling.py # Column statistics
│ │ ├── main.py # FastAPI endpoints
│ │ └── schemas.py # Pydantic models
│ ├── tests/ # 44 backend tests
│ ├── requirements.txt
│ └── render.yaml
├── frontend/
│ ├── src/
│ │ ├── App.jsx # Main canvas + state
│ │ ├── nodes/ # SourceNode, TransformNode, TargetNode
│ │ ├── components/ # Sidebar, panels, LeakagePanel, etc.
│ │ └── api/client.js # Axios API client
│ └── vercel.json
├── sdk/ # featurecanvas PyPI package
│ ├── featurecanvas/
│ │ ├── __init__.py
│ │ └── client.py
│ └── pyproject.toml
└── featurecanvas_test_data.csv
Built by
G. Preetham Saxon — B.Tech CSE @ VIIT Visakhapatnam · IEEE Student Branch Vice Chairperson · AI Product Engineer
License
MIT © G. Preetham Saxon
File details
Details for the file featurecanvas-0.1.1.tar.gz.
File metadata
- Download URL: featurecanvas-0.1.1.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 85b73afcdec04cbcd8bb454f66a4d87864461b6da6d3db433d5c4c93de4beb62 |
| MD5 | 960550e60e1e84e8478b13e4bdf9f401 |
| BLAKE2b-256 | c8f57989e6ab0793131057915fd6f35704b18c5682d0f4e62d87d902116d6197 |
File details
Details for the file featurecanvas-0.1.1-py3-none-any.whl.
File metadata
- Download URL: featurecanvas-0.1.1-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0345242c1af4aa0bd8aa10cd32ae3ca32e99f2bcc009993bab5ca5f85c1fd8b6 |
| MD5 | 46dbe46b5c08153cb5eb9eeea9f7dfdc |
| BLAKE2b-256 | a674d3d2aa5e72e5d4d7cfc23881c5f8e02ea6abb646ab9a91cd2a8c2a8de311 |