
Python SDK for FeatureCanvas — the no-code feature engineering studio


⌁ FeatureCanvas

No-code feature engineering studio with live impact scoring, leakage guardrails, and AI-powered suggestions.


Live Demo · API Docs · SDK on PyPI


What is FeatureCanvas?

FeatureCanvas is a visual, drag-and-drop feature engineering studio. Think KNIME or RapidMiner, but with four things that neither tool offers out of the box:

Feature KNIME / RapidMiner FeatureCanvas
Drag-and-drop transform canvas
Live predictive impact score per node
Leakage guardrails (rule-based)
AI Copilot with stat-grounded suggestions
No lock-in — plain pandas/sklearn export
Real DAG branching (siblings isolated) Partial
Path-scoped leakage (per branch)

Features

🎯 Live Predictive Impact Score

Every applied transform node shows mutual information and correlation against your target column in real time — directly on the node face, not buried in a separate view.
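As a rough illustration of the two quantities shown on a node, here is a minimal standard-library sketch. It is not the project's scoring code (the engine is described as using pandas, scikit-learn, and scipy); the function names, the equal-width binning, and the choice of four bins are assumptions made for this example.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    count_x, count_y, count_xy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        # p(x, y) / (p(x) * p(y)) written with raw counts
        mi += p_joint * math.log(p_joint * n * n / (count_x[x] * count_y[y]))
    return mi


def bin_values(xs, bins=4):
    """Equal-width binning so a numeric feature can be treated as discrete."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0] * len(xs)
    width = (hi - lo) / bins
    return [min(int((x - lo) / width), bins - 1) for x in xs]


def impact_score(feature, target, bins=4):
    """Score one numeric feature against a discrete target (hypothetical helper)."""
    return {
        "mutual_info": mutual_information(bin_values(feature, bins), target),
        "correlation": pearson(feature, target),
    }
```

A feature that separates the two target classes cleanly scores a mutual information of ln 2 (the entropy of a balanced binary target), while a constant feature scores zero on both measures.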

🛡️ Leakage Guardrails

Rule-based detection fires on the patterns that silently destroy model performance in production:

  • Fit-before-split — scaling/encoding fitted on the full dataset before a train/test split
  • Groupby self-inclusion — target encoding where each row can see its own label
  • Target-touched transforms — derived features that are direct functions of the target
  • Target binned as feature — the target column itself discretised into the feature set

All findings are path-scoped — a violation on Branch A never appears in Branch B's leakage panel.
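The rules above, and the path scoping, can be sketched as a check that only ever sees the steps on one branch. This is an illustrative sketch, not the contents of leakage.py: the step dictionary shape, the finding names, the set of "fitted" transforms, and the `train_test_split` marker step are all assumptions.

```python
# Transforms assumed (for this sketch) to learn parameters from the data.
FITTED = {"standard_scale", "minmax_scale", "robust_scale",
          "power_transform", "onehot", "label_encode"}
SPLIT_STEP = "train_test_split"  # hypothetical marker step, not a listed transform


def find_leakage(path, target):
    """Return (step_id, rule) findings for one root-to-node path."""
    findings = []
    split_seen = False
    for step in path:
        name, params = step["transform"], step.get("params", {})
        touches_target = target in params.values()
        if name == SPLIT_STEP:
            split_seen = True
        elif name in FITTED and not split_seen:
            findings.append((step["id"], "fit_before_split"))
        if name == "groupby_agg" and params.get("agg_column") == target:
            # Aggregating the target per group lets each row see its own label.
            findings.append((step["id"], "groupby_self_inclusion"))
        elif name == "binning" and touches_target:
            findings.append((step["id"], "target_binned_as_feature"))
        elif touches_target:
            findings.append((step["id"], "target_touched"))
    return findings


branch_a = [{"id": "n1", "transform": "groupby_agg",
             "params": {"group_column": "city", "agg_column": "churned",
                        "agg_func": "mean"}}]
branch_b = [{"id": "n2", "transform": "log",
             "params": {"column": "monthly_income"}}]

print(find_leakage(branch_a, "churned"))
print(find_leakage(branch_b, "churned"))
```

Because the function receives a single path, a finding raised on one branch cannot show up when a sibling branch is checked.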

🤖 AI Feature Copilot (Claude-powered)

Click the Suggest transforms button and Claude profiles your dataset's actual column stats, then returns ranked, explainable suggestions constrained to real executable transform keys. No hallucinated column names, and no free text for the user to translate.
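One way such a constraint could be enforced is to validate every model suggestion against an allow-list before it reaches the UI. The sketch below is an assumption about how this might look, not the project's copilot.py: the suggestion shape (`transform`, `column`, `score`) and the function name are hypothetical, while the transform keys are the ones listed in the Transforms section.

```python
# Transform keys as listed in the Transforms section of this README.
TRANSFORM_KEYS = {
    "log", "sqrt", "standard_scale", "minmax_scale", "robust_scale",
    "power_transform", "clip_outliers", "abs_value", "binning",
    "column_ratio", "column_diff",
    "onehot", "label_encode", "frequency_encode", "rare_group",
    "datetime_decompose", "groupby_agg",
    "fillna", "drop_column", "rename_column",
}


def filter_suggestions(suggestions, columns):
    """Keep suggestions that name a known transform and a real column, best first."""
    known_columns = set(columns)
    valid = [
        s for s in suggestions
        if s.get("transform") in TRANSFORM_KEYS and s.get("column") in known_columns
    ]
    return sorted(valid, key=lambda s: s.get("score", 0.0), reverse=True)


raw = [
    {"transform": "log", "column": "monthly_income", "score": 0.4},
    {"transform": "magic_encode", "column": "city", "score": 0.9},   # unknown key
    {"transform": "onehot", "column": "zip_code", "score": 0.8},     # unknown column
    {"transform": "frequency_encode", "column": "city", "score": 0.7},
]
print(filter_suggestions(raw, ["monthly_income", "age", "city"]))
```

Anything the model invents is dropped rather than repaired, so every suggestion that survives can be applied directly.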

🔓 No Lock-in Code Export

Every transform has a matching codegen() function. Select any node → click Code → get a standalone Python script with plain pandas/numpy/sklearn that runs anywhere with zero FeatureCanvas dependency.
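To make the idea concrete, here is a hedged sketch of what a per-transform code generator and a script exporter could look like. The function names, the use of `np.log1p`, and the `_sqrt` suffix are assumptions for illustration; only the `_log` suffix follows the SDK example in this README.

```python
def codegen_log(column, df="df"):
    """Emit one line of plain pandas/numpy for a log transform (log1p is assumed)."""
    return f"{df}[{column + '_log'!r}] = np.log1p({df}[{column!r}])"


def codegen_sqrt(column, df="df"):
    """Emit one line of plain pandas/numpy for a square-root transform."""
    return f"{df}[{column + '_sqrt'!r}] = np.sqrt({df}[{column!r}])"


# Hypothetical registry: one generator per transform key.
CODEGEN = {"log": codegen_log, "sqrt": codegen_sqrt}


def export_script(steps, csv_path="train.csv"):
    """Build a standalone script from (transform, column) steps."""
    lines = [
        "import numpy as np",
        "import pandas as pd",
        "",
        f"df = pd.read_csv({csv_path!r})",
    ]
    for transform, column in steps:
        lines.append(CODEGEN[transform](column))
    return "\n".join(lines) + "\n"


print(export_script([("log", "monthly_income"), ("sqrt", "age")]))
```

The generated text imports only numpy and pandas, which is what lets the exported script run without FeatureCanvas installed.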

🌿 Real DAG Branching

Two branches off the same parent are genuinely independent — their dataframes are resolved by walking the real ancestor chain, not by replaying a global ordered list. Sibling nodes never contaminate each other's column dropdowns, impact scores, or leakage findings.
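The ancestor-chain idea can be sketched with parent pointers: to resolve a node, collect its ancestors up to the source, then apply their transforms in root-to-node order. This is a simplified stand-in for graph.py, using a dict of lists instead of a DataFrame; the node shape, the two toy transforms, and the `<column>_<transform>` naming are assumptions.

```python
import math

# Toy transforms standing in for the real engine.
TRANSFORMS = {
    "log": lambda values: [math.log1p(v) for v in values],
    "sqrt": lambda values: [math.sqrt(v) for v in values],
}


def resolve(node_id, nodes, source):
    """Rebuild the frame seen by one node by walking its ancestor chain."""
    chain = []
    current = node_id
    while current is not None:           # walk up to the source
        chain.append(nodes[current])
        current = nodes[current]["parent"]
    frame = {name: list(values) for name, values in source.items()}  # copy
    for node in reversed(chain):         # apply from the root down
        column = node["column"]
        new_name = f"{column}_{node['transform']}"
        frame[new_name] = TRANSFORMS[node["transform"]](frame[column])
    return frame


source = {"income": [0.0, 3.0], "age": [4.0, 9.0]}
nodes = {
    "a": {"parent": None, "transform": "log", "column": "income"},
    "b": {"parent": "a", "transform": "sqrt", "column": "income_log"},
    "c": {"parent": None, "transform": "sqrt", "column": "age"},  # sibling of "a"
}

print(sorted(resolve("b", nodes, source)))
print(sorted(resolve("c", nodes, source)))
```

Node "c" never sees `income_log`, because "a" is not on its ancestor chain. A global ordered replay would have applied "a" first and leaked that column into the sibling.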

💾 Session Persistence

Sessions survive backend restarts via Upstash Redis (DataFrames stored as Parquet bytes, node graphs as JSON). Canvas layout and sparklines are restored from localStorage on page reload.


Tech Stack

Layer | Technology
Frontend | React 18 + Vite + react-flow + Tailwind
Backend | FastAPI + Python 3.11
ML Engine | pandas, scikit-learn, scipy
AI | Anthropic Claude API
Session Store | Upstash Redis (Parquet + JSON)
Deploy | Vercel (frontend) + Render (backend)
SDK | Pure Python, pip install featurecanvas

Transforms (21)

Numeric

log · sqrt · standard_scale · minmax_scale · robust_scale · power_transform · clip_outliers · abs_value · binning · column_ratio · column_diff

Categorical

onehot · label_encode · frequency_encode · rare_group

Datetime

datetime_decompose

Relational

groupby_agg

Cleaning

fillna · drop_column · rename_column


Python SDK

pip install featurecanvas

from featurecanvas import FeatureCanvas

fc = FeatureCanvas("https://featurecanvas.onrender.com")

# Upload a CSV and set the target
session = fc.upload("train.csv")
session.set_target("churned")

# Build a pipeline
log_node = session.apply("log", column="monthly_income")
scaled = log_node.apply("standard_scale", column="monthly_income_log")

# Branch off an earlier node
sqrt_node = session.apply("sqrt", column="age")  # sibling of log_node

# Inspect
print(scaled.columns())         # ['monthly_income_log_scaled', ...]
print(scaled.leakage())         # leakage findings scoped to this branch
print(scaled.scores())          # MI scores vs target

# Check for leakage before shipping
risky = session.apply("groupby_agg",
    group_column="city", agg_column="churned", agg_func="mean")
if risky.has_leakage("high"):
    print("HIGH leakage detected:", risky.leakage())

# Export clean Python — no FeatureCanvas dependency
print(scaled.code())
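The `has_leakage` and `leakage` calls shown above lend themselves to a simple pre-shipping gate. The sketch below is one possible pattern, not part of the SDK: `leaky_nodes` is a hypothetical helper, and `StubNode` with its `severity` field is a stand-in so the example runs without a server.

```python
class StubNode:
    """Stand-in for an SDK node; the finding shape here is an assumption."""

    def __init__(self, findings):
        self._findings = findings

    def leakage(self):
        return list(self._findings)

    def has_leakage(self, level):
        return any(f["severity"] == level for f in self._findings)


def leaky_nodes(nodes, level="high"):
    """Map node name -> findings for every node flagged at the given level."""
    return {
        name: node.leakage()
        for name, node in nodes.items()
        if node.has_leakage(level)
    }


pipeline = {
    "scaled": StubNode([]),
    "risky": StubNode([{"severity": "high", "rule": "groupby_self_inclusion"}]),
}

flagged = leaky_nodes(pipeline)
if flagged:
    print("HIGH leakage detected in:", sorted(flagged))
```

In a CI job, the same check could raise an exception instead of printing, so that a leaky feature blocks the build.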

Local Development

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Upstash Redis account (free tier works)
  • Anthropic API key (for Copilot)

Backend

cd backend
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Mac/Linux
pip install -r requirements.txt

Create backend/.env:

ANTHROPIC_API_KEY=sk-ant-...
UPSTASH_REDIS_URL=rediss://default:...@....upstash.io:6379

Start the server from backend/:

uvicorn app.main:app --reload --port 8000
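A missing key usually surfaces only when the Copilot or the session store is first used, so a quick check before starting the server can save a debugging round. The helper below is a hypothetical convenience, not something the backend is known to ship; it only assumes the two variable names from backend/.env and that they have been exported into the process environment.

```python
import os

# Variable names taken from the backend/.env example in this README.
REQUIRED = ("ANTHROPIC_API_KEY", "UPSTASH_REDIS_URL")


def missing_settings(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED if not env.get(key)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```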

Frontend

cd frontend
npm install
npm run dev

Open http://localhost:5173

Tests

# Backend (44 tests)
cd backend && python -m pytest tests/ -v

# Frontend integration (requires backend running)
cd frontend && npx vitest run --testTimeout=20000

Deployment

Service | Config file | Notes
Render (backend) | backend/render.yaml | Set ANTHROPIC_API_KEY, UPSTASH_REDIS_URL, FRONTEND_URL
Vercel (frontend) | frontend/vercel.json | Set VITE_API_BASE_URL=https://your-render-url/api

Project Structure

featurecanvas/
├── backend/
│   ├── app/
│   │   ├── engine/          # Core ML engine
│   │   │   ├── transforms.py    # 21 transforms with codegen
│   │   │   ├── leakage.py       # Rule-based leakage detection
│   │   │   ├── impact_scoring.py # MI / correlation scoring
│   │   │   ├── graph.py         # DAG resolution engine
│   │   │   ├── session_store.py # Redis-backed persistence
│   │   │   ├── copilot.py       # Claude AI suggestions
│   │   │   ├── codegen.py       # Python script export
│   │   │   └── profiling.py     # Column statistics
│   │   ├── main.py          # FastAPI endpoints
│   │   └── schemas.py       # Pydantic models
│   ├── tests/               # 44 backend tests
│   ├── requirements.txt
│   └── render.yaml
├── frontend/
│   ├── src/
│   │   ├── App.jsx          # Main canvas + state
│   │   ├── nodes/           # SourceNode, TransformNode, TargetNode
│   │   ├── components/      # Sidebar, panels, LeakagePanel, etc.
│   │   └── api/client.js    # Axios API client
│   └── vercel.json
├── sdk/                     # featurecanvas PyPI package
│   ├── featurecanvas/
│   │   ├── __init__.py
│   │   └── client.py
│   └── pyproject.toml
└── featurecanvas_test_data.csv

Built by

G. Preetham Saxon — B.Tech CSE @ VIIT Visakhapatnam · IEEE Student Branch Vice Chairperson · AI Product Engineer



License

MIT © G. Preetham Saxon
