Simple, intelligent imputation analysis for data science
FunPuter - Intelligent Imputation Analysis
Simple, fast, intelligent recommendations for handling missing data with enhanced metadata support.
FunPuter analyzes your data and suggests the best imputation methods based on:
- Missing data mechanisms (MCAR, MAR, MNAR detection)
- Data types and statistical properties
- Business rules and column dependencies
- Enhanced metadata constraints (nullable, allowed_values, max_length)
- Adaptive thresholds based on your dataset characteristics
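To make the first bullet concrete: when the missingness of one column tracks the values of another observed column, the data is unlikely to be MCAR. The sketch below is a minimal, standard-library illustration of that idea and is not FunPuter's actual detection logic. The helper names `missingness_gap` and `guess_mechanism` and the threshold of 0.5 are assumptions made for this example. MNAR generally cannot be confirmed from observed data alone, so the sketch only separates MCAR from MAR.

```python
from statistics import mean, pstdev


def missingness_gap(target, other):
    """Standardized difference in `other` between rows where `target`
    is missing and rows where it is observed.

    Hypothetical helper for illustration; not part of FunPuter.
    """
    missing = [o for t, o in zip(target, other) if t is None and o is not None]
    observed = [o for t, o in zip(target, other) if t is not None and o is not None]
    if not missing or not observed:
        return 0.0
    spread = pstdev(missing + observed)
    if spread == 0:
        return 0.0
    return abs(mean(missing) - mean(observed)) / spread


def guess_mechanism(target, other, threshold=0.5):
    """Return "MAR" if missingness tracks `other`, else "MCAR".

    The threshold is an arbitrary illustrative value.
    """
    return "MAR" if missingness_gap(target, other) > threshold else "MCAR"


# Income is missing only for the youngest people, so missingness depends on age.
age = [22, 25, 27, 45, 50, 55, 60, 65]
income = [None, None, None, 52000, 61000, 58000, 70000, 66000]
print(guess_mechanism(income, age))
```

A production check would normally use a proper statistical test and consider every observed column, not a single pairwise comparison.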
🚀 Quick Start
Installation
pip install funputer
Basic Usage
🤖 Auto-Inference Mode (New!)
import funimpute

# Let FunPuter intelligently infer metadata from your data
suggestions = funimpute.analyze_imputation_requirements(
    data_path="data.csv"  # No metadata file needed!
)

# Use the suggestions
for suggestion in suggestions:
    print(f"{suggestion.column_name}: {suggestion.proposed_method}")
    print(f"  Rationale: {suggestion.rationale}")
    print(f"  Confidence: {suggestion.confidence_score:.3f}")
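For intuition about what inferring metadata can involve, here is a rough sketch of guessing a column's `data_type` from raw string values. It is an assumption about how such inference could work, not a description of FunPuter's internals. The function name `infer_data_type` and the `max_categories` cutoff are hypothetical.

```python
from datetime import datetime


def infer_data_type(values, max_categories=5):
    """Guess a metadata data_type from raw string values.

    Hypothetical helper for illustration; not part of FunPuter.
    Empty strings and None are treated as missing and ignored.
    """
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "string"

    def all_parse(parser):
        # True only if every present value is accepted by the parser.
        try:
            for v in present:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    # A small set of repeated labels suggests a categorical column.
    distinct = set(present)
    if len(distinct) <= max_categories and len(distinct) < len(present):
        return "categorical"
    return "string"


print(infer_data_type(["34", "", "51", None, "29"]))
print(infer_data_type(["A", "B", "A", "C"]))
```

Heuristics like this can misclassify columns (for example, numeric codes that are really categories), which is one reason to supply explicit metadata in production.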
📋 Explicit Metadata Mode (Production)
import funimpute

# For maximum accuracy, provide explicit metadata
suggestions = funimpute.analyze_imputation_requirements(
    metadata_path="metadata.csv",
    data_path="data.csv"
)
🎯 Enhanced Metadata Support (v1.1.0)
FunPuter now supports comprehensive metadata fields that actively influence imputation recommendations:
Metadata Schema
| Field | Type | Description | Example |
|---|---|---|---|
| column_name | string | Column identifier | "age" |
| data_type | string | Data type (integer, float, string, categorical, datetime) | "integer" |
| nullable | boolean | Allow null values | false |
| min_value | number | Minimum allowed value | 0 |
| max_value | number | Maximum allowed value | 120 |
| max_length | integer | Maximum string length | 50 |
| allowed_values | string | Comma-separated list of allowed values | "A,B,C" |
| unique_flag | boolean | Require unique values | true |
| dependent_column | string | Column dependencies | "age" |
| business_rule | string | Custom validation rules | "Must be positive" |
| description | string | Human-readable description | "User age in years" |
🛠️ Creating Metadata
Method 1: CLI Template Generation
# Generate a metadata template from your data
funputer init --data data.csv --output metadata.csv
# Edit the generated file to add constraints
# Then analyze with enhanced metadata
funputer analyze --data data.csv --metadata metadata.csv
Method 2: Manual CSV Creation
# metadata.csv
# column_name,data_type,nullable,min_value,max_value,max_length,allowed_values,unique_flag,dependent_column,business_rule,description
user_id,integer,false,,,50,,true,,,"Unique user identifier"
age,integer,false,0,120,,,,,Must be positive,"User age in years"
income,float,true,0,,,,,age,Higher with age,"Annual income in USD"
category,categorical,false,,,10,"A,B,C",,,,"User category classification"
email,string,true,,,255,,true,,,"User email address"
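To check a hand-written metadata file before passing it to FunPuter, it can help to parse it yourself. The sketch below reads the same column layout with the standard `csv` module. `MetadataRow`, `parse_metadata`, and the defaults chosen for empty cells (`nullable=True`, `unique_flag=False`) are assumptions for this example and are not FunPuter's own loader or `ColumnMetadata` class. The sketch also expects the header as a plain first row, without a leading `#`.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataRow:
    """Illustrative stand-in for one metadata row (not FunPuter's ColumnMetadata)."""
    column_name: str
    data_type: str
    nullable: bool = True          # assumed default for an empty cell
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    max_length: Optional[int] = None
    allowed_values: Optional[str] = None
    unique_flag: bool = False      # assumed default for an empty cell
    dependent_column: Optional[str] = None
    business_rule: Optional[str] = None
    description: Optional[str] = None


def _flag(text, default):
    """Parse 'true'/'false'; an empty cell falls back to the default."""
    text = text.strip().lower()
    return default if text == "" else text == "true"


def _number(text, cast):
    """Convert a non-empty cell with `cast`; an empty cell becomes None."""
    return cast(text) if text.strip() else None


def parse_metadata(csv_text):
    """Parse metadata CSV text (with a plain header row) into MetadataRow objects."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append(MetadataRow(
            column_name=raw["column_name"],
            data_type=raw["data_type"],
            nullable=_flag(raw["nullable"], True),
            min_value=_number(raw["min_value"], float),
            max_value=_number(raw["max_value"], float),
            max_length=_number(raw["max_length"], int),
            allowed_values=raw["allowed_values"] or None,
            unique_flag=_flag(raw["unique_flag"], False),
            dependent_column=raw["dependent_column"] or None,
            business_rule=raw["business_rule"] or None,
            description=raw["description"] or None,
        ))
    return rows


SAMPLE = '''column_name,data_type,nullable,min_value,max_value,max_length,allowed_values,unique_flag,dependent_column,business_rule,description
age,integer,false,0,120,,,,,Must be positive,"User age in years"
category,categorical,false,,,10,"A,B,C",,,,"User category classification"
'''

for row in parse_metadata(SAMPLE):
    print(row.column_name, row.data_type, row.nullable, row.allowed_values)
```

Note that `allowed_values` must be quoted in the CSV because it contains commas; the `csv` module then returns it as a single field.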
🎯 Metadata in Action
Example 1: Nullable Constraints
from funimpute.models import ColumnMetadata

# When nullable=False but data has missing values
metadata = ColumnMetadata(
    column_name="age",
    data_type="integer",
    nullable=False,
    min_value=0,
    max_value=120
)
# FunPuter will:
# - Detect nullable constraint violations
# - Recommend immediate data quality fixes
# - Lower confidence score due to constraint violations
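The behavior listed above can be pictured with a small standalone check. This is a sketch under stated assumptions: the function `check_nullable`, the base confidence of 0.9, and the proportional penalty are all invented for illustration and do not reflect how FunPuter actually scores confidence.

```python
def check_nullable(values, nullable, base_confidence=0.9):
    """Return (violation_count, adjusted_confidence).

    Hypothetical helper for illustration; not part of FunPuter.
    """
    missing = sum(1 for v in values if v is None)
    if nullable or missing == 0:
        return 0, base_confidence
    # Illustrative penalty: scale confidence by the share of non-missing rows.
    penalty = missing / len(values)
    return missing, round(base_confidence * (1 - penalty), 3)


ages = [34, None, 51, 29, None, 42, 38, 27, 45, 60]
violations, confidence = check_nullable(ages, nullable=False)
print(f"violations={violations}, confidence={confidence}")
```

With two missing values out of ten in a non-nullable column, the sketch reports two violations and scales the confidence down accordingly; a nullable column would pass through unchanged.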
Example 2: Allowed Values
# For categorical data with specific allowed values
metadata = ColumnMetadata(
    column_name="status",
    data_type="categorical",
    allowed_values="active,inactive,pending"
)
# FunPuter will:
# - Validate all values against allowed list
# - Recommend mode imputation using only allowed values
# - Increase confidence when data respects constraints
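Mode imputation restricted to allowed values can be sketched as follows. The helper `impute_mode_allowed` is hypothetical and only illustrates the idea; FunPuter itself produces recommendations, and its internal handling may differ.

```python
from collections import Counter


def impute_mode_allowed(values, allowed_values):
    """Fill missing entries with the most frequent *allowed* value.

    Hypothetical helper for illustration; not part of FunPuter.
    Returns (filled_values, invalid_values). Invalid entries are
    reported but left in place.
    """
    allowed = [v.strip() for v in allowed_values.split(",")]
    counts = Counter(v for v in values if v in allowed)
    if not counts:
        raise ValueError("no observed value is in the allowed list")
    fill = counts.most_common(1)[0][0]
    invalid = [v for v in values if v is not None and v not in allowed]
    return [fill if v is None else v for v in values], invalid


# "archived" is the raw mode, but it is not an allowed value.
status = ["archived", "archived", "archived", "active", "active", None, "inactive"]
filled, invalid = impute_mode_allowed(status, "active,inactive,pending")
print(filled)
print(invalid)
```

The example shows why the constraint matters: an unconstrained mode would fill the gap with "archived", which the metadata forbids, while the constrained version uses "active".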
Example 3: String Length Constraints
# For string data with length limits
metadata = ColumnMetadata(
    column_name="username",
    data_type="string",
    max_length=20,
    unique_flag=True
)
# FunPuter will:
# - Check string lengths against max_length
# - Recommend imputation respecting length limits
# - Consider uniqueness requirements in recommendations
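The length and uniqueness checks described above amount to something like the sketch below. `check_string_constraints` is a hypothetical name used only for this illustration and is not a FunPuter function.

```python
def check_string_constraints(values, max_length, unique):
    """Report values that exceed max_length and, if unique is set,
    values that occur more than once.

    Hypothetical helper for illustration; not part of FunPuter.
    """
    too_long = [v for v in values if v is not None and len(v) > max_length]
    duplicates = []
    if unique:
        seen = set()
        for v in values:
            if v is None:
                continue
            if v in seen and v not in duplicates:
                duplicates.append(v)
            seen.add(v)
    return {"too_long": too_long, "duplicates": duplicates}


usernames = ["alice", "bob", None, "alice", "a_very_long_username_over_limit"]
report = check_string_constraints(usernames, max_length=20, unique=True)
print(report)
```

Uniqueness also limits which imputation methods make sense: filling several gaps with one constant, such as the mode, would itself create duplicates.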
📊 Enhanced Analysis Results
# Results now include metadata-aware recommendations
for suggestion in suggestions:
    print(f"Column: {suggestion.column_name}")
    print(f"Method: {suggestion.proposed_method}")
    print(f"Confidence: {suggestion.confidence_score:.3f}")
    print(f"Rationale: {suggestion.rationale}")

    # New: Metadata constraint information
    if suggestion.metadata_violations:
        print(f"Violations: {suggestion.metadata_violations}")

    # New: Enhanced parameters
    if suggestion.parameters:
        print(f"Parameters: {suggestion.parameters}")
🚀 Advanced Usage
Programmatic Metadata Creation
import pandas as pd

from funimpute.models import ColumnMetadata
from funimpute.simple_analyzer import SimpleImputationAnalyzer

metadata = [
    ColumnMetadata(
        column_name="product_code",
        data_type="string",
        max_length=10,
        allowed_values="A1,A2,B1,B2",
        nullable=False,
        description="Product classification code"
    ),
    ColumnMetadata(
        column_name="price",
        data_type="float",
        min_value=0,
        max_value=10000,
        business_rule="Must be non-negative"
    )
]

# Analyze with custom metadata
data = pd.read_csv("products.csv")
analyzer = SimpleImputationAnalyzer()
results = analyzer.analyze_dataframe(data, metadata)
CLI Usage with Enhanced Metadata
# Generate template with new fields
funputer init --data products.csv --output products_metadata.csv
# Edit products_metadata.csv to add constraints, then:
funputer analyze --data products.csv --metadata products_metadata.csv --output results.json
# View results
funputer report --input results.json --format table
📋 Requirements
- Python: 3.9 or higher
- Dependencies: pandas, numpy, scipy, scikit-learn
🔧 Installation from Source
git clone https://github.com/RajeshRamachander/funputer.git
cd funputer
pip install -e .
📚 Documentation
- Full API Reference: GitHub Wiki
- Examples: Examples Directory
- Changelog: CHANGELOG.md
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
📄 License
MIT License - see LICENSE file for details.
Focus: Get intelligent imputation recommendations with enhanced metadata support, not complex infrastructure.