Simple, intelligent imputation analysis for data science
FunPuter - Intelligent Imputation Analysis
Simple, fast, intelligent recommendations for handling missing data.
FunPuter analyzes your data and recommends imputation methods based on:
- Missing data mechanisms (MCAR, MAR, MNAR detection)
- Data types and statistical properties
- Business rules and column dependencies
- Adaptive thresholds based on your dataset characteristics
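How FunPuter decides between MCAR, MAR, and MNAR is not described here. As an illustration of the general idea only, one simple heuristic for spotting MAR is to check whether missingness in one column lines up with the observed values of another. The standard-library sketch below compares the mean of a related column between rows where the target is missing and rows where it is present. The function `missingness_gap` and the sample data are hypothetical, and the library may use a formal statistical test instead.

```python
from statistics import mean
from typing import Optional, Sequence


def missingness_gap(target: Sequence[Optional[float]],
                    other: Sequence[float]) -> float:
    """Mean of `other` where `target` is missing minus its mean where observed.

    A gap near zero is consistent with MCAR (with respect to `other`); a large
    gap suggests missingness depends on `other` (MAR). This is a descriptive
    heuristic, not a formal test, and it cannot detect MNAR on its own.
    """
    missing = [o for t, o in zip(target, other) if t is None]
    observed = [o for t, o in zip(target, other) if t is not None]
    if not missing or not observed:
        raise ValueError("need both missing and observed rows")
    return mean(missing) - mean(observed)


# Hypothetical data: income tends to be missing for younger users.
income = [None, None, 52000.0, 61000.0, None, 75000.0]
age = [22, 24, 41, 45, 23, 50]
print(round(missingness_gap(income, age), 2))  # -22.33
```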
Quick Start
Installation
pip install funputer
Basic Usage
Python API (Recommended)
import funputer
# Analyze your dataset
suggestions = funputer.analyze_imputation_requirements(
    metadata_path="metadata.csv",
    data_path="data.csv"
)
# Use the suggestions
for suggestion in suggestions:
    print(f"{suggestion.column_name}: {suggestion.proposed_method}")
    print(f"  Rationale: {suggestion.rationale}")
    print(f"  Confidence: {suggestion.confidence_score:.3f}")
Command Line
# Analyze and save results
funputer -m metadata.csv -d data.csv -o suggestions.csv
# View results
funputer -m metadata.csv -d data.csv --verbose
Metadata Format
Create a CSV with your column information:
column_name,data_type,min_value,max_value,unique_flag,dependent_column,business_rule,description
user_id,integer,1,999999,TRUE,,,User identifier
age,integer,0,120,FALSE,,Must be positive,User age
income,float,0,,FALSE,age,Higher with age,Annual income
category,categorical,,,FALSE,,,User category A/B/C
Required columns:
- column_name: Name of your data column
- data_type: One of integer, float, string, categorical, datetime, boolean
Optional columns:
- min_value, max_value: Valid ranges for numeric data
- unique_flag: Set to TRUE for ID columns
- dependent_column: Related column for dependency analysis
- business_rule: Business constraints or relationships
- description: Human-readable description
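Before running an analysis it can help to sanity-check the metadata file against the format above. The sketch below is a hypothetical helper (`validate_metadata` is not part of FunPuter) that checks for the two required columns, the allowed `data_type` values, and that `min_value` does not exceed `max_value`. It assumes numeric bounds and one record per line.

```python
import csv
import io
from typing import List

# Taken from the metadata format described above.
REQUIRED_COLUMNS = {"column_name", "data_type"}
ALLOWED_TYPES = {"integer", "float", "string", "categorical", "datetime", "boolean"}


def validate_metadata(csv_text: str) -> List[str]:
    """Return human-readable problems found in a metadata CSV (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    problems = [f"missing required column: {c}"
                for c in sorted(REQUIRED_COLUMNS - header)]
    if problems:
        return problems
    # Line 1 is the header; assumes no quoted fields span multiple lines.
    for line_no, row in enumerate(reader, start=2):
        if row["data_type"] not in ALLOWED_TYPES:
            problems.append(f"line {line_no}: unknown data_type {row['data_type']!r}")
        lo, hi = row.get("min_value") or "", row.get("max_value") or ""
        if lo and hi and float(lo) > float(hi):  # assumes numeric bounds
            problems.append(f"line {line_no}: min_value exceeds max_value")
    return problems


example = """column_name,data_type,min_value,max_value
age,integer,0,120
score,decimal,10,5
"""
print(validate_metadata(example))
```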
Client Application Integration
Direct DataFrame Analysis
import pandas as pd
import funputer
from funputer import ColumnMetadata
# Your data
data = pd.DataFrame({
    'age': [25, None, 35, 42, None],
    'income': [50000, 60000, None, 80000, 45000],
    'category': ['A', 'B', None, 'A', 'C']
})
# Define metadata programmatically
metadata = [
    ColumnMetadata('age', 'integer', min_value=0, max_value=120),
    ColumnMetadata('income', 'float', dependent_column='age', business_rule='Higher with age'),
    ColumnMetadata('category', 'categorical')
]
# Get suggestions
suggestions = funputer.analyze_dataframe(data, metadata)
# Apply suggestions (Phase 2 - your implementation)
for s in suggestions:
    col = s.column_name
    # Assign the result back: fillna(inplace=True) on a selected column is chained
    # assignment, which pandas 2.x warns about and Copy-on-Write turns into a no-op
    if s.proposed_method == "Median":
        data[col] = data[col].fillna(data[col].median())
    elif s.proposed_method == "Mode":
        data[col] = data[col].fillna(data[col].mode().iloc[0])
    # ... implement other methods as needed
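The `if`/`elif` chain above covers only two methods. One way to keep Phase 2 extensible is a lookup table from method name to fill function. The standard-library sketch below works on plain lists with `None` marking gaps; `FILLERS` and `apply_method` are hypothetical names, and with pandas the computed fill value would be passed to `Series.fillna` instead.

```python
from statistics import mean, median, mode
from typing import Any, Callable, Dict, List, Optional, Sequence

# Method names follow the "Available Methods" list; the mapping is illustrative.
FILLERS: Dict[str, Callable[[List[Any]], Any]] = {
    "Mean": mean,
    "Median": median,
    "Mode": mode,  # works for numbers and categories; on ties returns the first mode seen
}


def apply_method(values: Sequence[Optional[Any]], method: str) -> List[Optional[Any]]:
    """Return a copy of `values` with None entries filled by `method`."""
    if method not in FILLERS:
        raise NotImplementedError(f"{method!r} needs its own implementation")
    observed = [v for v in values if v is not None]
    fill = FILLERS[method](observed)  # raises StatisticsError if nothing is observed
    return [fill if v is None else v for v in values]


print(apply_method([25, None, 35, 42, None], "Median"))  # [25, 35, 35, 42, 35]
print(apply_method(['A', 'B', None, 'A', 'C'], "Mode"))  # ['A', 'B', 'A', 'A', 'C']
```

Methods such as Regression or kNN need the other columns as input, so they would not fit this single-column signature and are left to raise `NotImplementedError`.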
Configuration
import funputer
from funputer import AnalysisConfig
# Custom analysis settings
config = AnalysisConfig(
    iqr_multiplier=2.0,         # Outlier detection sensitivity
    correlation_threshold=0.4,  # Relationship detection threshold
    skewness_threshold=1.5      # Mean vs median decision point
)
suggestions = funputer.analyze_imputation_requirements(
    "metadata.csv", "data.csv", config=config
)
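The comments above give only a one-line meaning for each setting. Assuming `iqr_multiplier` is the multiplier k in the usual Tukey fences (values outside [Q1 - k * IQR, Q3 + k * IQR] count as outliers) and `skewness_threshold` is compared with the absolute skewness, the two ideas can be sketched with the standard library as follows. These helpers are hypothetical and only illustrate the conventional definitions; FunPuter's internal formulas and defaults may differ.

```python
from statistics import mean, pstdev, quantiles
from typing import Sequence, Tuple


def iqr_bounds(values: Sequence[float], iqr_multiplier: float = 1.5) -> Tuple[float, float]:
    """Tukey fences; 1.5 is the conventional value, not necessarily FunPuter's default."""
    # statistics.quantiles defaults to the 'exclusive' method; other tools
    # (e.g. numpy's default) interpolate differently and give other quartiles.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - iqr_multiplier * iqr, q3 + iqr_multiplier * iqr


def skewness(values: Sequence[float]) -> float:
    """Population skewness: the mean of the cubed z-scores."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return 0.0  # constant data has no skew
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def mean_or_median(values: Sequence[float], skewness_threshold: float = 1.5) -> str:
    """Prefer the median when the distribution is strongly skewed (1.5 as in the example above)."""
    return "Median" if abs(skewness(values)) > skewness_threshold else "Mean"


print(iqr_bounds([1, 2, 3, 4, 5], iqr_multiplier=2.0))  # (-4.5, 10.5)
print(mean_or_median([30, 32, 35, 36, 38, 40, 300]))    # Median
```

A larger `iqr_multiplier` widens the fences and flags fewer outliers; a larger `skewness_threshold` tolerates more asymmetry before switching from the mean to the median.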
What You Get
Each suggestion includes:
suggestion.column_name # 'age'
suggestion.proposed_method # 'Median'
suggestion.rationale # 'Numeric data with MCAR mechanism...'
suggestion.confidence_score # 0.847
suggestion.missing_count # 15
suggestion.missing_percentage # 0.075 (7.5%)
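The example values imply that `missing_percentage` is a fraction rather than a percentage: 15 missing values with `missing_percentage` 0.075 would correspond to 200 rows, since 15 / 200 = 0.075. For unit tests or type checking without the library installed, a stand-in with the same fields can be handy. `SuggestionStub` and `make_stub` below are hypothetical, and deriving the fraction as `missing / total_rows` is an assumption about how the real value is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestionStub:
    """Hypothetical stand-in with the fields listed above."""
    column_name: str
    proposed_method: str
    rationale: str
    confidence_score: float
    missing_count: int
    missing_percentage: float  # a fraction in [0, 1]; 0.075 means 7.5%


def make_stub(column: str, method: str, rationale: str,
              confidence: float, missing: int, total_rows: int) -> SuggestionStub:
    """Build a stub, deriving missing_percentage as missing / total_rows (assumed)."""
    return SuggestionStub(column, method, rationale, confidence,
                          missing, missing / total_rows)


s = make_stub("age", "Median", "Numeric data with MCAR mechanism...", 0.847, 15, 200)
print(f"{s.column_name}: {s.missing_percentage:.1%} missing")  # age: 7.5% missing
```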
Available Methods:
- Mean, Median, Mode - Statistical imputation
- Regression, kNN - Predictive imputation
- Business Rule - Domain-specific logic
- Forward Fill, Backward Fill - Temporal imputation
- Manual Backfill - Requires human intervention
- No action needed - No missing values
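Forward Fill and Backward Fill only make sense when rows have a meaningful order, such as time. The sketch below shows their usual semantics on a plain list; the function names are hypothetical, and in pandas the equivalent operations are `Series.ffill()` and `Series.bfill()`.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def forward_fill(values: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Carry the last observed value forward; leading gaps stay None."""
    filled: List[Optional[T]] = []
    last: Optional[T] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def backward_fill(values: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Carry the next observed value backward; trailing gaps stay None."""
    return forward_fill(values[::-1])[::-1]


readings = [None, 10.0, None, None, 12.5, None]
print(forward_fill(readings))   # [None, 10.0, 10.0, 10.0, 12.5, 12.5]
print(backward_fill(readings))  # [10.0, 10.0, 12.5, 12.5, 12.5, None]
```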
Key Features
✅ Intelligent Analysis - Detects missing data mechanisms automatically
✅ Business Rule Integration - Uses your domain knowledge
✅ Adaptive Thresholds - Adjusts based on your data characteristics
✅ High Performance - Analyzes 100+ columns in seconds
✅ Simple API - Easy integration with existing workflows
✅ Type Safe - Full type hints and validation
Real-World Example
# Your existing data pipeline
import pandas as pd
import funputer
from funputer import ColumnMetadata
def process_customer_data(df):
    # 1. Define your metadata once
    metadata = [
        ColumnMetadata('customer_id', 'integer', unique_flag=True),
        ColumnMetadata('age', 'integer', min_value=0, max_value=120),
        ColumnMetadata('income', 'float', dependent_column='age'),
        ColumnMetadata('segment', 'categorical'),
    ]
    # 2. Get intelligent suggestions
    suggestions = funputer.analyze_dataframe(df, metadata)
    # 3. Apply high-confidence suggestions automatically
    for s in suggestions:
        col = s.column_name
        if s.confidence_score > 0.8:
            # Assign back rather than using fillna(inplace=True) on a column
            if s.proposed_method == "Median":
                df[col] = df[col].fillna(df[col].median())
            elif s.proposed_method == "Mode":
                df[col] = df[col].fillna(df[col].mode().iloc[0])
        else:
            print(f"Manual review needed for {col}: {s.rationale}")
    return df
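The loop above applies a suggestion only when it is both confident and one of the two handled methods, so a high-confidence `Regression` suggestion is skipped without being applied or flagged. A hypothetical triage step makes that explicit by splitting suggestions into an auto-apply list and a review list. It relies only on the `column_name`, `proposed_method`, and `confidence_score` attributes; the `Suggestion` namedtuple is a stand-in for demonstration, and the 0.8 threshold comes from the example.

```python
from collections import namedtuple
from typing import Iterable, List, Tuple

# Hypothetical stand-in exposing the attributes used below.
Suggestion = namedtuple("Suggestion", "column_name proposed_method confidence_score")

AUTO_METHODS = {"Mean", "Median", "Mode"}  # methods this pipeline knows how to apply


def triage(suggestions: Iterable, threshold: float = 0.8) -> Tuple[List, List]:
    """Split suggestions into (auto_apply, needs_review)."""
    auto, review = [], []
    for s in suggestions:
        if s.proposed_method == "No action needed":
            continue  # the column has no missing values
        if s.proposed_method in AUTO_METHODS and s.confidence_score > threshold:
            auto.append(s)
        else:
            review.append(s)  # low confidence, Manual Backfill, or an unhandled method
    return auto, review


auto, review = triage([
    Suggestion("age", "Median", 0.85),
    Suggestion("segment", "Mode", 0.62),
    Suggestion("income", "Regression", 0.91),
    Suggestion("customer_id", "No action needed", 1.0),
])
print([s.column_name for s in auto], [s.column_name for s in review])
```

Routing unhandled methods to review, rather than dropping them, keeps every column with missing data accounted for.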
Distribution
- PyPI Package: pip install funputer
- Source Code: Available on GitHub
- Requirements: Python 3.9+, pandas, numpy, scipy
License
MIT License - Use freely in commercial and open-source projects.
Focus: Get intelligent imputation recommendations, not complex infrastructure.
Philosophy: Simple tools that scale with your needs.
Download files
File details
Details for the file funputer-1.0.3.tar.gz.
File metadata
- Download URL: funputer-1.0.3.tar.gz
- Upload date:
- Size: 48.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d64af8433e88df610d989880882593743dcad63cea62f6a35a7bc697514661f |
| MD5 | 2995ba5af477707eefe6fb8f94512063 |
| BLAKE2b-256 | 23e8a52814e1b6c025f129e3485d9c01c3697c4d8955069df547cfb3fdc34fec |
File details
Details for the file funputer-1.0.3-py3-none-any.whl.
File metadata
- Download URL: funputer-1.0.3-py3-none-any.whl
- Upload date:
- Size: 44.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4925a094677ac955a2eb0375a7a431b67db37ce5e109ce6862cbac504bd6c08a |
| MD5 | 7f19cacd7d5e68418041e301fdee1827 |
| BLAKE2b-256 | 7d8148365f389cd62e8e99589fb17efaaf4d1a8562ddf7ff790c8ede36a0e10d |
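To check a downloaded archive against the SHA256 digests published for funputer 1.0.3, stream it through `hashlib` and compare. A minimal sketch, assuming the file sits in the current directory under the name shown in its file details; `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large archives are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published digest, ignoring letter case."""
    return sha256_of(path) == expected_sha256.lower()


# SHA256 digest published for the source distribution funputer-1.0.3.tar.gz.
EXPECTED = "7d64af8433e88df610d989880882593743dcad63cea62f6a35a7bc697514661f"
archive = Path("funputer-1.0.3.tar.gz")
if archive.exists():
    print("OK" if verify(archive, EXPECTED) else "MISMATCH")
```

pip offers the same check at install time through its hash-checking mode, using `--hash=sha256:...` entries in a requirements file that list the digest of every distribution file pip might select.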