FunPuter - Intelligent Imputation Analysis
Intelligent imputation analysis with automatic data validation and metadata inference
FunPuter analyzes your data and recommends imputation methods based on data patterns, missingness mechanisms, and metadata constraints. Each suggestion comes with a confidence score and a rationale, so you can judge how far to trust it before filling in missing values.
🚀 Quick Start
Installation
pip install funputer
30-Second Example
Auto-Inference Mode (Zero Configuration)
import funputer
# Point to your CSV - FunPuter figures out everything automatically
suggestions = funputer.analyze_imputation_requirements("your_data.csv")
# Get intelligent suggestions with confidence scores
for suggestion in suggestions:
    if suggestion.missing_count > 0:
        print(f"📊 {suggestion.column_name}: {suggestion.proposed_method}")
        print(f"   Confidence: {suggestion.confidence_score:.3f}")
        print(f"   Reason: {suggestion.rationale}")
        print(f"   Missing: {suggestion.missing_count} ({suggestion.missing_percentage:.1f}%)")
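Once the suggestions are in hand, a common next step is to separate high-confidence recommendations from those that deserve a manual look. The sketch below assumes nothing about FunPuter beyond the attribute names used above: `Suggestion` is a hypothetical stand-in dataclass (not FunPuter's own class), the method names in the demo data are made up, and the 0.7 cutoff is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical stand-in carrying only the attributes used above."""
    column_name: str
    proposed_method: str
    confidence_score: float
    missing_count: int


def split_by_confidence(suggestions, cutoff=0.7):
    """Return (accept, review) lists for columns that have missing values."""
    accept, review = [], []
    for s in suggestions:
        if s.missing_count == 0:
            continue  # nothing to impute for this column
        (accept if s.confidence_score >= cutoff else review).append(s)
    return accept, review


# Illustrative values only, not real FunPuter output
demo = [
    Suggestion("age", "Median", 0.85, 12),
    Suggestion("income", "Regression", 0.55, 40),
    Suggestion("customer_id", "No action needed", 1.0, 0),
]
accept, review = split_by_confidence(demo)
print("accept:", [s.column_name for s in accept])
print("review:", [s.column_name for s in review])
```

With real results, you would pass the list returned by `analyze_imputation_requirements` in place of `demo`.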
Production Mode (Full Control)
import funputer
from funputer.models import ColumnMetadata
# Define your data structure with constraints
metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=18, max_value=100),
    ColumnMetadata('income', 'float', min_value=0),
    ColumnMetadata('category', 'categorical', allowed_values='A,B,C'),
]
# Get production-grade suggestions (your_dataframe is a pandas DataFrame you have already loaded)
suggestions = funputer.analyze_dataframe(your_dataframe, metadata)
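Constraints such as `min_value` and `max_value` matter because an imputed value should never violate them. How FunPuter applies them internally is not described here, so the following is only a sketch of the general idea: compute the fill value from observations that satisfy the bounds. The function name `impute_median_within_bounds` is hypothetical and not part of FunPuter.

```python
from statistics import median


def impute_median_within_bounds(values, min_value=None, max_value=None):
    """Fill None entries with the median of the in-range observed values.

    Out-of-range observed values are ignored when computing the fill value
    but are left in place so they can be reviewed separately.
    """
    def in_range(v):
        above = min_value is None or v >= min_value
        below = max_value is None or v <= max_value
        return above and below

    observed = [v for v in values if v is not None and in_range(v)]
    if not observed:
        raise ValueError("no valid observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


ages = [25, None, 40, 31, None, 250]  # 250 violates max_value=100
filled = impute_median_within_bounds(ages, min_value=18, max_value=100)
print(filled)
```

Excluding the invalid 250 keeps it from distorting the fill value, which is the practical benefit of declaring bounds up front.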
🎯 Key Features
- 🤖 Automatic Metadata Inference - Intelligent data type and constraint detection
- 📊 Missing Data Analysis - MCAR, MAR, MNAR mechanism detection
- ⚡ Data Validation - Real-time constraint checking and validation
- 🎯 Smart Recommendations - Context-aware imputation method suggestions
- 📈 Confidence Scoring - Transparent reliability estimates for each recommendation
- 🛡️ Pre-flight Checks - Comprehensive data validation before analysis
- 💻 CLI & Python API - Flexible usage via command line or programmatic access
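To make the missing-data mechanisms concrete: under MCAR, missingness is unrelated to any values in the data; under MAR, it depends on other observed columns; under MNAR, it depends on the unobserved value itself, which cannot be confirmed from the observed data alone. The sketch below is not FunPuter's detection logic. It is a minimal standard-library heuristic that checks whether missingness in one column tracks the values of another, using Welch's t statistic with an assumed cutoff of 2.0. The function names are hypothetical.

```python
from math import copysign, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both samples are constant: identical means give 0, otherwise infinity
        return 0.0 if diff == 0 else copysign(float("inf"), diff)
    return diff / se


def missingness_depends_on(target, other, t_cutoff=2.0):
    """Rough check: does missingness in `target` track the values of `other`?"""
    when_missing = [o for t, o in zip(target, other) if t is None]
    when_present = [o for t, o in zip(target, other) if t is not None]
    if len(when_missing) < 2 or len(when_present) < 2:
        return False  # too few rows in one group to compare
    return abs(welch_t(when_missing, when_present)) > t_cutoff


# Toy data: income is missing mostly for older respondents
age = [22, 25, 31, 38, 45, 58, 61, 64, 67, 70]
income = [30, 32, 41, 48, 55, None, None, 72, None, None]
print(missingness_depends_on(income, age))
```

A positive result argues against MCAR and is consistent with MAR; a negative result does not rule out MNAR.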
📊 Data Validation System
Comprehensive validation runs automatically to prevent crashes and guide your workflow:
- File validation: Format detection, encoding, accessibility
- Structure validation: Column analysis, data type inference
- Memory estimation: Resource usage prediction
- Advisory recommendations: Guided workflow suggestions
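The kinds of checks listed above can be illustrated with a small standard-library sketch. This is not FunPuter's preflight implementation: `quick_preflight` and the keys of its report are hypothetical, and the checks are limited to accessibility, decoding with a given encoding, reading the header, and counting empty cells in a sample of rows.

```python
import csv
import os
import tempfile


def quick_preflight(path, sample_rows=1000, encoding="utf-8"):
    """Return a small report dict; an 'error' key means the file is unusable."""
    if not os.path.isfile(path) or not os.access(path, os.R_OK):
        return {"error": "file missing or unreadable"}
    try:
        with open(path, newline="", encoding=encoding) as fh:
            reader = csv.reader(fh)
            header = next(reader, None)
            if not header:
                return {"error": "file is empty"}
            # Read at most sample_rows data rows
            sample = [row for _, row in zip(range(sample_rows), reader)]
    except UnicodeDecodeError:
        return {"error": f"cannot decode file as {encoding}"}
    empty = {name: 0 for name in header}
    for row in sample:
        for name, cell in zip(header, row):
            if cell.strip() == "":
                empty[name] += 1
    return {
        "columns": header,
        "sampled_rows": len(sample),
        "empty_cells": empty,
        "size_bytes": os.path.getsize(path),
    }


# Demo on a temporary CSV file
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write("id,age,city\n1,34,Oslo\n2,,Bergen\n3,51,\n")
report = quick_preflight(tmp.name)
print(report["empty_cells"])
os.remove(tmp.name)
```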
Independent Usage:
# Basic validation check
funputer preflight -d your_data.csv
# With custom options
funputer preflight -d data.csv --sample-rows 5000 --encoding utf-8
# JSON report output
funputer preflight -d data.csv --json-out report.json
Exit Codes:
- 0: Ready for analysis
- 2: OK with warnings (can proceed)
- 10: Hard error (cannot proceed)
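In a script or pipeline, these exit codes can gate the analysis step. The sketch below wraps the `funputer preflight -d` command shown above with `subprocess`. The helper names are hypothetical, and treating any unlisted exit code as an error is an assumption rather than documented behavior.

```python
import subprocess

# Meanings taken from the exit-code list above
PREFLIGHT_STATUS = {
    0: "ready",
    2: "warnings",
    10: "error",
}


def classify_exit_code(code):
    """Map a preflight exit code to a status; unknown codes count as errors."""
    return PREFLIGHT_STATUS.get(code, "error")


def may_proceed(status):
    """Analysis can continue when preflight is clean or only warns."""
    return status in ("ready", "warnings")


def run_preflight(data_path):
    """Run `funputer preflight -d <path>` and return the classified status.

    Requires the funputer CLI to be installed and on PATH.
    """
    result = subprocess.run(["funputer", "preflight", "-d", data_path])
    return classify_exit_code(result.returncode)


# Classification alone needs no CLI, so it can be checked in isolation
for code in (0, 2, 10):
    status = classify_exit_code(code)
    print(code, status, may_proceed(status))
```

A caller would then run `funputer analyze` only when `may_proceed(run_preflight("data.csv"))` is true.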
💻 Command Line Interface
# Generate metadata template from your data
funputer init -d data.csv -o metadata.csv
# Analyze with auto-inference
funputer analyze -d data.csv
# Analyze with custom metadata
funputer analyze -d data.csv -m metadata.csv --verbose
# Run a data quality check before analysis
funputer preflight -d data.csv
📚 Usage Examples
Basic Analysis
import funputer
# Simple analysis with auto-inference
suggestions = funputer.analyze_imputation_requirements("sales_data.csv")
# Display recommendations
for suggestion in suggestions:
    print(f"Column: {suggestion.column_name}")
    print(f"Method: {suggestion.proposed_method}")
    print(f"Confidence: {suggestion.confidence_score:.3f}")
    print(f"Missing: {suggestion.missing_count} values")
    print()
Advanced Configuration
from funputer.models import ColumnMetadata, AnalysisConfig
from funputer.simple_analyzer import SimpleImputationAnalyzer
# Custom metadata with business rules
metadata = [
    ColumnMetadata('product_id', 'string', unique_flag=True, max_length=10),
    ColumnMetadata('price', 'float', min_value=0, max_value=10000),
    ColumnMetadata('category', 'categorical', allowed_values='Electronics,Books,Clothing'),
    ColumnMetadata('rating', 'float', min_value=1.0, max_value=5.0),
]
# Custom analysis configuration
config = AnalysisConfig(
    missing_percentage_threshold=0.3,  # 30% threshold
    skip_columns=['internal_id'],
    outlier_threshold=0.1,
)
# Run analysis (df is a pandas DataFrame you have already loaded)
analyzer = SimpleImputationAnalyzer(config)
suggestions = analyzer.analyze_dataframe(df, metadata)
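The text does not say exactly how `missing_percentage_threshold` and `skip_columns` are applied. One plausible reading, sketched below with plain Python rather than FunPuter's own code, is that skipped columns are excluded from analysis and the remaining columns are flagged when their missing fraction exceeds the threshold. Both function names are hypothetical.

```python
def missing_fractions(rows, columns):
    """Fraction of rows in which each column is None (or absent)."""
    total = len(rows)
    if total == 0:
        return {c: 0.0 for c in columns}
    return {c: sum(1 for r in rows if r.get(c) is None) / total for c in columns}


def columns_over_threshold(rows, columns, threshold=0.3, skip_columns=()):
    """Names of non-skipped columns whose missing fraction exceeds threshold."""
    analyzed = [c for c in columns if c not in skip_columns]
    fractions = missing_fractions(rows, analyzed)
    return sorted(c for c, f in fractions.items() if f > threshold)


# Toy rows: price is 25% missing, rating 50%, internal_id 100% but skipped
rows = [
    {"price": 9.5, "rating": None, "internal_id": None},
    {"price": None, "rating": None, "internal_id": None},
    {"price": 12.0, "rating": 4.0, "internal_id": None},
    {"price": 7.0, "rating": 3.5, "internal_id": None},
]
flagged = columns_over_threshold(
    rows,
    ["price", "rating", "internal_id"],
    threshold=0.3,
    skip_columns=["internal_id"],
)
print(flagged)
```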
Industry-Specific Examples
E-commerce Analytics
import funputer
from funputer.models import ColumnMetadata
metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=13, max_value=120),
    ColumnMetadata('purchase_amount', 'float', min_value=0),
    ColumnMetadata('customer_segment', 'categorical', allowed_values='Premium,Standard,Basic'),
]
suggestions = funputer.analyze_dataframe(customer_df, metadata)
Healthcare Data
import funputer
from funputer.models import ColumnMetadata, AnalysisConfig
metadata = [
    ColumnMetadata('patient_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=0, max_value=150),
    ColumnMetadata('blood_pressure', 'integer', min_value=50, max_value=300),
    ColumnMetadata('diagnosis', 'categorical', nullable=False),
]
# Low tolerance for healthcare; parameter name matches the Advanced Configuration example
config = AnalysisConfig(missing_percentage_threshold=0.05)
suggestions = funputer.analyze_dataframe(patient_df, metadata, config)
Financial Risk Assessment
import funputer
from funputer.models import ColumnMetadata, AnalysisConfig
metadata = [
    ColumnMetadata('application_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('credit_score', 'integer', min_value=300, max_value=850),
    ColumnMetadata('debt_to_income', 'float', min_value=0.0, max_value=10.0),
    ColumnMetadata('loan_purpose', 'categorical', allowed_values='home,auto,personal,business'),
]
# Skip sensitive columns
config = AnalysisConfig(skip_columns=['ssn', 'account_number'])
suggestions = funputer.analyze_dataframe(loan_df, metadata, config)
⚙️ Requirements
- Python: 3.9 or higher
- Dependencies: pandas, numpy, scipy, pydantic, click, pyyaml
🔧 Installation from Source
git clone https://github.com/RajeshRamachander/funputer.git
cd funputer
pip install -e .
📚 Documentation
- API Reference: Complete docstrings and type hints throughout the codebase
- Examples: See usage examples above and in the codebase
- Test Coverage: 77% coverage with comprehensive test suite
🤝 Contributing
We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.
📄 License
MIT License - see LICENSE file for details.
Focus: Get intelligent imputation recommendations, not complex infrastructure.
File details
Details for the file funputer-1.4.0.tar.gz.
File metadata
- Download URL: funputer-1.4.0.tar.gz
- Upload date:
- Size: 72.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 328c59fa5d5687b3098a38f2fb3fd3e11a6ba60969767c15a43b35a81b1e1726 |
| MD5 | 401b1c9da992071a39223d6631da6815 |
| BLAKE2b-256 | 9a4265f0cca6a2b15cd2861495a3fae725e5eb9d3a76bf7e60bad04a8acd9bad |
File details
Details for the file funputer-1.4.0-py3-none-any.whl.
File metadata
- Download URL: funputer-1.4.0-py3-none-any.whl
- Upload date:
- Size: 38.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9317c98849cc2cfd988b4efd842c88cd875115d1dbb5aafcb6262246615d2b17 |
| MD5 | 3a32bff6bd7afaef32e844ed74e84fb6 |
| BLAKE2b-256 | e203b9ce90ce324b8a7aa4900d97fb706051cbb815a1c08fcc1242c632ab548b |