Intelligent imputation analysis with automatic data validation and metadata inference

FunPuter - Intelligent Imputation Analysis

FunPuter analyzes your data and recommends imputation methods based on data patterns, missingness mechanisms, and metadata constraints. Each suggestion comes with a confidence score and a rationale, so you can judge how far to trust it.

🚀 Quick Start

Installation

pip install funputer

30-Second Example

Auto-Inference Mode (Zero Configuration)

import funputer

# Point to your CSV; FunPuter infers data types and constraints automatically
suggestions = funputer.analyze_imputation_requirements("your_data.csv")

# Get intelligent suggestions with confidence scores
for suggestion in suggestions:
    if suggestion.missing_count > 0:
        print(f"📊 {suggestion.column_name}: {suggestion.proposed_method}")
        print(f"   Confidence: {suggestion.confidence_score:.3f}")
        print(f"   Reason: {suggestion.rationale}")
        print(f"   Missing: {suggestion.missing_count} ({suggestion.missing_percentage:.1f}%)")

Production Mode (Full Control)

import funputer
from funputer.models import ColumnMetadata

# Define your data structure with constraints
metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=18, max_value=100),
    ColumnMetadata('income', 'float', min_value=0),
    ColumnMetadata('category', 'categorical', allowed_values='A,B,C'),
]

# Get production-grade suggestions
# (your_dataframe is a pandas DataFrame you have already loaded)
suggestions = funputer.analyze_dataframe(your_dataframe, metadata)
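
To make the role of these constraints concrete, the sketch below checks a single value against `min_value`, `max_value`, `nullable`, and a comma-separated `allowed_values` string. It is a plain-Python illustration, not FunPuter's implementation: the `Constraint` dataclass and the `violations` helper are hypothetical names, and only the field names are borrowed from the example above.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Constraint:
    """Hypothetical stand-in mirroring a few ColumnMetadata fields."""
    column_name: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed_values: Optional[str] = None  # comma-separated, as in the example above
    nullable: bool = True


def violations(value: Any, c: Constraint) -> List[str]:
    """Return human-readable problems for one value (empty list if none)."""
    if value is None:
        return [] if c.nullable else [f"{c.column_name}: null not allowed"]
    problems = []
    if c.min_value is not None and value < c.min_value:
        problems.append(f"{c.column_name}: {value} < min {c.min_value}")
    if c.max_value is not None and value > c.max_value:
        problems.append(f"{c.column_name}: {value} > max {c.max_value}")
    if c.allowed_values is not None:
        allowed = {v.strip() for v in c.allowed_values.split(",")}
        if str(value) not in allowed:
            problems.append(f"{c.column_name}: {value!r} not in {sorted(allowed)}")
    return problems


age = Constraint("age", min_value=18, max_value=100)
category = Constraint("category", allowed_values="A,B,C")
print(violations(17, age))
print(violations("D", category))
```

Constraints like these matter for imputation because a filled-in value should fall in the same valid range as the observed ones.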

🎯 Key Features

  • 🤖 Automatic Metadata Inference - Intelligent data type and constraint detection
  • 📊 Missing Data Analysis - MCAR, MAR, MNAR mechanism detection
  • ⚡ Data Validation - Real-time constraint checking and validation
  • 🎯 Smart Recommendations - Context-aware imputation method suggestions
  • 📈 Confidence Scoring - Transparent reliability estimates for each recommendation
  • 🛡️ Pre-flight Checks - Comprehensive data validation before analysis
  • 💻 CLI & Python API - Flexible usage via command line or programmatic access
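
The three mechanisms differ in what missingness depends on: nothing (MCAR), other observed columns (MAR), or the missing value itself (MNAR). As a rough illustration of the kind of signal that separates MCAR from MAR, the sketch below compares the mean of an observed column between rows where a target column is missing and rows where it is present. This is a generic heuristic, not FunPuter's detection logic; the function name and the toy data are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def mean_gap_by_missingness(target: List[Optional[float]],
                            observed: List[float]) -> float:
    """Difference in the mean of `observed` between rows where `target`
    is missing and rows where it is present. A gap near zero is
    consistent with MCAR; a large gap hints that missingness depends on
    `observed` (MAR). It cannot detect MNAR, which depends on the
    unseen values themselves."""
    when_missing = [o for t, o in zip(target, observed) if t is None]
    when_present = [o for t, o in zip(target, observed) if t is not None]
    if not when_missing or not when_present:
        raise ValueError("need both missing and present rows")
    return mean(when_missing) - mean(when_present)


# Toy data: income is missing only for the older respondents.
age = [25, 30, 35, 60, 65, 70]
income = [40.0, 45.0, 50.0, None, None, None]
gap = mean_gap_by_missingness(income, age)
print(gap)
```

A real analysis would use a statistical test with a significance level rather than a raw difference in means.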

📊 Data Validation System

Validation runs automatically before analysis to catch problems early and guide your workflow:

  • File validation: Format detection, encoding, accessibility
  • Structure validation: Column analysis, data type inference
  • Memory estimation: Resource usage prediction
  • Advisory recommendations: Guided workflow suggestions
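
To give a feel for what such checks involve, here is a minimal standard-library sketch that confirms a CSV can be opened with a given encoding, reads the header, and counts empty cells in a sample of rows. It is an illustration only: `quick_csv_check` is a hypothetical helper, and FunPuter's own preflight covers more and may work differently.

```python
import csv
from pathlib import Path
from typing import Dict


def quick_csv_check(path: str, encoding: str = "utf-8",
                    sample_rows: int = 5000) -> Dict[str, object]:
    """Open a CSV, read the header, and count empty cells per column
    in at most `sample_rows` data rows."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(path)
    with file.open(newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        empty = {name: 0 for name in columns}
        rows = 0
        for row in reader:
            if rows >= sample_rows:
                break
            rows += 1
            for name in columns:
                # Short rows yield None for absent cells; treat them as empty.
                if (row.get(name) or "").strip() == "":
                    empty[name] += 1
    return {"columns": list(columns), "rows_sampled": rows, "empty_cells": empty}
```

Calling `quick_csv_check("your_data.csv")` returns a small dictionary; a wrong encoding surfaces as a `UnicodeDecodeError` and a missing file as a `FileNotFoundError`.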

Independent Usage:

# Basic validation check
funputer preflight -d your_data.csv

# With custom options
funputer preflight -d data.csv --sample-rows 5000 --encoding utf-8

# JSON report output
funputer preflight -d data.csv --json-out report.json

Exit Codes:

  • 0: Ready for analysis
  • 2: OK with warnings (can proceed)
  • 10: Hard error (cannot proceed)
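
In a script or CI job, these exit codes can gate the analysis step. The sketch below is one way to do that from Python. It assumes the `funputer` command is installed and on `PATH`; the names `PREFLIGHT_STATUS`, `can_proceed`, and `run_preflight` are hypothetical helpers, not part of FunPuter.

```python
import subprocess
from typing import List

# Exit codes as documented above.
PREFLIGHT_STATUS = {
    0: "ready",
    2: "warnings",
    10: "error",
}


def can_proceed(exit_code: int) -> bool:
    """True for the codes documented as safe to continue (0 and 2)."""
    return exit_code in (0, 2)


def run_preflight(data_path: str) -> int:
    """Run `funputer preflight -d <data_path>` and return its exit code.
    Assumes the `funputer` command is available on PATH."""
    command: List[str] = ["funputer", "preflight", "-d", data_path]
    return subprocess.run(command).returncode
```

A typical use is `code = run_preflight("your_data.csv")` followed by `if can_proceed(code): ...`. Any code outside the three documented values is treated as a failure here, which is a conservative choice.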

💻 Command Line Interface

# Generate metadata template from your data
funputer init -d data.csv -o metadata.csv

# Analyze with auto-inference
funputer analyze -d data.csv

# Analyze with custom metadata
funputer analyze -d data.csv -m metadata.csv --verbose

# Run a data quality check before analysis
funputer preflight -d data.csv
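
When these commands are driven from a script, building the argument list in one place avoids quoting mistakes. The sketch below assembles a `funputer analyze` invocation using only the flags shown above (`-d`, `-m`, `--verbose`). `build_analyze_command` is a hypothetical helper, not part of FunPuter.

```python
import shlex
from typing import List, Optional


def build_analyze_command(data: str, metadata: Optional[str] = None,
                          verbose: bool = False) -> List[str]:
    """Return an argv list for `funputer analyze` with the given options."""
    command = ["funputer", "analyze", "-d", data]
    if metadata is not None:
        command += ["-m", metadata]
    if verbose:
        command.append("--verbose")
    return command


command = build_analyze_command("data.csv", metadata="metadata.csv", verbose=True)
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run`, which sidesteps shell quoting for paths that contain spaces.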

📚 Usage Examples

Basic Analysis

import funputer

# Simple analysis with auto-inference
suggestions = funputer.analyze_imputation_requirements("sales_data.csv")

# Display recommendations
for suggestion in suggestions:
    print(f"Column: {suggestion.column_name}")
    print(f"Method: {suggestion.proposed_method}")
    print(f"Confidence: {suggestion.confidence_score:.3f}")
    print(f"Missing: {suggestion.missing_count} values")
    print()
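
Beyond printing every suggestion, it is often useful to surface the ones that deserve a second look. The sketch below filters for columns that have missing values and a confidence below a cutoff. The `Suggestion` dataclass is a hypothetical stand-in that carries only the attributes used in the loop above, the 0.7 cutoff is an arbitrary assumption, and the method names and scores are invented toy values.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Suggestion:
    """Hypothetical stand-in with the attributes used in the loop above."""
    column_name: str
    proposed_method: str
    confidence_score: float
    missing_count: int


def needs_review(suggestions: Iterable[Suggestion],
                 min_confidence: float = 0.7) -> List[Suggestion]:
    """Columns with missing values whose confidence is below the cutoff,
    least confident first."""
    flagged = [s for s in suggestions
               if s.missing_count > 0 and s.confidence_score < min_confidence]
    return sorted(flagged, key=lambda s: s.confidence_score)


# Toy values for illustration only.
demo = [
    Suggestion("age", "Median", 0.85, 12),
    Suggestion("income", "Regression", 0.55, 40),
    Suggestion("notes", "Manual review", 0.30, 7),
    Suggestion("id", "No action needed", 0.99, 0),
]
for s in needs_review(demo):
    print(s.column_name, s.proposed_method, s.confidence_score)
```

Because the helper relies only on attribute names, it should work equally well on the objects returned by `analyze_imputation_requirements`, provided they expose the attributes shown above.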

Advanced Configuration

from funputer.models import ColumnMetadata, AnalysisConfig
from funputer.simple_analyzer import SimpleImputationAnalyzer

# Custom metadata with business rules
metadata = [
    ColumnMetadata('product_id', 'string', unique_flag=True, max_length=10),
    ColumnMetadata('price', 'float', min_value=0, max_value=10000),
    ColumnMetadata('category', 'categorical', allowed_values='Electronics,Books,Clothing'),
    ColumnMetadata('rating', 'float', min_value=1.0, max_value=5.0),
]

# Custom analysis configuration
config = AnalysisConfig(
    missing_percentage_threshold=0.3,  # 30% threshold
    skip_columns=['internal_id'],
    outlier_threshold=0.1
)

# Run analysis (df is a pandas DataFrame loaded beforehand)
analyzer = SimpleImputationAnalyzer(config)
suggestions = analyzer.analyze_dataframe(df, metadata)

Industry-Specific Examples

E-commerce Analytics

import funputer
from funputer.models import ColumnMetadata

metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=13, max_value=120),
    ColumnMetadata('purchase_amount', 'float', min_value=0),
    ColumnMetadata('customer_segment', 'categorical', allowed_values='Premium,Standard,Basic'),
]
suggestions = funputer.analyze_dataframe(customer_df, metadata)

Healthcare Data

import funputer
from funputer.models import ColumnMetadata, AnalysisConfig

metadata = [
    ColumnMetadata('patient_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=0, max_value=150),
    ColumnMetadata('blood_pressure', 'integer', min_value=50, max_value=300),
    ColumnMetadata('diagnosis', 'categorical', nullable=False),
]
config = AnalysisConfig(missing_percentage_threshold=0.05)  # Low tolerance for healthcare
suggestions = funputer.analyze_dataframe(patient_df, metadata, config)

Financial Risk Assessment

import funputer
from funputer.models import ColumnMetadata, AnalysisConfig

metadata = [
    ColumnMetadata('application_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('credit_score', 'integer', min_value=300, max_value=850),
    ColumnMetadata('debt_to_income', 'float', min_value=0.0, max_value=10.0),
    ColumnMetadata('loan_purpose', 'categorical', allowed_values='home,auto,personal,business'),
]
# Skip sensitive columns
config = AnalysisConfig(skip_columns=['ssn', 'account_number'])
suggestions = funputer.analyze_dataframe(loan_df, metadata, config)

⚙️ Requirements

  • Python: 3.9 or higher
  • Dependencies: pandas, numpy, scipy, pydantic, click, pyyaml

🔧 Installation from Source

git clone https://github.com/RajeshRamachander/funputer.git
cd funputer
pip install -e .

📚 Documentation

  • API Reference: Complete docstrings and type hints throughout the codebase
  • Examples: See usage examples above and in the codebase
  • Test Coverage: 77% coverage with comprehensive test suite

🤝 Contributing

We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.

📄 License

MIT License - see LICENSE file for details.


Focus: Get intelligent imputation recommendations, not complex infrastructure.
