Intelligent imputation analysis with automatic data validation, metadata inference, and percentile-based outlier-resistant ranges

FunPuter - Intelligent Imputation Analysis

[ License: Proprietary ]

Intelligent imputation analysis with automatic data validation and metadata inference

FunPuter analyzes your data and recommends an imputation method for each column based on data patterns, missingness mechanisms, and metadata constraints. Each suggestion carries a confidence score and a rationale, so you can judge how far to trust it.

🚀 Quick Start

Installation

pip install funputer

30-Second Example

Auto-Inference Mode (Zero Configuration)

import funputer

# Point to your CSV - FunPuter figures out everything automatically
suggestions = funputer.analyze_imputation_requirements("your_data.csv")

# Get intelligent suggestions with confidence scores
for suggestion in suggestions:
    if suggestion.missing_count > 0:
        print(f"📊 {suggestion.column_name}: {suggestion.proposed_method}")
        print(f"   Confidence: {suggestion.confidence_score:.3f}")
        print(f"   Reason: {suggestion.rationale}")
        print(f"   Missing: {suggestion.missing_count} ({suggestion.missing_percentage:.1f}%)")

Production Mode (Full Control)

import funputer
from funputer.models import ColumnMetadata

# Define your data structure with constraints
metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=18, max_value=100),
    ColumnMetadata('income', 'float', min_value=0),
    ColumnMetadata('category', 'categorical', allowed_values='A,B,C'),
]

# Get production-grade suggestions
suggestions = funputer.analyze_dataframe(your_dataframe, metadata)

🎯 Key Features

🤖 Automatic Metadata Inference - Intelligent data type and constraint detection
📊 Missing Data Analysis - MCAR, MAR, MNAR mechanism detection
⚡ Data Validation - Real-time constraint checking and validation
🎯 Smart Recommendations - Context-aware imputation method suggestions
📈 Confidence Scoring - Transparent reliability estimates for each recommendation
🛡️ Pre-flight Checks - Comprehensive data validation before analysis
📊 Percentile-Based Ranges - Outlier-resistant numeric bounds using configurable percentiles
💻 CLI & Python API - Flexible usage via command line or programmatic access

📊 Data Validation System

Comprehensive validation runs automatically to prevent crashes and guide your workflow:

File validation: Format detection, encoding, accessibility
Structure validation: Column analysis, data type inference
Memory estimation: Resource usage prediction
Advisory recommendations: Guided workflow suggestions
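
To make the checks listed above concrete, here is a minimal standard-library sketch of what a pre-flight pass over a CSV file might look like. It is not FunPuter's implementation: the function name `preflight_sketch`, its report keys, and the size-based memory estimate are all assumptions made for illustration.

```python
import csv
import os
from typing import Any, Dict


def preflight_sketch(path: str, sample_rows: int = 5000, encoding: str = "utf-8") -> Dict[str, Any]:
    """Illustrative pre-flight pass (hypothetical helper, not FunPuter's API)."""
    report: Dict[str, Any] = {"path": path, "errors": [], "warnings": []}

    # File validation: the file must exist and be non-empty.
    if not os.path.isfile(path):
        report["errors"].append("file not found")
        return report
    size_bytes = os.path.getsize(path)
    if size_bytes == 0:
        report["errors"].append("file is empty")
        return report

    # Structure validation: read the header and a bounded sample of rows.
    try:
        with open(path, newline="", encoding=encoding) as handle:
            reader = csv.reader(handle)
            header = next(reader, [])
            sampled = 0
            ragged = 0
            for row in reader:
                if sampled >= sample_rows:
                    break
                sampled += 1
                if len(row) != len(header):
                    ragged += 1
    except UnicodeDecodeError:
        report["errors"].append(f"cannot decode file as {encoding}")
        return report

    report["columns"] = header
    report["sampled_rows"] = sampled
    if ragged:
        report["warnings"].append(f"{ragged} sampled rows have an unexpected column count")

    # Memory estimation: assumes loaded data needs roughly three times the
    # on-disk size. This factor is a placeholder; real usage depends on dtypes.
    report["estimated_memory_bytes"] = size_bytes * 3
    return report
```

A report with a non-empty `errors` list corresponds to a hard stop, while `warnings` alone suggest the analysis can still proceed.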

Independent Usage:

# Basic validation check
funputer preflight -d your_data.csv

# With custom options  
funputer preflight -d data.csv --sample-rows 5000 --encoding utf-8

# JSON report output
funputer preflight -d data.csv --json-out report.json

Exit Codes:

0: Ready for analysis
2: OK with warnings (can proceed)
10: Hard error (cannot proceed)
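
The exit codes above are easy to act on from a script. The sketch below wraps the documented `funputer preflight -d` command with `subprocess`; the helper names (`interpret_preflight_exit_code`, `can_proceed`, `run_preflight`) and the status labels are hypothetical, and any exit code not listed above is simply reported as unknown.

```python
import subprocess
from typing import List

# Exit codes as documented above; the labels are illustrative.
PREFLIGHT_STATUS = {
    0: "ready",
    2: "warnings",
    10: "error",
}


def interpret_preflight_exit_code(code: int) -> str:
    """Map a preflight exit code to a short status label."""
    return PREFLIGHT_STATUS.get(code, "unknown")


def can_proceed(code: int) -> bool:
    """Exit codes 0 and 2 both allow analysis to continue."""
    return code in (0, 2)


def run_preflight(data_path: str) -> int:
    """Run `funputer preflight -d <path>` and return its exit code."""
    command: List[str] = ["funputer", "preflight", "-d", data_path]
    completed = subprocess.run(command, check=False)
    return completed.returncode
```

A pipeline step could then call `can_proceed(run_preflight("data.csv"))` and only start the analysis when it returns `True`.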

📊 Percentile-Based Range Detection

NEW in v1.6.0: Intelligent outlier-resistant numeric bounds using configurable percentiles.

Instead of using absolute min/max values that include outliers, FunPuter can calculate percentile-based ranges that provide more realistic business constraints.

How it works:

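95% threshold (default): Keeps the central 95% of values (2.5th to 97.5th percentile) and treats the top/bottom 2.5% as outliers
99% threshold: More conservative trimming, treats only the top/bottom 0.5% as outliers (0.5th to 99.5th percentile)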
Requires 20+ samples for statistical reliability (configurable)
Falls back to traditional min/max when insufficient data

Example Benefits:

# Traditional bounds (includes outliers)
age_column: min=5, max=150  # Includes data entry errors

# Percentile bounds (outlier-resistant) 
age_column: percentile_low=18.2, percentile_high=65.8  # Realistic business range

Usage:

# Enable percentile ranges with default 95% threshold
from funputer import analyze_with_percentile_ranges
suggestions = analyze_with_percentile_ranges("data.csv")

# Custom percentile threshold (99% = more conservative)
suggestions = analyze_with_percentile_ranges("data.csv", percentile_threshold=99.0)

# With configuration object
from funputer.models import AnalysisConfig
config = AnalysisConfig(
    enable_percentile_ranges=True,
    default_percentile_threshold=90.0,
    min_samples_for_percentiles=15
)
suggestions = funputer.analyze_dataframe(df, config=config)

CLI Usage:

# Enable percentile ranges (95% default)
funputer analyze -d data.csv --percentile-threshold 95.0

# More conservative outlier detection (99%)
funputer analyze -d data.csv --percentile-threshold 99.0

# Disable percentile ranges (traditional min/max only)
funputer analyze -d data.csv --disable-percentile-ranges

# Custom minimum samples requirement
funputer analyze -d data.csv --min-samples-percentiles 25

💻 Command Line Interface

# Generate metadata template from your data
funputer init -d data.csv -o metadata.csv

# Analyze with auto-inference  
funputer analyze -d data.csv

# Analyze with custom metadata
funputer analyze -d data.csv -m metadata.csv --verbose

# Analyze with percentile-based ranges (NEW)
funputer analyze -d data.csv --percentile-threshold 95.0

# Data quality check first
funputer preflight -d data.csv

📚 Usage Examples

Basic Analysis

import funputer

# Simple analysis with auto-inference
suggestions = funputer.analyze_imputation_requirements("sales_data.csv")

# Display recommendations
for suggestion in suggestions:
    print(f"Column: {suggestion.column_name}")
    print(f"Method: {suggestion.proposed_method}")  
    print(f"Confidence: {suggestion.confidence_score:.3f}")
    print(f"Missing: {suggestion.missing_count} values")
    print()
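
Once suggestions are available, a common next step is to separate those you can apply directly from those that deserve a manual look. The sketch below assumes only the attributes used in the loop above (`missing_count`, `confidence_score`); the helper name `split_by_confidence`, the 0.7 cutoff, and the stand-in objects with their method names are illustrative assumptions, not part of FunPuter.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def split_by_confidence(
    suggestions: Iterable[Any], threshold: float = 0.7
) -> Tuple[List[Any], List[Any]]:
    """Split suggestions for columns with missing values into
    (confident, needs_review) using confidence_score."""
    confident: List[Any] = []
    needs_review: List[Any] = []
    for suggestion in suggestions:
        if suggestion.missing_count == 0:
            continue  # Nothing to impute for this column
        if suggestion.confidence_score >= threshold:
            confident.append(suggestion)
        else:
            needs_review.append(suggestion)
    return confident, needs_review


# Stand-in objects with the same attribute names as the suggestions above;
# the method names here are placeholders.
demo = [
    SimpleNamespace(column_name="age", proposed_method="median", confidence_score=0.85, missing_count=12),
    SimpleNamespace(column_name="income", proposed_method="regression", confidence_score=0.55, missing_count=40),
    SimpleNamespace(column_name="customer_id", proposed_method="none", confidence_score=1.0, missing_count=0),
]

confident, needs_review = split_by_confidence(demo)
print([s.column_name for s in confident])     # ['age']
print([s.column_name for s in needs_review])  # ['income']
```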

Percentile-Based Range Analysis (NEW)

import pandas as pd

import funputer
from funputer.metadata_inference import infer_metadata_from_dataframe
from funputer.models import AnalysisConfig

# Outlier-resistant analysis with percentile ranges
suggestions = funputer.analyze_with_percentile_ranges("customer_data.csv")

# Access both traditional and percentile bounds
df = pd.read_csv("customer_data.csv")
config = AnalysisConfig(enable_percentile_ranges=True, default_percentile_threshold=95.0)
metadata = infer_metadata_from_dataframe(df, config=config)

for meta in metadata:
    if meta.data_type not in ('integer', 'float'):
        continue
    print(f"\n{meta.column_name}:")
    print(f"  Traditional bounds: {meta.min_value} - {meta.max_value}")
    if meta.percentile_low is None or meta.percentile_high is None:
        print("  Percentile bounds:  Not available (insufficient samples)")
        continue
    print(f"  Percentile bounds:  {meta.percentile_low:.1f} - {meta.percentile_high:.1f} ({meta.percentile_threshold}%)")
    # Share of the full min-max range that lies outside the percentile bounds
    full_range = meta.max_value - meta.min_value
    if full_range > 0:  # Skip constant columns to avoid dividing by zero
        trimmed = full_range - (meta.percentile_high - meta.percentile_low)
        print(f"  Outlier exclusion:  {trimmed / full_range * 100:.1f}% of range")

Advanced Configuration

from funputer.models import ColumnMetadata, AnalysisConfig
from funputer.analyzer import ImputationAnalyzer

# Custom metadata with business rules
metadata = [
    ColumnMetadata('product_id', 'string', unique_flag=True, max_length=10),
    ColumnMetadata('price', 'float', min_value=0, max_value=10000),
    ColumnMetadata('category', 'categorical', allowed_values='Electronics,Books,Clothing'),
    ColumnMetadata('rating', 'float', min_value=1.0, max_value=5.0),
]

# Custom analysis configuration
config = AnalysisConfig(
    missing_percentage_threshold=0.3,  # 30% threshold
    skip_columns=['internal_id'],
    outlier_threshold=0.1
)

# Run analysis (df is a pandas DataFrame you have already loaded)
analyzer = ImputationAnalyzer(config)
suggestions = analyzer.analyze_dataframe(df, metadata)

Industry-Specific Examples

E-commerce Analytics

import funputer
from funputer.models import AnalysisConfig, ColumnMetadata

# customer_df is a pandas DataFrame you have already loaded
metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=13, max_value=120),
    ColumnMetadata('purchase_amount', 'float', min_value=0),
    ColumnMetadata('customer_segment', 'categorical', allowed_values='Premium,Standard,Basic'),
]
suggestions = funputer.analyze_dataframe(customer_df, metadata)

Healthcare Data

metadata = [
    ColumnMetadata('patient_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=0, max_value=150),
    ColumnMetadata('blood_pressure', 'integer', min_value=50, max_value=300),
    ColumnMetadata('diagnosis', 'categorical', nullable=False),
]
config = AnalysisConfig(missing_percentage_threshold=0.05)  # 5% threshold: low tolerance for healthcare
suggestions = funputer.analyze_dataframe(patient_df, metadata, config)

Financial Risk Assessment

metadata = [
    ColumnMetadata('application_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('credit_score', 'integer', min_value=300, max_value=850),
    ColumnMetadata('debt_to_income', 'float', min_value=0.0, max_value=10.0),
    ColumnMetadata('loan_purpose', 'categorical', allowed_values='home,auto,personal,business'),
]
# Skip sensitive columns
config = AnalysisConfig(skip_columns=['ssn', 'account_number'])
suggestions = funputer.analyze_dataframe(loan_df, metadata, config)

⚙️ Requirements

Python: 3.9 or higher
Dependencies: pandas, numpy, scipy, pydantic, click, pyyaml

🔧 Installation from Source

git clone https://github.com/RajeshRamachander/funputer.git
cd funputer
pip install -e .

📚 Documentation

API Reference: Complete docstrings and type hints throughout the codebase
Examples: See usage examples above and in the codebase
Test Coverage: 84% coverage with comprehensive test suite

📄 License

Proprietary License - Source code is available for inspection but not for derivative works.

Focus: Get intelligent imputation recommendations, not complex infrastructure.

Release history

1.7.1: Dec 16, 2025
1.7.0: Nov 4, 2025
1.6.0 (this version): Nov 3, 2025
1.5.2: Aug 19, 2025
1.5.1: Aug 13, 2025
1.4.0: Aug 9, 2025
1.3.7: Aug 9, 2025
1.3.6: Aug 9, 2025
1.3.5: Aug 8, 2025
1.3.4: Aug 8, 2025
1.3.3: Aug 7, 2025
1.3.2: Aug 7, 2025
1.3.1: Aug 6, 2025
1.2.1: Aug 6, 2025
1.1.0: Aug 5, 2025
1.0.4: Aug 5, 2025
1.0.3: Aug 1, 2025

Download files

Source Distribution

funputer-1.6.0.tar.gz (112.8 kB)

Uploaded Nov 3, 2025 Source

Built Distribution

funputer-1.6.0-py3-none-any.whl (39.2 kB)

Uploaded Nov 3, 2025 Python 3

File details

Details for the file funputer-1.6.0.tar.gz.

File metadata

Download URL: funputer-1.6.0.tar.gz
Upload date: Nov 3, 2025
Size: 112.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for funputer-1.6.0.tar.gz
Algorithm	Hash digest
SHA256	`1e5558c2b3c033d23ab031d5ed4fff4abd1704fd6fe95439decee982a0d488e1`
MD5	`e18c4927ee2152e99cd393e460b76a8f`
BLAKE2b-256	`70c3f0448a5cd0451184b394cde98f38ee134edb59c693c1f6369d8301d6f171`

File details

Details for the file funputer-1.6.0-py3-none-any.whl.

File metadata

Download URL: funputer-1.6.0-py3-none-any.whl
Upload date: Nov 3, 2025
Size: 39.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for funputer-1.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cdfa02a00f6a1dfff9bf8d3edf7b6beff95c05161cc0a89479de9905cb9a676e`
MD5	`1492437f363bbb9b9a0c74e0f3f90940`
BLAKE2b-256	`b5183c7a3eb5dd9449a5e2b18fe2bcb2aee2f988d2cab6cbaebcb2869289f201`
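
The published digests can be checked against a downloaded file with Python's standard `hashlib` module. The sketch below is one way to do that; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the expected value is whichever SHA256 digest is listed above for the file in question.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `matches_published_hash("funputer-1.6.0.tar.gz", expected)` should return `True` when `expected` is the SHA256 value listed for the source distribution and the download is intact.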
