
Phil: representation-guided multiverse imputation with Rust-backed ECT.


Phil

Phil is a representation-guided imputation library for missing tabular data.

It generates multiple imputations using a configurable strategy grid, computes Euler Characteristic Transform (ECT) descriptors over each imputed dataset, and selects the most representative imputation from the candidate set.

Installation

pip install philler

The distribution is published on PyPI as philler; the package is imported as phil.

Phil requires the trailed backend for ECT computation. Install it from the KRV research index or provide a compatible local build.

What Phil Does

  1. Impute: runs a grid of imputation strategies (scikit-learn estimators or custom imputers) over the input dataframe, producing a set of candidate datasets
  2. Describe: computes an ECT descriptor for each candidate via the trailed backend
  3. Select: picks the candidate whose descriptor is closest to the mean descriptor across all candidates, i.e. the most representative imputation
  4. Transform: exposes the fitted pipeline for inference on new data
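The Select step amounts to a nearest-to-mean search over the candidate descriptors. The following is a minimal NumPy sketch, not Phil's implementation: the function name select_representative is hypothetical, and Euclidean distance is an assumption, since the README does not say which metric Phil uses.

```python
import numpy as np


def select_representative(descriptors: np.ndarray) -> int:
    """Return the index of the candidate whose descriptor is closest to the mean.

    Hypothetical helper. descriptors has shape (n_candidates, ...), one
    descriptor (e.g. a flattened ECT) per imputed dataset. Euclidean distance
    is an assumption, not Phil's documented metric.
    """
    flat = descriptors.reshape(len(descriptors), -1)  # one row per candidate
    mean = flat.mean(axis=0)                          # mean descriptor
    distances = np.linalg.norm(flat - mean, axis=1)   # distance of each candidate to the mean
    return int(np.argmin(distances))


# Three toy descriptors: the middle one sits exactly on the mean [1, 1].
descs = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(select_representative(descs))  # 1
```

Picking the candidate closest to the mean, rather than the mean itself, guarantees the result is one of the actually imputed datasets.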

Quick Start

import pandas as pd
from phil import Phil

df = pd.read_csv("data_with_missing.csv")

phil = Phil(samples=30, random_state=42)
imputed_df = phil.fit(df)

# Apply the same fitted pipeline to new data with the same columns
new_data = pd.read_csv("new_data_with_missing.csv")  # hypothetical file name
new_df = phil.transform(new_data)

scikit-learn Pipeline Integration

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from phil import PhilTransformer

pipe = Pipeline([
    ("imputer", PhilTransformer(samples=20, random_state=0)),
    ("model", RandomForestClassifier()),
])
# X_train may contain missing values; y_train holds the target labels
pipe.fit(X_train, y_train)
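To show the fit/transform contract a transformer like PhilTransformer has to satisfy inside a Pipeline, here is a hypothetical, heavily simplified stand-in. RepresentativeImputer is not part of Phil. It fits a few scikit-learn imputers, uses the flattened imputed matrix as a placeholder descriptor instead of an ECT, keeps the candidate nearest the mean descriptor, and reuses that fitted imputer in transform.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class RepresentativeImputer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for PhilTransformer (illustration only)."""

    def __init__(self, candidates=None):
        self.candidates = candidates

    def fit(self, X, y=None):
        candidates = self.candidates or [
            SimpleImputer(strategy="mean"),
            SimpleImputer(strategy="median"),
            KNNImputer(n_neighbors=2),
        ]
        # Impute step: fit a fresh copy of every candidate on the training data.
        fitted = [clone(c).fit(X) for c in candidates]
        # Describe step (placeholder, not an ECT): flatten each imputed matrix.
        descriptors = np.stack([f.transform(X).ravel() for f in fitted])
        # Select step: keep the candidate closest to the mean descriptor.
        distances = np.linalg.norm(descriptors - descriptors.mean(axis=0), axis=1)
        self.selected_ = fitted[int(np.argmin(distances))]
        return self

    def transform(self, X):
        # Transform step: reuse the selected, already-fitted imputer on new data.
        return self.selected_.transform(X)


X = np.array([[1.0, np.nan], [2.0, 4.0], [np.nan, 6.0], [4.0, 8.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([
    ("imputer", RepresentativeImputer()),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)
print(pipe.predict(X))
```

Because the selection happens in fit and transform only reapplies the chosen imputer, new data is imputed consistently with the training data and no information from the test set leaks into the choice.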

Configuration

Imputation grids

Phil ships with named grids accessible via GridGallery:

Name          Methods
-----------   -----------------------------------------------------------
default       BayesianRidge, DecisionTree, RandomForest, GradientBoosting
sampling      DistributionImputer (empirical sampling)
finance       IterativeImputer, KNNImputer, SimpleImputer
healthcare    KNNImputer, SimpleImputer, IterativeImputer
marketing     SimpleImputer, KNNImputer, IterativeImputer
engineering   SimpleImputer, KNNImputer, IterativeImputer

Pass either a grid name or an ImputationConfig as param_grid:

from phil import Phil, ImputationConfig
from sklearn.model_selection import ParameterGrid

config = ImputationConfig(
    methods=["KNNImputer"],       # estimator class names
    modules=["sklearn.impute"],   # module each class is imported from
    grids=[ParameterGrid({"n_neighbors": [3, 5, 7]})],  # hyperparameters to sweep
)
phil = Phil(param_grid=config)
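One plausible reading of ImputationConfig is that each entry of methods names a class in the matching entry of modules, and each ParameterGrid lists the keyword arguments to sweep for that class. Under that assumption, the configuration above expands into three KNNImputer candidates. expand_candidates is a hypothetical helper written for illustration, not a Phil function.

```python
import importlib

from sklearn.model_selection import ParameterGrid


def expand_candidates(methods, modules, grids):
    """Instantiate one estimator per (method, parameter combination).

    Hypothetical helper. Assumes methods[i] is a class name defined in
    modules[i] and grids[i] is a ParameterGrid of constructor kwargs.
    """
    candidates = []
    for method, module_name, grid in zip(methods, modules, grids):
        cls = getattr(importlib.import_module(module_name), method)
        for params in grid:  # ParameterGrid yields one dict per combination
            candidates.append(cls(**params))
    return candidates


imputers = expand_candidates(
    methods=["KNNImputer"],
    modules=["sklearn.impute"],
    grids=[ParameterGrid({"n_neighbors": [3, 5, 7]})],
)
print([imp.n_neighbors for imp in imputers])  # [3, 5, 7]
```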

ECT descriptor

ECT is configured via ECTConfig:

from phil import Phil, ECTConfig

ect_config = ECTConfig(
    num_thetas=64,
    radius=1.0,
    resolution=100,
    scale=500,
    normalize=True,
    seed=42,
)
phil = Phil(config=ect_config)
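For intuition about what the descriptor measures, the sketch below computes an ECT for a point cloud (for example, the rows of one imputed dataset) in plain NumPy, without the trailed backend. With vertices only and no edges, the Euler characteristic of each sublevel set is simply a point count. The parameter names mirror ECTConfig, but their exact meaning in Phil is an assumption: scale is not modeled here, and the normalization scheme (center, then rescale into the ball of the given radius) is a guess.

```python
import numpy as np


def point_cloud_ect(X, num_thetas=64, radius=1.0, resolution=100, normalize=True, seed=42):
    """ECT of a point cloud treated as vertices only (illustrative sketch).

    The Euler characteristic of {x : <x, theta> <= t} is the number of points
    in that half-space. Returns an array of shape (resolution, num_thetas).
    """
    X = np.asarray(X, dtype=float)
    if normalize:
        # Assumed normalization: center, then scale into the ball of `radius`.
        X = X - X.mean(axis=0)
        max_norm = np.linalg.norm(X, axis=1).max()
        if max_norm > 0:
            X = X / max_norm * radius
    rng = np.random.default_rng(seed)
    thetas = rng.normal(size=(X.shape[1], num_thetas))
    thetas /= np.linalg.norm(thetas, axis=0)        # unit directions on the sphere
    heights = X @ thetas                            # (n_points, num_thetas)
    thresholds = np.linspace(-radius, radius, resolution)
    # ect[i, j] = number of points whose height along direction j is <= thresholds[i]
    return (heights[None, :, :] <= thresholds[:, None, None]).sum(axis=1)


rng = np.random.default_rng(0)
points = rng.normal(size=(50, 4))   # e.g. 50 rows, 4 numeric columns
ect = point_cloud_ect(points)
print(ect.shape)  # (100, 64)
```

Each column of the result is a monotone curve that climbs from 0 to the number of rows as the threshold sweeps across the data, so two imputations that place filled-in values differently produce measurably different curves.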

Development

uv sync --all-extras
uv run pytest -v
uv run black phil/ tests/


Download files


Source Distribution

philler-1.0.0.tar.gz (11.0 kB)

Uploaded Source

Built Distribution


philler-1.0.0-py3-none-any.whl (12.6 kB)

Uploaded Python 3

File details

Details for the file philler-1.0.0.tar.gz.

File metadata

  • Download URL: philler-1.0.0.tar.gz
  • Upload date:
  • Size: 11.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.9.18 (publish, macOS)

File hashes

Hashes for philler-1.0.0.tar.gz
Algorithm Hash digest
SHA256 6b3f8a32d923da043dcc91c873f60879090dcb681e641296c5c5e4e7c34ea53c
MD5 0b9815ebc9c22bb8d216682f55005ed7
BLAKE2b-256 4f12a62b493c6590363d80f229488f6c3a59460f9189729cf192465ef4c173b3


File details

Details for the file philler-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: philler-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 12.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.9.18 (publish, macOS)

File hashes

Hashes for philler-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d121b5c925e1ce562a01cd278db3e8be6b549aa9a622a4022e5b931594c1b862
MD5 3c1adebe9634eaf20ba6aa1930b19976
BLAKE2b-256 1d40a9f4471936db711170ca326b3b100acd32b2bd52fbb80f0c74e92f876064

