NeuroSplit Boosting for tabular data with differentiable soft trees and neural gating.

catfishml

catfishml is a Python library for OmniBoost++: generalized boosting with adaptive routing across heterogeneous weak learners (linear, spline/GAM-like, adaptive-depth MLP, differentiable soft tree).
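To make the term "differentiable soft tree" concrete, here is a minimal sketch in plain Python. It illustrates the general technique only and is not catfishml's implementation; the function name soft_tree_predict, the fixed depth of 2, and the gate/leaf layout are assumptions made for this example. Each internal node makes a sigmoid "soft" decision, and the prediction is the leaf values averaged by the probability of reaching each leaf, so the output is smooth in every parameter.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_tree_predict(x, gates, leaves):
    """Evaluate a depth-2 soft tree on one feature vector x (illustrative only).

    gates  : three (weights, bias) pairs for the root, left child, right child
    leaves : four leaf values ordered left-left, left-right, right-left, right-right
    """
    def go_right(node):
        weights, bias = gates[node]
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

    p_root, p_left, p_right = go_right(0), go_right(1), go_right(2)
    # The probability of reaching a leaf is the product of the soft
    # decisions taken along the path from the root.
    reach = [
        (1 - p_root) * (1 - p_left),
        (1 - p_root) * p_left,
        p_root * (1 - p_right),
        p_root * p_right,
    ]
    return sum(r * v for r, v in zip(reach, leaves)), reach


# Hypothetical gate parameters and leaf values for a two-feature input.
gates = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([0.0, -1.0], 0.5)]
leaves = [-1.0, 0.0, 0.5, 2.0]
prediction, reach = soft_tree_predict([0.3, -1.2], gates, leaves)
```

Because the reach probabilities always sum to 1, the prediction is a convex combination of the leaf values, which is what lets such a tree be trained with gradient-based optimizers.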

Why catfishml

  • Adaptive residual router with complexity/redundancy penalties.
  • Additive boosting in natural predictor space with Newton-style targets.
  • Numeric + categorical support (categorical embeddings).
  • Missing value handling with SimpleImputer or MICE-style IterativeImputer.
  • Adaptive behavior:
    • automatic objective/distribution and metric selection,
    • linearity probe (auto linear vs nonlinear mode),
    • adaptive MLP depth + adaptive tree depth.
  • CPU/GPU via PyTorch.
  • Automatic dependency install for missing core libraries (can be disabled).
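One plausible reading of "adaptive residual router with complexity penalties" is a soft selection over candidate learners in which each candidate's residual loss is inflated by a cost for its complexity. The sketch below shows that idea in plain Python. It is an assumption about the general mechanism, not catfishml's code: route_weights, penalty, temperature, and all numeric values are hypothetical, and the redundancy penalty is left out.

```python
import math


def route_weights(residual_losses, complexities, penalty=0.1, temperature=1.0):
    """Softmax routing weights over candidate learners (illustrative only).

    Each candidate is scored by its residual loss plus penalty * complexity;
    lower scores receive higher routing weight.
    """
    scores = [
        -(loss + penalty * c) / temperature
        for loss, c in zip(residual_losses, complexities)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


families = ["linear", "spline", "adaptive_mlp", "soft_tree"]
losses = [0.40, 0.31, 0.30, 0.29]      # hypothetical residual losses
complexities = [1.0, 2.0, 6.0, 4.0]    # hypothetical complexity scores

weights = route_weights(losses, complexities, penalty=0.05)
best = families[weights.index(max(weights))]
```

With these made-up numbers the penalty shifts the preferred family from the lowest-loss candidate to a simpler one whose loss is only slightly higher, which is the trade-off such a penalty is meant to express.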

Install

pip install catfishml

For development:

pip install -e .[dev]

Quick start

import pandas as pd
from catfishml import FishyCatClassifier

X = pd.DataFrame(
    {
        "age": [25, 31, 45, None, 39, 22, 55],
        "income": [2200, 3400, 7600, 5100, None, 1900, 8800],
        "city": ["A", "B", "A", "C", "B", None, "A"],
    }
)
y = [0, 0, 1, 1, 0, 0, 1]

model = FishyCatClassifier(
    n_estimators=40,
    tree_depth=3,
    metrics="auto",
    auto_metric=True,
    impute_strategy="auto",
    candidate_families="auto",
    install_missing_libraries=True,
    n_jobs=4,
    verbose=1,
)

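model.fit(X, y)

# Evaluate on the training data (illustration only; use held-out data in practice).
print(model.evaluate(X, y))
# Predicted probabilities for the first three rows.
print(model.predict_proba(X)[:3])
# Overview diagnostics plot.
fig = model.plot_visualization(kind="overview")
# Training and data summary, then the first rows of the per-iteration history.
print(model.get_statistics())
print(model.get_history(as_dataframe=True).head())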

Main API

  • FishyCatBooster
  • FishyCatClassifier

For regression, use FishyCatBooster(task="regression", ...).

Common parameters:

  • metrics: metric name ("auto", "accuracy", "auc", "logloss", "rmse", "mae", "r2") or a callable.
  • auto_metric: if True, the metric and the validation feedback shown during training are selected automatically from the task and data.
  • impute_strategy: "auto", "simple", "iterative", or "none".
  • structure_mode: "auto", "linear", or "nonlinear".
  • boosting_order: 1 (gradient) or 2 (Newton-like weighted residuals).
  • candidate_families: "auto" or a subset of ["linear", "spline", "adaptive_mlp", "soft_tree"].
  • auto_install_dependencies: installs missing libraries with pip at runtime.
  • install_plot_dependencies: if True, also installs plotting dependencies automatically.

Common methods:

  • plot_visualization(kind=...): loss/routing/depth/overview diagnostics.
  • get_statistics(): full training and data summary.
  • get_history(as_dataframe=True): per-iteration history (loss, metric, ETA, routing).
  • view_data(X, transformed=True/False): inspect raw or transformed data.
  • full_report(X, y): one-shot report (statistics + history + evaluation).
  • available_components(): list of all integrated learner families and features.
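The two boosting_order settings correspond to the standard distinction between first-order (gradient) and second-order (Newton-like) boosting targets. The sketch below shows that distinction for binary log loss in plain Python. It is a textbook illustration under that assumption, not catfishml's internal code; boosting_targets and its arguments are hypothetical names.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function; maps a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def boosting_targets(y, raw_scores, order=1):
    """Return (targets, weights) for the next weak learner under log loss.

    order=1: targets are the negative gradients y - p, with unit weights.
    order=2: targets are -gradient / hessian, weighted by the hessian p * (1 - p).
    """
    targets, weights = [], []
    for yi, fi in zip(y, raw_scores):
        p = sigmoid(fi)
        grad = p - yi                      # first derivative w.r.t. the raw score
        hess = max(p * (1.0 - p), 1e-12)   # second derivative, floored to avoid division by zero
        if order == 1:
            targets.append(-grad)
            weights.append(1.0)
        else:
            targets.append(-grad / hess)
            weights.append(hess)
    return targets, weights


y = [0, 1, 1]
raw = [0.0, 0.0, 2.0]   # current raw (pre-sigmoid) predictions
t1, w1 = boosting_targets(y, raw, order=1)
t2, w2 = boosting_targets(y, raw, order=2)
```

In the second-order case, target times weight recovers the first-order target, so both variants push in the same direction; the Newton-like version rescales each step by the local curvature of the loss.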

Notes

  • This repository provides a practical implementation of OmniBoost++ ideas; it is not a strict reproduction of a specific paper.
  • For larger datasets, pass device="cuda" to run on a GPU.

License

MIT
