
Portable TabPFN evaluation pipeline with baselines, artifacts, reports, CLI, Python API, and MCP tools.


ev-tabpfn

ev-tabpfn is a comprehensive evaluation pipeline for TabPFN and standard tabular machine learning baselines. It provides a structured way to run, track, and aggregate machine learning experiments on tabular datasets.

This package was designed to facilitate rigorous comparison between TabPFN and industry-standard models like AutoGluon, CatBoost, XGBoost, and LightGBM.

Key Features

  • Standardized Evaluation: Consistent train/test splits and metric reporting across all models.
  • Rich Baselines: Built-in support for AutoGluon, CatBoost, XGBoost, LightGBM, Random Forest, and Logistic Regression.
  • Batch Orchestration: Run experiments across dozens of datasets with a single JSON configuration.
  • Automated Reporting: Generates ROC curves, radar plots, and summary Markdown reports.
  • Artifact Management: Structured output directory for logs, predictions, metrics, and models.
  • CLI & Python API: Use it as a command-line tool or integrate it into your Python scripts.

Installation

pip install ev-tabpfn

Requirements

  • Python 3.10+
  • Recommended: A fresh Conda environment (Python 3.11 is preferred for best compatibility with AutoGluon).

Quick Start

1. Set your TabPFN Token

To use the latest TabPFN models, you need an API token from TabPFN. Export it as an environment variable:

export TABPFN_TOKEN="your_actual_tabpfn_token"
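In your own scripts it can help to fail fast when the token was never exported. A minimal sketch with the standard library (`require_tabpfn_token` is an illustrative helper, not part of the ev-tabpfn package):

```python
import os


def require_tabpfn_token() -> str:
    """Return the TabPFN token from the environment, or raise a clear error."""
    token = os.environ.get("TABPFN_TOKEN")
    if not token:
        # Fail before any evaluation work starts, with an actionable message.
        raise RuntimeError(
            'TABPFN_TOKEN is not set; run: export TABPFN_TOKEN="your_actual_tabpfn_token"'
        )
    return token
```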

2. Run a Single Dataset Evaluation

Evaluate a single CSV file. Use --preset smoke first if you want the fastest sanity check:

ev-tabpfn run-single --dataset my_data.csv --target target_column --output ./outputs --preset smoke

--output is the output folder. The evaluator creates runs/, predictions/, metrics/, plots/, metadata/, and logs/ inside it.

3. Run a Batch Evaluation

Run multiple datasets as defined in a configuration file:

ev-tabpfn run --config config.json

4. Use Bundled Sample Datasets

The package includes compact smoke-test samples for binary classification, multiclass classification, and regression.

ev-tabpfn list-samples
ev-tabpfn copy-samples --output ./ev_tabpfn_samples

Create a runnable sample config and execute it:

ev-tabpfn make-sample-config \
  --samples-dir ./ev_tabpfn_samples \
  --output sample_config.json \
  --preset smoke

ev-tabpfn run --config sample_config.json

Required CSV Formats

The evaluator currently supports single-target tabular CSVs.

Rules:

  • One row equals one sample.
  • One column must be the target.
  • If --target / target_column is omitted, the final CSV column is used as the target.
  • Feature columns may be numeric or categorical.
  • Missing values are handled by baseline preprocessing where supported.
  • Multi-output regression and multilabel classification are not currently supported.
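The rules above can be illustrated by generating a tiny valid CSV with the standard library (column names and values here are illustrative, not required by the package):

```python
import csv
import io


def make_minimal_binary_csv() -> str:
    """Build a tiny CSV following the rules: one row per sample,
    target in the final column, mixed numeric/categorical features."""
    rows = [
        ["age", "color", "target"],  # header; target column last
        [34, "red", "yes"],
        [27, "blue", "no"],
        [45, "red", "yes"],
    ]
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


print(make_minimal_binary_csv())
```

Because the target is the last column, this file could be evaluated without passing --target at all.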

Inspect supported formats from the CLI:

ev-tabpfn data-formats
ev-tabpfn data-formats --task binary
ev-tabpfn data-formats --task multiclass
ev-tabpfn data-formats --task regression

Create CSV templates:

ev-tabpfn make-template --task binary --output binary_template.csv
ev-tabpfn make-template --task multiclass --output multiclass_template.csv
ev-tabpfn make-template --task regression --output regression_template.csv

Binary Classification CSV

Required shape:

feature_1,feature_2,...,target
value,value,...,class_a
value,value,...,class_b

Target requirements:

  • exactly two unique classes
  • labels may be 0/1, 1/2, yes/no, bad/good, or other string labels
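A quick pre-flight check for the two-class rule can be written with the standard library alone (this is a hand-rolled sketch; the package's own `ev-tabpfn validate` command is the supported way to check a dataset):

```python
import csv
import io


def target_classes(csv_text: str) -> set:
    """Unique labels in the final (target) column, header skipped."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip header row
    return {row[-1] for row in reader}


sample = "f1,f2,target\n1,2,yes\n3,4,no\n5,6,yes\n"
assert len(target_classes(sample)) == 2  # exactly two classes -> binary task
```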

Multiclass Classification CSV

Required shape:

feature_1,feature_2,...,target
value,value,...,class_a
value,value,...,class_b
value,value,...,class_c

Target requirements:

  • three or more discrete classes
  • labels may be strings or integer-like values

Regression CSV

Required shape:

feature_1,feature_2,...,target
value,value,...,1.23
value,value,...,4.56

Target requirements:

  • one numeric continuous target column
  • single-output regression only
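The numeric-target rule can likewise be checked up front with a small stdlib sketch (again illustrative; `ev-tabpfn validate` is the supported check):

```python
import csv
import io


def target_is_numeric(csv_text: str) -> bool:
    """True if every value in the final (target) column parses as a float."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip header row
    for row in reader:
        try:
            float(row[-1])
        except ValueError:
            return False
    return True
```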

Minimal Config Generation

For your own CSV, generate a runnable config instead of writing JSON by hand:

ev-tabpfn validate --dataset my_data.csv --target label

ev-tabpfn make-config \
  --dataset my_data.csv \
  --target label \
  --task binary \
  --preset smoke \
  --output-root ./outputs \
  --output my_config.json

ev-tabpfn run --config my_config.json

For make-config, --output is the config file path and --output-root is the evaluation output folder.

Model presets:

ev-tabpfn presets
  • smoke: fastest local check, sklearn baselines only
  • standard: GBM/sklearn baselines, no TabPFN or AutoGluon
  • full: TabPFN, AutoGluon, GBMs, and sklearn baselines
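For scripting, the preset descriptions above can be restated as a plain mapping (this dict only mirrors the bullet list; the authoritative source is `ev-tabpfn presets` itself):

```python
# Preset -> model families, restating the list above.
PRESETS = {
    "smoke": ["sklearn"],
    "standard": ["gbm", "sklearn"],
    "full": ["tabpfn", "autogluon", "gbm", "sklearn"],
}


def fastest_preset_with(family: str) -> str:
    """Return the cheapest preset (in speed order) that includes a model family."""
    for name in ("smoke", "standard", "full"):
        if family in PRESETS[name]:
            return name
    raise KeyError(family)
```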

Configuration File Structure

The batch evaluation uses a JSON configuration file. Example:

{
  "run_name": "my_experiment",
  "output_root": "./results",
  "seed": 42,
  "run_reports": true,
  "aggregate_after_run": true,
  "models": {
    "tabpfn": {"enabled": true},
    "autogluon": {"enabled": true, "presets": "medium_quality", "time_limit": 60},
    "catboost": {"enabled": true},
    "xgboost": {"enabled": true},
    "lightgbm": {"enabled": true},
    "random_forest": {"enabled": true},
    "logistic_regression": {"enabled": true}
  },
  "datasets": [
    {
      "name": "dataset1",
      "path": "data/dataset1.csv"
    },
    {
      "name": "dataset2",
      "path": "data/dataset2.csv"
    }
  ]
}
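When configs are produced by scripts (for sweeps over many datasets, say), the example above can be built with the json standard library instead of edited by hand. A sketch, reusing the exact keys and model names from the example:

```python
import json

# Build the batch config shown above programmatically.
config = {
    "run_name": "my_experiment",
    "output_root": "./results",
    "seed": 42,
    "run_reports": True,
    "aggregate_after_run": True,
    "models": {
        name: {"enabled": True}
        for name in (
            "tabpfn", "catboost", "xgboost", "lightgbm",
            "random_forest", "logistic_regression",
        )
    },
    "datasets": [
        {"name": "dataset1", "path": "data/dataset1.csv"},
        {"name": "dataset2", "path": "data/dataset2.csv"},
    ],
}
# AutoGluon takes extra options, matching the JSON example.
config["models"]["autogluon"] = {
    "enabled": True, "presets": "medium_quality", "time_limit": 60,
}

config_text = json.dumps(config, indent=2)
```

Writing `config_text` to config.json gives a file equivalent to the example, ready for `ev-tabpfn run --config config.json`.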

Recreating Research Experiments

To recreate the experiments from the original research (e.g., standard classification datasets), follow these steps:

  1. Prepare your environment:

    conda create -n ev-tabpfn-test python=3.11 -y
    conda activate ev-tabpfn-test
    pip install ev-tabpfn
    
  2. Create a configuration file (e.g., recreate_benchmark.json) and list your dataset paths.

  3. Run the batch evaluation:

    ev-tabpfn run --config recreate_benchmark.json
    
  4. Inspect the results: aggregated outputs are written to the results/ directory under your output_root, including:

    • aggregate_classification.md: Comprehensive metric comparison.
    • benchmark_roc_grid.png: ROC curves for all datasets.
    • benchmark_summary.md: High-level summary of model performance.

Python API

You can also use ev-tabpfn programmatically in your Python scripts:

from ev_tabpfn import (
    aggregate_results,
    create_config_template,
    describe_data_formats,
    evaluate_batch,
    evaluate_dataset,
    list_sample_datasets,
)

# Learn required CSV structures
print(describe_data_formats())

# Evaluate a single dataset
evaluate_dataset(
    dataset_path="data.csv",
    target_column="label",
    task="binary",
    output_root="./outputs",
    model_preset="smoke",
)

# Generate a reusable config
create_config_template(
    output_path="config.json",
    dataset_path="data.csv",
    target_column="label",
    task="binary",
    model_preset="smoke",
)

# Run a batch from a config file
evaluate_batch(config_path="config.json")

# Aggregate results from multiple runs
aggregate_results(output_root="./outputs")

# Inspect bundled samples
list_sample_datasets()

CLI Reference

  • ev-tabpfn run: Run a batch evaluation from a JSON config.
  • ev-tabpfn run-single: Run evaluation on a single dataset.
  • ev-tabpfn aggregate: Aggregate existing run results into a summary report.
  • ev-tabpfn validate: Validate dataset format and compatibility.
  • ev-tabpfn summarize-run: Print a human-readable summary of a specific dataset run.
  • ev-tabpfn generate-report: Generate visual plots and reports for a run.
  • ev-tabpfn list-samples: List bundled smoke-test datasets.
  • ev-tabpfn copy-samples: Copy bundled sample CSVs into a working folder.
  • ev-tabpfn sample-path: Print the installed path for one bundled sample.
  • ev-tabpfn data-formats: Describe required CSV structures.
  • ev-tabpfn make-template: Create a CSV template for a task.
  • ev-tabpfn make-config: Create a runnable JSON config for one CSV.
  • ev-tabpfn make-sample-config: Create a runnable JSON config for bundled samples.
  • ev-tabpfn presets: List model presets.

PyPI README

This README is the package's long description, configured in pyproject.toml:

readme = "README.md"

The PyPI release page will pick up this README once the next version is rebuilt and uploaded.

Output Directory Structure

Each run produces a structured output:

output_root/
├── batch_config.resolved.json  # The final config used
├── batch_manifest.json         # Index of all runs
├── results/                    # Aggregated plots and tables
├── summary/                    # High-level JSON summaries
├── logs/                       # Batch-level logs
└── runs/                       # Individual dataset results
    └── <dataset_name>/
        └── <run_id>/
            ├── predictions/    # CSV predictions per model
            ├── metrics/        # Performance metrics
            ├── plots/          # ROC and PR curves
            └── logs/           # Detailed execution logs

License

See LICENSE. Replace the current local placeholder with the final project license before publishing a production release.
