ev-tabpfn
Portable TabPFN evaluation pipeline with baselines, artifacts, reports, CLI, Python API, and MCP tools.
ev-tabpfn is a comprehensive evaluation pipeline for TabPFN and other tabular machine learning baselines. It provides a structured way to run, track, and aggregate machine learning experiments on tabular datasets.
This package was designed to facilitate rigorous comparison between TabPFN and industry-standard models like AutoGluon, CatBoost, XGBoost, and LightGBM.
Key Features
- Standardized Evaluation: Consistent train/test splits and metric reporting across all models.
- Rich Baselines: Built-in support for AutoGluon, CatBoost, XGBoost, LightGBM, Random Forest, and Logistic Regression.
- Batch Orchestration: Run experiments across dozens of datasets with a single JSON configuration.
- Automated Reporting: Generates ROC curves, radar plots, and summary Markdown reports.
- Artifact Management: Structured output directory for logs, predictions, metrics, and models.
- CLI & Python API: Use it as a command-line tool or integrate it into your Python scripts.
Installation
pip install ev-tabpfn
Requirements
- Python 3.10+
- Recommended: A fresh Conda environment (Python 3.11 is preferred for best compatibility with AutoGluon).
Quick Start
1. Set your TabPFN Token
To use the latest TabPFN models, you need a token from TabPFN.
export TABPFN_TOKEN="your_actual_tabpfn_token"
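If you drive evaluations from Python, it can help to fail fast when the token is missing. The helper below is a minimal sketch, not part of ev-tabpfn's API; `get_tabpfn_token` is a hypothetical name:

```python
import os

def get_tabpfn_token() -> str:
    """Read the TabPFN token from the environment, failing early if unset."""
    token = os.environ.get("TABPFN_TOKEN")
    if not token:
        raise RuntimeError(
            "TABPFN_TOKEN is not set; export it before running TabPFN evaluations."
        )
    return token
```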
2. Run a Single Dataset Evaluation
Evaluate a single CSV file. Use --preset smoke first if you want the fastest sanity check:
ev-tabpfn run-single --dataset my_data.csv --target target_column --output ./outputs --preset smoke
--output is the output folder. The evaluator creates runs/, predictions/, metrics/, plots/, metadata/, and logs/ inside it.
3. Run a Batch Evaluation
Run multiple datasets as defined in a configuration file:
ev-tabpfn run --config config.json
4. Use Bundled Sample Datasets
The package includes compact smoke-test samples for binary classification, multiclass classification, and regression.
ev-tabpfn list-samples
ev-tabpfn copy-samples --output ./ev_tabpfn_samples
Create a runnable sample config and execute it:
ev-tabpfn make-sample-config \
--samples-dir ./ev_tabpfn_samples \
--output sample_config.json \
--preset smoke
ev-tabpfn run --config sample_config.json
Required CSV Formats
The evaluator currently supports single-target tabular CSVs.
Rules:
- One row equals one sample.
- One column must be the target.
- If --target/target_column is omitted, the final CSV column is used as the target.
- Feature columns may be numeric or categorical.
- Missing values are handled by baseline preprocessing where supported.
- Multi-output regression and multilabel classification are not currently supported.
Inspect supported formats from the CLI:
ev-tabpfn data-formats
ev-tabpfn data-formats --task binary
ev-tabpfn data-formats --task multiclass
ev-tabpfn data-formats --task regression
Create CSV templates:
ev-tabpfn make-template --task binary --output binary_template.csv
ev-tabpfn make-template --task multiclass --output multiclass_template.csv
ev-tabpfn make-template --task regression --output regression_template.csv
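For scripting, the same template shapes can be produced with the standard library. This is a hypothetical local equivalent of make-template, following the column layouts described below (last column is the target), not the tool's actual implementation:

```python
import csv

# Minimal per-task CSV skeletons: feature columns first, target column last.
TEMPLATES = {
    "binary": [("feature_1", "feature_2", "target"),
               (0.1, 1.0, "yes"), (0.2, 2.0, "no")],
    "multiclass": [("feature_1", "feature_2", "target"),
                   (0.1, 1.0, "class_a"), (0.2, 2.0, "class_b"), (0.3, 3.0, "class_c")],
    "regression": [("feature_1", "feature_2", "target"),
                   (0.1, 1.0, 1.23), (0.2, 2.0, 4.56)],
}

def write_template(task: str, path: str) -> None:
    """Write a tiny CSV skeleton for the given task."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(TEMPLATES[task])
```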
Binary Classification CSV
Required shape:
feature_1,feature_2,...,target
value,value,...,class_a
value,value,...,class_b
Target requirements:
- exactly two unique classes
- labels may be 0/1, 1/2, yes/no, bad/good, or other string labels
Multiclass Classification CSV
Required shape:
feature_1,feature_2,...,target
value,value,...,class_a
value,value,...,class_b
value,value,...,class_c
Target requirements:
- three or more discrete classes
- labels may be strings or integer-like values
Regression CSV
Required shape:
feature_1,feature_2,...,target
value,value,...,1.23
value,value,...,4.56
Target requirements:
- one numeric continuous target column
- single-output regression only
Minimal Config Generation
For your own CSV, generate a runnable config instead of writing JSON by hand:
ev-tabpfn validate --dataset my_data.csv --target label
ev-tabpfn make-config \
--dataset my_data.csv \
--target label \
--task binary \
--preset smoke \
--output-root ./outputs \
--output my_config.json
ev-tabpfn run --config my_config.json
For make-config, --output is the config file path and --output-root is the evaluation output folder.
Model presets:
ev-tabpfn presets
- smoke: fastest local check, sklearn baselines only
- standard: GBM/sklearn baselines, no TabPFN or AutoGluon
- full: TabPFN, AutoGluon, GBMs, and sklearn baselines
Configuration File Structure
The batch evaluation uses a JSON configuration file. Example:
{
"run_name": "my_experiment",
"output_root": "./results",
"seed": 42,
"run_reports": true,
"aggregate_after_run": true,
"models": {
"tabpfn": {"enabled": true},
"autogluon": {"enabled": true, "presets": "medium_quality", "time_limit": 60},
"catboost": {"enabled": true},
"xgboost": {"enabled": true},
"lightgbm": {"enabled": true},
"random_forest": {"enabled": true},
"logistic_regression": {"enabled": true}
},
"datasets": [
{
"name": "dataset1",
"path": "data/dataset1.csv"
},
{
"name": "dataset2",
"path": "data/dataset2.csv"
}
]
}
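Configs can also be assembled programmatically and dumped to JSON. The helper below is an illustrative sketch that mirrors the structure above (the model set shown is partial; add per-model options such as AutoGluon's presets and time_limit as needed):

```python
import json

def make_batch_config(run_name, datasets, output_root="./results", seed=42):
    """Build a batch-config dict matching the JSON structure above.
    `datasets` is a list of (name, path) pairs."""
    return {
        "run_name": run_name,
        "output_root": output_root,
        "seed": seed,
        "run_reports": True,
        "aggregate_after_run": True,
        "models": {
            name: {"enabled": True}
            for name in ("tabpfn", "catboost", "xgboost", "lightgbm",
                         "random_forest", "logistic_regression")
        },
        "datasets": [{"name": n, "path": p} for n, p in datasets],
    }

config_json = json.dumps(
    make_batch_config("my_experiment", [("dataset1", "data/dataset1.csv")]),
    indent=2,
)
```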
Recreating Research Experiments
To recreate the experiments from the original research (e.g., standard classification datasets), follow these steps:
1. Prepare your environment:
   conda create -n ev-tabpfn-test python=3.11 -y
   conda activate ev-tabpfn-test
   pip install ev-tabpfn
2. Create a configuration file (e.g., recreate_benchmark.json) and list your dataset paths.
3. Run the batch evaluation:
   ev-tabpfn run --config recreate_benchmark.json
4. Inspect the results: aggregated results will be available in the results/ directory under your output_root, including:
   - aggregate_classification.md: comprehensive metric comparison
   - benchmark_roc_grid.png: ROC curves for all datasets
   - benchmark_summary.md: high-level summary of model performance
Python API
You can also use ev-tabpfn programmatically in your Python scripts:
from ev_tabpfn import (
aggregate_results,
create_config_template,
describe_data_formats,
evaluate_batch,
evaluate_dataset,
list_sample_datasets,
)
# Learn required CSV structures
print(describe_data_formats())
# Evaluate a single dataset
evaluate_dataset(
dataset_path="data.csv",
target_column="label",
task="binary",
output_root="./outputs",
model_preset="smoke",
)
# Generate a reusable config
create_config_template(
output_path="config.json",
dataset_path="data.csv",
target_column="label",
task="binary",
model_preset="smoke",
)
# Run a batch from a config file
evaluate_batch(config_path="config.json")
# Aggregate results from multiple runs
aggregate_results(output_root="./outputs")
# Inspect bundled samples
list_sample_datasets()
CLI Reference
- ev-tabpfn run: Run a batch evaluation from a JSON config.
- ev-tabpfn run-single: Run evaluation on a single dataset.
- ev-tabpfn aggregate: Aggregate existing run results into a summary report.
- ev-tabpfn validate: Validate dataset format and compatibility.
- ev-tabpfn summarize-run: Print a human-readable summary of a specific dataset run.
- ev-tabpfn generate-report: Generate visual plots and reports for a run.
- ev-tabpfn list-samples: List bundled smoke-test datasets.
- ev-tabpfn copy-samples: Copy bundled sample CSVs into a working folder.
- ev-tabpfn sample-path: Print the installed path for one bundled sample.
- ev-tabpfn data-formats: Describe required CSV structures.
- ev-tabpfn make-template: Create a CSV template for a task.
- ev-tabpfn make-config: Create a runnable JSON config for one CSV.
- ev-tabpfn make-sample-config: Create a runnable JSON config for bundled samples.
- ev-tabpfn presets: List model presets.
PyPI README
This file is used as the package's long description via pyproject.toml:
readme = "README.md"
The next PyPI release page will pick up this README once the next version is rebuilt and uploaded.
Output Directory Structure
Each run produces a structured output:
output_root/
├── batch_config.resolved.json # The final config used
├── batch_manifest.json # Index of all runs
├── results/ # Aggregated plots and tables
├── summary/ # High-level JSON summaries
├── logs/ # Batch-level logs
└── runs/ # Individual dataset results
└── <dataset_name>/
└── <run_id>/
├── predictions/ # CSV predictions per model
├── metrics/ # Performance metrics
├── plots/ # ROC and PR curves
└── logs/ # Detailed execution logs
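Given this layout, per-model artifacts can be collected with a short walk over the runs/ tree. This is a sketch under the assumption that each run writes its metric files under metrics/ (exact file names and formats may vary by version); `list_metrics_files` is a hypothetical helper, not part of the package:

```python
from pathlib import Path

def list_metrics_files(output_root: str) -> list[Path]:
    """Collect metric files from output_root/runs/<dataset>/<run_id>/metrics/."""
    runs = Path(output_root) / "runs"
    return sorted(p for p in runs.glob("*/*/metrics/*") if p.is_file())
```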
License
See LICENSE. Replace the current local placeholder with the final project license before publishing a production release.