ev-tabpfn
Portable TabPFN evaluation pipeline with baselines, artifacts, reports, CLI, Python API, and MCP tools.
ev-tabpfn is a comprehensive evaluation pipeline for TabPFN and other tabular machine learning baselines. It provides a structured way to run, track, and aggregate machine learning experiments on tabular datasets.
This package was designed to facilitate rigorous comparison between TabPFN and industry-standard models like AutoGluon, CatBoost, XGBoost, and LightGBM.
Key Features
- Standardized Evaluation: Consistent train/test splits and metric reporting across all models.
- Rich Baselines: Built-in support for AutoGluon, CatBoost, XGBoost, LightGBM, Random Forest, and Logistic Regression.
- Batch Orchestration: Run experiments across dozens of datasets with a single JSON configuration.
- Automated Reporting: Generates ROC curves, radar plots, and summary Markdown reports.
- Artifact Management: Structured output directory for logs, predictions, metrics, and models.
- CLI & Python API: Use it as a command-line tool or integrate it into your Python scripts.
Installation
pip install ev-tabpfn
Requirements
- Python 3.10+
- Recommended: A fresh Conda environment (Python 3.11 is preferred for best compatibility with AutoGluon).
Quick Start
1. Set your TabPFN Token
To use the latest TabPFN models, you need a token from TabPFN.
export TABPFN_TOKEN="your_actual_tabpfn_token"
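If you drive evaluations from Python, it can help to fail fast when the token is missing. The helper below is a minimal sketch, not part of ev-tabpfn's API; `get_tabpfn_token` is a hypothetical name:

```python
import os

def get_tabpfn_token() -> str:
    """Read the TabPFN token from the environment, failing early if unset."""
    token = os.environ.get("TABPFN_TOKEN")
    if not token:
        raise RuntimeError(
            "TABPFN_TOKEN is not set; export it before running TabPFN evaluations."
        )
    return token
```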
2. Run a Single Dataset Evaluation
Evaluate a single CSV file. Use --preset smoke first if you want the fastest sanity check:
ev-tabpfn run-single --dataset my_data.csv --target target_column --output ./outputs --preset smoke
--output is the output folder. The evaluator creates runs/, predictions/, metrics/, plots/, metadata/, and logs/ inside it.
3. Run a Batch Evaluation
Run multiple datasets as defined in a configuration file:
ev-tabpfn run --config config.json
4. Use Bundled Sample Datasets
The package includes compact smoke-test samples for binary classification, multiclass classification, and regression.
ev-tabpfn list-samples
ev-tabpfn copy-samples --output ./ev_tabpfn_samples
Create a runnable sample config and execute it:
ev-tabpfn make-sample-config \
--samples-dir ./ev_tabpfn_samples \
--output sample_config.json \
--preset smoke
ev-tabpfn run --config sample_config.json
Required CSV Formats
The evaluator currently supports single-target tabular CSVs.
Rules:
- One row equals one sample.
- One column must be the target.
- If --target/target_column is omitted, the final CSV column is used as the target.
- Feature columns may be numeric or categorical.
- Missing values are handled by baseline preprocessing where supported.
- Multi-output regression and multilabel classification are not currently supported.
Inspect supported formats from the CLI:
ev-tabpfn data-formats
ev-tabpfn data-formats --task binary
ev-tabpfn data-formats --task multiclass
ev-tabpfn data-formats --task regression
Create CSV templates:
ev-tabpfn make-template --task binary --output binary_template.csv
ev-tabpfn make-template --task multiclass --output multiclass_template.csv
ev-tabpfn make-template --task regression --output regression_template.csv
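For scripting, the same template shapes can be produced with the standard library. This is a hypothetical local equivalent of make-template, following the column layouts described below (last column is the target), not the tool's actual implementation:

```python
import csv

# Minimal per-task CSV skeletons: feature columns first, target column last.
TEMPLATES = {
    "binary": [("feature_1", "feature_2", "target"),
               (0.1, 1.0, "yes"), (0.2, 2.0, "no")],
    "multiclass": [("feature_1", "feature_2", "target"),
                   (0.1, 1.0, "class_a"), (0.2, 2.0, "class_b"), (0.3, 3.0, "class_c")],
    "regression": [("feature_1", "feature_2", "target"),
                   (0.1, 1.0, 1.23), (0.2, 2.0, 4.56)],
}

def write_template(task: str, path: str) -> None:
    """Write a tiny CSV skeleton for the given task."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(TEMPLATES[task])
```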
Binary Classification CSV
Required shape:
feature_1,feature_2,...,target
value,value,...,class_a
value,value,...,class_b
Target requirements:
- exactly two unique classes
- labels may be 0/1, 1/2, yes/no, bad/good, or other string labels
Multiclass Classification CSV
Required shape:
feature_1,feature_2,...,target
value,value,...,class_a
value,value,...,class_b
value,value,...,class_c
Target requirements:
- three or more discrete classes
- labels may be strings or integer-like values
Regression CSV
Required shape:
feature_1,feature_2,...,target
value,value,...,1.23
value,value,...,4.56
Target requirements:
- one numeric continuous target column
- single-output regression only
Minimal Config Generation
For your own CSV, generate a runnable config instead of writing JSON by hand:
ev-tabpfn validate --dataset my_data.csv --target label
ev-tabpfn make-config \
--dataset my_data.csv \
--target label \
--task binary \
--preset smoke \
--output-root ./outputs \
--output my_config.json
ev-tabpfn run --config my_config.json
For make-config, --output is the config file path and --output-root is the evaluation output folder.
Model presets:
ev-tabpfn presets
- smoke: fastest local check, sklearn baselines only
- standard: GBM/sklearn baselines, no TabPFN or AutoGluon
- full: TabPFN, AutoGluon, GBMs, and sklearn baselines
Configuration File Structure
The batch evaluation uses a JSON configuration file. Example:
{
"run_name": "my_experiment",
"output_root": "./results",
"seed": 42,
"run_reports": true,
"aggregate_after_run": true,
"models": {
"tabpfn": {"enabled": true},
"autogluon": {"enabled": true, "presets": "medium_quality", "time_limit": 60},
"catboost": {"enabled": true},
"xgboost": {"enabled": true},
"lightgbm": {"enabled": true},
"random_forest": {"enabled": true},
"logistic_regression": {"enabled": true}
},
"datasets": [
{
"name": "dataset1",
"path": "data/dataset1.csv"
},
{
"name": "dataset2",
"path": "data/dataset2.csv"
}
]
}
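Configs can also be assembled programmatically and dumped to JSON. The helper below is an illustrative sketch that mirrors the structure above (the model set shown is partial; add per-model options such as AutoGluon's presets and time_limit as needed):

```python
import json

def make_batch_config(run_name, datasets, output_root="./results", seed=42):
    """Build a batch-config dict matching the JSON structure above.
    `datasets` is a list of (name, path) pairs."""
    return {
        "run_name": run_name,
        "output_root": output_root,
        "seed": seed,
        "run_reports": True,
        "aggregate_after_run": True,
        "models": {
            name: {"enabled": True}
            for name in ("tabpfn", "catboost", "xgboost", "lightgbm",
                         "random_forest", "logistic_regression")
        },
        "datasets": [{"name": n, "path": p} for n, p in datasets],
    }

config_json = json.dumps(
    make_batch_config("my_experiment", [("dataset1", "data/dataset1.csv")]),
    indent=2,
)
```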
Recreating Research Experiments
To recreate the experiments from the original research (e.g., standard classification datasets), follow these steps:
1. Prepare your environment:
   conda create -n ev-tabpfn-test python=3.11 -y
   conda activate ev-tabpfn-test
   pip install ev-tabpfn
2. Create a configuration file (e.g., recreate_benchmark.json) and list your dataset paths.
3. Run the batch evaluation:
   ev-tabpfn run --config recreate_benchmark.json
4. Inspect the results: aggregated results will be available in the results/ directory under your output_root, including:
   - aggregate_classification.md: comprehensive metric comparison
   - benchmark_roc_grid.png: ROC curves for all datasets
   - benchmark_summary.md: high-level summary of model performance
Python API
You can also use ev-tabpfn programmatically in your Python scripts:
from ev_tabpfn import (
aggregate_results,
create_config_template,
describe_data_formats,
evaluate_batch,
evaluate_dataset,
list_sample_datasets,
)
# Learn required CSV structures
print(describe_data_formats())
# Evaluate a single dataset
evaluate_dataset(
dataset_path="data.csv",
target_column="label",
task="binary",
output_root="./outputs",
model_preset="smoke",
)
# Generate a reusable config
create_config_template(
output_path="config.json",
dataset_path="data.csv",
target_column="label",
task="binary",
model_preset="smoke",
)
# Run a batch from a config file
evaluate_batch(config_path="config.json")
# Aggregate results from multiple runs
aggregate_results(output_root="./outputs")
# Inspect bundled samples
list_sample_datasets()
CLI Reference
- ev-tabpfn run: Run a batch evaluation from a JSON config.
- ev-tabpfn run-single: Run evaluation on a single dataset.
- ev-tabpfn aggregate: Aggregate existing run results into a summary report.
- ev-tabpfn validate: Validate dataset format and compatibility.
- ev-tabpfn summarize-run: Print a human-readable summary of a specific dataset run.
- ev-tabpfn generate-report: Generate visual plots and reports for a run.
- ev-tabpfn list-samples: List bundled smoke-test datasets.
- ev-tabpfn copy-samples: Copy bundled sample CSVs into a working folder.
- ev-tabpfn sample-path: Print the installed path for one bundled sample.
- ev-tabpfn data-formats: Describe required CSV structures.
- ev-tabpfn make-template: Create a CSV template for a task.
- ev-tabpfn make-config: Create a runnable JSON config for one CSV.
- ev-tabpfn make-sample-config: Create a runnable JSON config for bundled samples.
- ev-tabpfn presets: List model presets.
PyPI README
This file is used as the package's long description via pyproject.toml:
readme = "README.md"
The next PyPI release page will pick up this README once the next version is rebuilt and uploaded.
Output Directory Structure
Each run produces a structured output:
output_root/
├── batch_config.resolved.json # The final config used
├── batch_manifest.json # Index of all runs
├── results/ # Aggregated plots and tables
├── summary/ # High-level JSON summaries
├── logs/ # Batch-level logs
└── runs/ # Individual dataset results
└── <dataset_name>/
└── <run_id>/
├── predictions/ # CSV predictions per model
├── metrics/ # Performance metrics
├── plots/ # ROC and PR curves
└── logs/ # Detailed execution logs
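Given this layout, per-model artifacts can be collected with a short walk over the runs/ tree. This is a sketch under the assumption that each run writes its metric files under metrics/ (exact file names and formats may vary by version); `list_metrics_files` is a hypothetical helper, not part of the package:

```python
from pathlib import Path

def list_metrics_files(output_root: str) -> list[Path]:
    """Collect metric files from output_root/runs/<dataset>/<run_id>/metrics/."""
    runs = Path(output_root) / "runs"
    return sorted(p for p in runs.glob("*/*/metrics/*") if p.is_file())
```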
License
See LICENSE. Replace the current local placeholder with the final project license before publishing a production release.