ZeroFolio
Algorithm selection with zero domain knowledge via text embeddings.
ZeroFolio selects the best algorithm for a problem instance using a three-step pipeline:
- Serialize: read the raw instance file as plain text (zf.serialize)
- Embed: pass the text through any pretrained embedding model (user-provided)
- Select: pick the best algorithm via weighted k-nearest neighbors (zf.KNNSelector)
No feature engineering, no domain expertise, no model training: selection is a nearest-neighbor lookup over your labeled training instances. ZeroFolio handles serialization and selection; you bring your own embedding API.
Installation
pip install zerofolio
Verify Installation
A quick smoke test with synthetic data (no API key needed):
import numpy as np
import zerofolio as zf
# Synthetic embeddings and runtimes for 50 instances, 4 algorithms
X_train = np.random.rand(50, 128)
rt_train = np.random.rand(50, 4) * 100
selector = zf.KNNSelector(k=5)
selector.fit(X_train, rt_train)
X_test = np.random.rand(3, 128)
print(selector.predict(X_test)) # e.g. array([2, 0, 3])
Quick Start (with Gemini Embeddings)
A full example using Google's Gemini embedding model.
Requires: pip install google-genai
import os
import numpy as np
import zerofolio as zf
from google import genai
# --- Step 1: Serialize instance files ---
# Reads the file as plain text, shuffles lines, truncates to budget.
text = zf.serialize("instance.cnf", strategy="line_shuffle", max_chars=10000, seed=42)
# --- Step 2: Embed (user-provided) ---
# ZeroFolio does not call any API itself. You provide an embedding function.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
def embed(text):
result = client.models.embed_content(
model="gemini-embedding-001",
contents=text,
)
return result.embeddings[0].values
# Embed all your training instances: for each file, call zf.serialize() then embed().
# train_files: list of training instance paths (supply your own)
# train_runtimes: np.array of shape (n_train, n_algorithms); each row gives the
#   runtime (seconds) of each algorithm on that instance. Use PAR10 values
#   (actual runtime if solved, 10× the cutoff if unsolved) for best results.
train_files = ...     # placeholder: your list of instance paths
train_runtimes = ...  # placeholder: your runtime matrix, rows aligned with train_files
# train_embeddings: np.array of shape (n_train, 3072)
train_embeddings = np.array([embed(zf.serialize(f)) for f in train_files])
# --- Step 3: Select via k-NN ---
selector = zf.KNNSelector(k=10, metric="manhattan")
selector.fit(train_embeddings, train_runtimes)
test_embedding = np.array([embed(text)])
best_algo_idx = selector.predict(test_embedding)
print(f"Selected algorithm index: {best_algo_idx[0]}")
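The Quick Start recommends PAR10 runtimes. As a minimal sketch, a raw runtime table can be converted as follows. The helper name `par10_matrix` and the convention of marking unsolved runs with `None` are illustrative assumptions, not part of ZeroFolio; a run is treated as solved only if it finished within the cutoff.

```python
def par10_matrix(runtimes, cutoff):
    """Convert raw runtimes to PAR10 scores (hypothetical helper).

    runtimes: list of rows, one per instance; each entry is the runtime in
    seconds, or None if the algorithm did not solve the instance.
    Solved runs keep their runtime; unsolved or over-cutoff runs get 10 * cutoff.
    """
    penalty = 10 * cutoff
    return [
        [t if t is not None and t <= cutoff else penalty for t in row]
        for row in runtimes
    ]


raw = [
    [12.5, None, 300.0],  # instance 1: algorithm 1 did not finish
    [None, 4.2, 1000.0],  # instance 2: 1000 s exceeds the 900 s cutoff
]
print(par10_matrix(raw, cutoff=900))
# [[12.5, 9000, 300.0], [9000, 4.2, 9000]]
```

The resulting rows can be wrapped in np.array(...) and used as train_runtimes, in the same order as the training files.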
Using Algorithm Names
Pass algorithm names to get predictions as strings instead of indices. List the names in the same order as the columns of train_runtimes:
algos = ["minisat", "glucose", "cadical", "kissat"]
selector = zf.KNNSelector(k=10, algorithm_names=algos)
selector.fit(train_embeddings, train_runtimes)
print(selector.predict(test_embedding)) # e.g. ["glucose"]
Multi-Seed Voting
Because line_shuffle shuffles lines before truncating, different random seeds expose different parts of the instance file to the embedding model. Averaging the scores across seeds gives a more robust selection.
(Assumes embed and selector are set up as in the Quick Start above.)
scores = []
for seed in [42, 100, 500]:
text = zf.serialize("instance.cnf", seed=seed)
emb = np.array([embed(text)])
scores.append(selector.predict_scores(emb)) # shape (1, n_algorithms)
avg_scores = np.mean(scores, axis=0) # shape (1, n_algorithms)
best_algo_idx = int(np.argmin(avg_scores[0]))
API Reference
zf.serialize(path, strategy, max_chars, seed)
Serialize an instance file to text. Works with any text-based format (CNF, WCNF, QDIMACS, MiniZinc, ASP, MPS, etc.). Gzipped files (.gz) are handled automatically.
- strategy: "line_shuffle" (default, recommended) or "raw". Use zf.list_strategies() to see all options.
- max_chars: character budget for truncation (default 10,000). For reference, the effective context window of Gemini Embedding models is approximately 2,048 tokens.
- seed: random seed for line shuffling (default 42). Different seeds yield different views of the same instance. Ignored by the "raw" strategy.
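To make the interplay of strategy, max_chars, and seed concrete, here is a minimal sketch of what a line-shuffle serializer could look like. It is an illustrative reimplementation under stated assumptions (read as text, decompress .gz transparently, shuffle all lines with a seeded RNG, rejoin with newlines, keep the first max_chars characters), not ZeroFolio's actual code; the function name `line_shuffle_sketch` is hypothetical.

```python
import gzip
import random


def line_shuffle_sketch(path, max_chars=10_000, seed=42):
    """Hypothetical stand-in for zf.serialize(path, strategy="line_shuffle").

    Assumed behavior (not taken from ZeroFolio's source): read the file as
    text, decompressing .gz files transparently, shuffle its lines with a
    seeded RNG, rejoin them with newlines, and keep the first max_chars
    characters.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    rng = random.Random(seed)  # same seed -> same permutation
    rng.shuffle(lines)
    return "\n".join(lines)[:max_chars]
```

Calling it twice with the same seed returns identical text, while a different seed generally yields a different line order and, once the text is truncated, a different subset of lines. This is the effect the multi-seed voting recipe relies on.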
zf.KNNSelector(k, metric, algorithm_names)
Weighted k-NN algorithm selector using inverse-distance weighting.
- k: number of neighbors (default 10). Must not exceed the number of training instances.
- metric: "manhattan" (default, recommended) or "cosine".
- algorithm_names: optional list of algorithm name strings. When provided, predict() returns a list of name strings; otherwise it returns a numpy array of column indices.
Methods:
- .fit(embeddings, runtimes): fit on training embeddings and the runtime matrix. Raises ValueError if k exceeds the number of training instances.
- .predict(embeddings): return the best algorithm per instance (names or indices).
- .predict_scores(embeddings): return per-algorithm weighted-average scores, shape (n_test, n_algorithms).
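For intuition about what an inverse-distance-weighted k-NN selector computes, here is a minimal pure-Python sketch. ZeroFolio's exact weighting is not documented above, so this assumes weights w_i = 1 / (d_i + eps) over the k nearest training instances, scores each algorithm by the weighted average of those neighbors' runtimes, and picks the lowest score. The class name `IDWKNNSketch` and the `eps` parameter are hypothetical.

```python
import math


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class IDWKNNSketch:
    """Illustrative inverse-distance-weighted k-NN selector (not ZeroFolio's code)."""

    def __init__(self, k=10, metric="manhattan", eps=1e-9):
        self.k = k
        self.dist = {"manhattan": manhattan, "cosine": cosine_distance}[metric]
        self.eps = eps  # assumed guard against division by zero on exact matches

    def fit(self, embeddings, runtimes):
        if self.k > len(embeddings):
            raise ValueError("k exceeds the number of training instances")
        self.X, self.R = embeddings, runtimes
        return self

    def predict_scores(self, embeddings):
        scores = []
        n_algos = len(self.R[0])
        for q in embeddings:
            # (distance, index) pairs of the k nearest training instances
            nearest = sorted((self.dist(q, x), i) for i, x in enumerate(self.X))[: self.k]
            weights = [1.0 / (d + self.eps) for d, _ in nearest]
            total = sum(weights)
            # weighted average of each algorithm's runtime over the neighbors
            scores.append([
                sum(w * self.R[i][j] for w, (_, i) in zip(weights, nearest)) / total
                for j in range(n_algos)
            ])
        return scores

    def predict(self, embeddings):
        # lower weighted runtime is better, so take the argmin per instance
        return [min(range(len(s)), key=s.__getitem__) for s in self.predict_scores(embeddings)]


# Toy data: 3 training instances with 2-D "embeddings" and runtimes for 2 algorithms
X = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
R = [[5.0, 50.0], [6.0, 40.0], [90.0, 1.0]]
sel = IDWKNNSketch(k=2).fit(X, R)
print(sel.predict([[0.2, 0.0], [9.0, 9.5]]))  # [0, 1]
```

The first query lies next to the two instances where algorithm 0 is fast; the second is dominated by the close instance where algorithm 1 is fast, while the distant neighbor contributes only a small weight.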
zf.list_strategies()
Return the list of available serialization strategy names.
Persistence
ZeroFolio selectors can be saved and loaded with pickle:
import pickle
with open("selector.pkl", "wb") as f:
pickle.dump(selector, f)
with open("selector.pkl", "rb") as f:
selector = pickle.load(f)
Citation
@article{szeider2026zerofolio,
title={Algorithm Selection with Zero Domain Knowledge via Text Embeddings},
author={Szeider, Stefan},
year={2026}
}
License
Apache 2.0
Download files
File details
Details for the file zerofolio-0.1.0.tar.gz.
File metadata
- Download URL: zerofolio-0.1.0.tar.gz
- Upload date:
- Size: 27.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e8f3feeec02a14bad24fd5dd09611040eb2586a8d647b2dfb318b29e6f272eb9 |
| MD5 | cb289941b366a38a9656d4a05ec299b2 |
| BLAKE2b-256 | fc295ec556259285fff01ae05fe781c022add8ebb4d7c239addadce927cd23f6 |
File details
Details for the file zerofolio-0.1.0-py3-none-any.whl.
File metadata
- Download URL: zerofolio-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a5ac5d9f56cfeecd690ba43edc49aaeb8affd9d22f1ff6b35ca413ccd8eb09b3 |
| MD5 | f9a715694ef2c69c59c9fbdd56b00f0c |
| BLAKE2b-256 | dcc91c19d6cee7191575ea959539249a91ce3bf8d036bd4c7c55adb969cfceea |