ZeroFolio

Algorithm selection with zero domain knowledge via text embeddings.

ZeroFolio selects the best algorithm for a problem instance using a three-step pipeline:

  1. Serialize — read the raw instance file as plain text (zf.serialize)
  2. Embed — pass the text through any pretrained embedding model (user-provided)
  3. Select — pick the best algorithm via weighted k-nearest neighbors (zf.KNNSelector)

No feature engineering, no domain expertise, no training required. ZeroFolio handles serialization and selection; you bring your own embedding API.

Installation

pip install zerofolio

Verify Installation

A quick smoke test with synthetic data (no API key needed):

import numpy as np
import zerofolio as zf

# Synthetic embeddings and runtimes for 50 instances, 4 algorithms
X_train = np.random.rand(50, 128)
rt_train = np.random.rand(50, 4) * 100

selector = zf.KNNSelector(k=5)
selector.fit(X_train, rt_train)

X_test = np.random.rand(3, 128)
print(selector.predict(X_test))  # e.g. array([2, 0, 3])

Quick Start (with Gemini Embeddings)

A full example using Google's Gemini embedding model. Requires: pip install google-genai

import os
import numpy as np
import zerofolio as zf
from google import genai

# --- Step 1: Serialize instance files ---
# Reads the file as plain text, shuffles lines, truncates to budget.
text = zf.serialize("instance.cnf", strategy="line_shuffle", max_chars=10000, seed=42)

# --- Step 2: Embed (user-provided) ---
# ZeroFolio does not call any API itself. You provide an embedding function.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def embed(text):
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=text,
    )
    return result.embeddings[0].values

# Embed all your training instances into numpy arrays:
# For each instance file, call zf.serialize() then embed().
# train_files:      list of training instance file paths (you supply this)
# train_embeddings: np.ndarray of shape (n_train, 3072)
# train_runtimes:   np.ndarray of shape (n_train, n_algorithms) (you supply this)
#     Each row gives the runtime (seconds) of each algorithm on that instance.
#     Use PAR10 values (10× cutoff for unsolved) for best results.
train_embeddings = np.array([embed(zf.serialize(f)) for f in train_files])

# --- Step 3: Select via k-NN ---
selector = zf.KNNSelector(k=10, metric="manhattan")
selector.fit(train_embeddings, train_runtimes)

test_embedding = np.array([embed(text)])
best_algo_idx = selector.predict(test_embedding)
print(f"Selected algorithm index: {best_algo_idx[0]}")
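The PAR10 convention mentioned in the comments above can be applied to raw runtimes before calling fit. The following is a minimal sketch, assuming unsolved runs are recorded either as NaN or as a runtime at or above the cutoff. The helper name par10 and the 5000-second cutoff are illustrative and not part of ZeroFolio.

```python
import numpy as np

def par10(runtimes, cutoff):
    """Return a copy of `runtimes` with unsolved runs penalized at 10 x cutoff.

    Assumption: a run counts as unsolved if it is NaN or >= cutoff.
    """
    rt = np.asarray(runtimes, dtype=float)
    with np.errstate(invalid="ignore"):  # comparisons against NaN are intended
        unsolved = np.isnan(rt) | (rt >= cutoff)
    return np.where(unsolved, 10.0 * cutoff, rt)

# Two instances, three algorithms; NaN and 5000.0 mark unsolved runs.
raw = np.array([
    [12.5, np.nan, 300.0],
    [5000.0, 40.0, 7.2],
])
train_runtimes = par10(raw, cutoff=5000.0)
print(train_runtimes)
```

The resulting matrix has the (n_train, n_algorithms) shape that selector.fit expects.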

Using Algorithm Names

Pass algorithm names to get predictions as strings instead of indices:

algos = ["minisat", "glucose", "cadical", "kissat"]
selector = zf.KNNSelector(k=10, algorithm_names=algos)
selector.fit(train_embeddings, train_runtimes)

print(selector.predict(test_embedding))  # e.g. ["glucose"]

Multi-Seed Voting

Different random seeds expose different parts of the instance file to the embedding model. Average the scores across seeds for more robust selection. (Assumes embed and selector are set up as in the Quick Start above.)

scores = []
for seed in [42, 100, 500]:
    text = zf.serialize("instance.cnf", seed=seed)
    emb = np.array([embed(text)])
    scores.append(selector.predict_scores(emb))  # shape (1, n_algorithms)

avg_scores = np.mean(scores, axis=0)  # shape (1, n_algorithms)
best_algo_idx = int(np.argmin(avg_scores[0]))  # lowest average score wins
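To see why averaging can change the outcome, here is a small worked example with made-up numbers. The score values are hypothetical and stand in for what predict_scores might return for one instance under three seeds. Lower is treated as better, matching the argmin above.

```python
import numpy as np

# Hypothetical per-seed scores for one instance and three algorithms.
scores = [
    np.array([[30.0, 25.0, 40.0]]),  # this seed alone would pick algorithm 1
    np.array([[20.0, 45.0, 35.0]]),  # this seed alone would pick algorithm 0
    np.array([[22.0, 26.0, 50.0]]),  # this seed alone would pick algorithm 0
]

per_seed_choice = [int(np.argmin(s[0])) for s in scores]
avg_scores = np.mean(scores, axis=0)  # shape (1, 3)
best = int(np.argmin(avg_scores[0]))

print(per_seed_choice)  # [1, 0, 0]
print(best)             # 0
```

A single unlucky seed would have selected algorithm 1; the average over seeds selects algorithm 0.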

API Reference

zf.serialize(path, strategy, max_chars, seed)

Serialize an instance file to text. Works with any text-based format (CNF, WCNF, QDIMACS, MiniZinc, ASP, MPS, etc.). Gzipped files (.gz) are handled automatically.

  • strategy: "line_shuffle" (default, recommended) or "raw". Use zf.list_strategies() to see all options.
  • max_chars: Character budget for truncation (default 10,000). The effective context window of Gemini Embedding models is approximately 2,048 tokens.
  • seed: Random seed for line shuffling (default 42). Different seeds yield different views of the same instance. Ignored by the "raw" strategy.
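The parameters above suggest what a line-shuffling serializer does: read the file as text, shuffle its lines with a seeded random generator, and cut the result at the character budget. The sketch below illustrates that idea using only the standard library. It is an assumption about the general approach, not ZeroFolio's actual implementation, and the function name line_shuffle_sketch is hypothetical.

```python
import gzip
import os
import random
import tempfile

def line_shuffle_sketch(path, max_chars=10_000, seed=42):
    """Illustrative only: read text, shuffle lines with a seeded RNG, truncate."""
    # Assumption: gzip is detected from the file extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    random.Random(seed).shuffle(lines)  # same seed gives the same order
    return "\n".join(lines)[:max_chars]

# Demo on a tiny CNF file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.cnf")
    with open(path, "w", encoding="utf-8") as f:
        f.write("p cnf 3 2\n1 -2 0\n2 3 0\n")
    view_a = line_shuffle_sketch(path, max_chars=12, seed=1)
    view_b = line_shuffle_sketch(path, max_chars=12, seed=1)
    full = line_shuffle_sketch(path, seed=7)

print(view_a == view_b)  # True
```

With a budget smaller than the file, each seed keeps a different subset of lines, which is what makes different seeds yield different views of the same instance.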

zf.KNNSelector(k, metric, algorithm_names)

Weighted k-NN algorithm selector using inverse-distance weighting.

  • k: Number of neighbors (default 10). Must not exceed the number of training instances.
  • metric: "manhattan" (default, recommended) or "cosine".
  • algorithm_names: Optional list of algorithm name strings. When provided, predict() returns a list of name strings; otherwise it returns a numpy array of column indices.

Methods:

  • .fit(embeddings, runtimes) — fit on training embeddings and runtime matrix. Raises ValueError if k exceeds the number of training instances.
  • .predict(embeddings) — return best algorithm per instance (names or indices)
  • .predict_scores(embeddings) — return per-algorithm weighted-average scores, shape (n_test, n_algorithms)
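For readers who want to see the idea behind the selector, the following NumPy sketch computes inverse-distance-weighted k-NN scores with the Manhattan metric. It is a simplified illustration under stated assumptions (weights of 1 / (distance + eps), normalized per test instance, lowest score selected), not ZeroFolio's actual code. The function name knn_scores and the eps parameter are hypothetical.

```python
import numpy as np

def knn_scores(X_train, rt_train, X_test, k=10, eps=1e-12):
    """Weighted-average runtime per algorithm over the k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    rt_train = np.asarray(rt_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    if k > len(X_train):
        raise ValueError("k exceeds the number of training instances")

    # Manhattan distances, shape (n_test, n_train)
    dist = np.abs(X_test[:, None, :] - X_train[None, :, :]).sum(axis=2)
    nn = np.argsort(dist, axis=1)[:, :k]         # indices of the k nearest
    d_nn = np.take_along_axis(dist, nn, axis=1)  # their distances

    # Inverse-distance weights, normalized to sum to 1 per test instance
    w = 1.0 / (d_nn + eps)
    w /= w.sum(axis=1, keepdims=True)

    # Weighted average of neighbor runtimes, shape (n_test, n_algorithms)
    return np.einsum("tk,tka->ta", w, rt_train[nn])

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
rt_train = np.array([[1.0, 9.0], [3.0, 5.0], [100.0, 0.1]])
X_test = np.array([[0.25, 0.0]])

scores = knn_scores(X_train, rt_train, X_test, k=2)
best = np.argmin(scores, axis=1)
print(scores)  # approximately [[1.5 8.0]]
print(best)    # [0]
```

In this example the two nearest neighbors lie at distances 0.25 and 0.75, so they receive weights 0.75 and 0.25, and the distant third instance has no influence on the scores.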

zf.list_strategies()

Return the list of available serialization strategy names.

Persistence

ZeroFolio selectors can be saved and loaded with pickle:

import pickle

with open("selector.pkl", "wb") as f:
    pickle.dump(selector, f)

with open("selector.pkl", "rb") as f:
    selector = pickle.load(f)

Citation

@article{szeider2026zerofolio,
  title={Algorithm Selection with Zero Domain Knowledge via Text Embeddings},
  author={Szeider, Stefan},
  year={2026}
}

License

Apache 2.0
