syftr is an agent optimizer that helps you find the best agentic workflows for your budget.
Efficient Search for Pareto-optimal Flows
syftr is an agent optimizer that helps you find the best agentic workflows for a given budget. You bring your own dataset and compose the search space from models and components, and syftr finds the best combination of parameters for your budget. It uses advances in multi-objective Bayesian Optimization and a novel domain-specific "Pareto Pruner" to efficiently sample a search space of agentic and non-agentic flows and estimate a Pareto frontier (optimal trade-off curve) between accuracy and competing objectives such as cost, latency, and throughput.
For more details, please read our blog post and full technical paper.
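To make the idea of a Pareto frontier concrete, here is a minimal sketch for two objectives, accuracy (higher is better) and cost (lower is better). It is not syftr's implementation and does not use the Pareto Pruner or Bayesian Optimization; the `Flow` type, the function names, and all numbers are made up for illustration.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    name: str        # hypothetical label for a candidate flow
    accuracy: float  # higher is better
    cost: float      # lower is better


def dominates(a: Flow, b: Flow) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(flows: list[Flow]) -> list[Flow]:
    """Keep only the flows that no other flow dominates, sorted by cost."""
    frontier = [f for f in flows if not any(dominates(g, f) for g in flows)]
    return sorted(frontier, key=lambda f: f.cost)


# Made-up numbers for illustration only.
candidates = [
    Flow("A", accuracy=0.70, cost=0.0003),
    Flow("B", accuracy=0.85, cost=0.0020),
    Flow("C", accuracy=0.80, cost=0.0050),  # dominated by B: less accurate and more expensive
    Flow("D", accuracy=0.92, cost=0.0100),
]
frontier = pareto_frontier(candidates)
print([f.name for f in frontier])  # ['A', 'B', 'D']
```

Every flow on the frontier is a reasonable choice for some budget; flow C is never the right choice because B is both cheaper and more accurate.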
We are excited for what you will discover using syftr!
Libraries and frameworks used
syftr builds on a number of powerful open source projects:
- Ray for distributing and scaling search over large clusters of CPUs and GPUs
- Optuna for its flexible define-by-run interface (similar to PyTorch’s eager execution) and support for state-of-the-art multi-objective optimization algorithms
- LlamaIndex for building sophisticated agentic and non-agentic RAG workflows
- HuggingFace Datasets for a fast, collaborative, and uniform dataset interface
- Trace for optimizing textual components within workflows, such as prompts
Installation
Please clone the syftr repo and run:
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python 3.12.7
source .venv/bin/activate
uv sync --extra dev
Required Credentials
syftr's examples require the following credentials:
- Azure OpenAI API key
- Azure OpenAI endpoint URL (api_url)
- PostgreSQL server dsn (if no dsn is provided, will use local SQLite)
To enter these credentials, copy config.yaml.sample to config.yaml and edit the required portions.
Additional Configuration Options
syftr uses many components including Ray for job scheduling and PostgreSQL for storing results. In this section we describe how to configure them to run syftr successfully.
- The main config file of syftr is config.yaml. You can specify paths, logging, database and Ray parameters and many others. For detailed instructions and examples, please refer to config.yaml.sample. You can rename this file to config.yaml and fill in all necessary details according to your infrastructure.
- You can also configure syftr with environment variables: export SYFTR_PATHS__ROOT_DIR=/foo/bar
- When the configuration is correct, you should be able to run examples/1-welcome.ipynb without any problems.
- syftr uses SQLite by default for Optuna storage. The database.dsn configuration field can be used to configure any Optuna-supported relational database storage. We recommend Postgres for distributed workloads.
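The environment variable above suggests a naming convention: a SYFTR_ prefix, with a double underscore separating nesting levels, so that SYFTR_PATHS__ROOT_DIR would correspond to the root_dir field under paths. That reading is an assumption inferred from the variable name, not something this README states. The sketch below only illustrates the assumed mapping; `env_overrides` is a hypothetical helper and not part of syftr.

```python
def env_overrides(environ, prefix="SYFTR_", delimiter="__"):
    """Turn PREFIX_A__B=value entries into a nested {'a': {'b': value}} dict.

    Assumption: the prefix and delimiter match how syftr reads its settings.
    """
    overrides = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue  # ignore unrelated variables
        path = key[len(prefix):].lower().split(delimiter)
        node = overrides
        for part in path[:-1]:
            node = node.setdefault(part, {})  # descend, creating levels as needed
        node[path[-1]] = value
    return overrides


example_env = {"SYFTR_PATHS__ROOT_DIR": "/foo/bar", "HOME": "/home/user"}
print(env_overrides(example_env))  # {'paths': {'root_dir': '/foo/bar'}}
```

Passing `os.environ` instead of `example_env` would show which of your current shell variables follow this pattern.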
Quickstart
First, run make check to validate your credentials and configuration.
Note that most LLM connections are likely to fail if you have not provided configuration for them.
Next, try the example Jupyter notebooks located in the examples directory.
Or run a syftr study directly with the user API:
from syftr import api
s = api.Study.from_file("studies/example-dr-docs.yaml")
s.run()
Obtaining the results after the study is complete:
s.wait_for_completion()
print(s.pareto_flows)
[{'metrics': {'accuracy': 0.7, 'llm_cost_mean': 0.000258675},
  'params': {'response_synthesizer_llm': 'gpt-4o-mini',
             'rag_mode': 'no_rag',
             'template_name': 'default',
             'enforce_full_evaluation': True}},
 ...
]
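Once you have the list of Pareto flows, a common next step is to pick the most accurate flow that fits a cost budget. The sketch below assumes only the structure shown in the output above (a list of dicts with metrics and params keys). The helper name `best_flow_within_budget` is hypothetical and not part of the syftr API, and the second entry in the sample data is invented for illustration.

```python
def best_flow_within_budget(pareto_flows, max_cost):
    """Return the most accurate flow whose mean LLM cost is within max_cost, or None."""
    affordable = [
        flow for flow in pareto_flows
        if flow["metrics"]["llm_cost_mean"] <= max_cost
    ]
    if not affordable:
        return None
    return max(affordable, key=lambda flow: flow["metrics"]["accuracy"])


pareto_flows = [
    # First entry copied from the example output above.
    {"metrics": {"accuracy": 0.7, "llm_cost_mean": 0.000258675},
     "params": {"response_synthesizer_llm": "gpt-4o-mini",
                "rag_mode": "no_rag",
                "template_name": "default",
                "enforce_full_evaluation": True}},
    # Hypothetical entry, invented to show the trade-off.
    {"metrics": {"accuracy": 0.9, "llm_cost_mean": 0.004},
     "params": {"rag_mode": "hypothetical_mode"}},
]

choice = best_flow_within_budget(pareto_flows, max_cost=0.001)
print(choice["params"]["rag_mode"])  # no_rag
```

With a real study you would pass `s.pareto_flows` instead of the sample list.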
Custom LLMs
In addition to the built-in LLMs, you may enable additional OpenAI-API-compatible API endpoints in the config.yaml.
For example:
local_models:
  default_api_key: "YOUR_API_KEY_HERE"
  generative:
    - model_name: "microsoft/Phi-4-multimodal-instruct"
      api_base: "http://phi-4-host.com/openai/v1"
      max_tokens: 2000
      context_window: 129072
      is_function_calling_model: true
      additional_kwargs:
        frequency_penalty: 1.0
        temperature: 0.1
    - model_name: "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
      api_base: "http://big-vllm-host:8000/v1"
      max_tokens: 2000
      context_window: 129072
      is_function_calling_model: true
      additional_kwargs:
        temperature: 0.6
And you may also enable additional embedding model endpoints:
local_models:
  ...
  embedding:
    - model_name: "BAAI/bge-small-en-v1.5"
      api_base: "http://vllmhost:8001/v1"
      api_key: "non-default-value"
      additional_kwargs:
        extra_body:
          truncate_prompt_tokens: 512
    - model_name: "thenlper/gte-large"
      api_base: "http://vllmhost:8001/v1"
      additional_kwargs:
        extra_body:
          truncate_prompt_tokens: 512
Models added in the config.yaml will be automatically added to the default search space, or you can enable them manually for specific flow components.
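Before starting a study, it can help to sanity-check the local_models section after loading it into a Python dict. The sketch below is a hypothetical helper, not part of syftr. Which fields syftr actually requires is an assumption here, as is the rule that an entry without its own api_key falls back to default_api_key (inferred from the examples above, where only one embedding entry sets api_key).

```python
def check_local_models(config):
    """Return a list of human-readable problems found in a local_models mapping."""
    problems = []
    local = config.get("local_models", {})
    default_key = local.get("default_api_key")
    for kind in ("generative", "embedding"):
        for i, entry in enumerate(local.get(kind, [])):
            label = f"{kind}[{i}]"
            # Assumed to be required for every endpoint.
            for field in ("model_name", "api_base"):
                if not entry.get(field):
                    problems.append(f"{label}: missing {field}")
            # Assumed fallback: per-model api_key, else default_api_key.
            if not (entry.get("api_key") or default_key):
                problems.append(f"{label}: no api_key and no default_api_key")
            max_tokens = entry.get("max_tokens")
            window = entry.get("context_window")
            if max_tokens and window and max_tokens > window:
                problems.append(f"{label}: max_tokens exceeds context_window")
    return problems


config = {
    "local_models": {
        "default_api_key": "YOUR_API_KEY_HERE",
        "generative": [
            {"model_name": "microsoft/Phi-4-multimodal-instruct",
             "api_base": "http://phi-4-host.com/openai/v1",
             "max_tokens": 2000,
             "context_window": 129072},
        ],
        "embedding": [
            {"model_name": "BAAI/bge-small-en-v1.5",
             "api_base": "http://vllmhost:8001/v1",
             "api_key": "non-default-value"},
            # api_base deliberately left out to show a reported problem.
            {"model_name": "thenlper/gte-large"},
        ],
    }
}
problems = check_local_models(config)
print(problems)  # ['embedding[1]: missing api_base']
```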
Custom Datasets
See detailed instructions here.
Citation
If you use this code in your research please cite the following publication.
@inproceedings{syftr2025,
  title={syftr: Pareto-Optimal Generative AI},
  author={Conway, Alexander and Dey, Debadeepta and Hackmann, Stefan and Hausknecht, Matthew and Schmidt, Michael and Steadman, Mark and Volynets, Nick},
  booktitle={Proceedings of the International Conference on Automated Machine Learning (AutoML)},
  year={2025},
}
Contributing
Please read our contributing guide for details on how to contribute to the project. We welcome contributions in the form of bug reports, feature requests, and pull requests.
Please note that we have a code of conduct; please follow it in all your interactions with the project.