syftr is an agent optimizer that helps you find the best agentic workflows for your budget.

Maintainers

datarobot mhauskn-dr nv_datarobot

Project description

Efficient Search for Pareto-optimal Flows

syftr is an agent optimizer that helps you find the best agentic workflows for a given budget. You bring your own dataset and compose the search space from models and components; syftr then finds the best combination of parameters for your budget. It combines multi-objective Bayesian optimization with a novel domain-specific "Pareto Pruner" to efficiently sample a search space of agentic and non-agentic flows and estimate a Pareto frontier (optimal trade-off curve) between accuracy and competing objectives such as cost, latency, and throughput.
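To make the idea of a Pareto frontier concrete, here is a minimal sketch that filters a handful of evaluated flows down to the non-dominated ones. It is not syftr's search algorithm (there is no Bayesian optimization and no Pareto Pruner here), only a brute-force dominance check. The FlowResult type, the flow names, and all numbers are made up for illustration.

```python
from typing import NamedTuple


class FlowResult(NamedTuple):
    name: str        # hypothetical flow label
    accuracy: float  # higher is better
    cost: float      # lower is better


def pareto_front(results: list[FlowResult]) -> list[FlowResult]:
    """Return the flows that no other flow dominates.

    Flow a dominates flow b when a is at least as accurate and at least as
    cheap as b, and strictly better on at least one of the two objectives.
    """
    front = []
    for b in results:
        dominated = any(
            a.accuracy >= b.accuracy
            and a.cost <= b.cost
            and (a.accuracy > b.accuracy or a.cost < b.cost)
            for a in results
        )
        if not dominated:
            front.append(b)
    # Sort by cost so the frontier reads as a trade-off curve.
    return sorted(front, key=lambda r: r.cost)


# Made-up evaluation results, for illustration only.
flows = [
    FlowResult("no_rag_small", 0.55, 0.0002),
    FlowResult("rag_small", 0.70, 0.0010),
    FlowResult("rag_large", 0.82, 0.0150),
    FlowResult("agent_large", 0.80, 0.0400),     # dominated by rag_large
    FlowResult("rag_small_slow", 0.65, 0.0020),  # dominated by rag_small
]

for flow in pareto_front(flows):
    print(f"{flow.name}: accuracy={flow.accuracy:.2f}, cost={flow.cost:.4f}")
```

Every flow left on the frontier is a defensible choice: moving along it trades extra cost for extra accuracy, and the budget decides where to stop.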

For more details, see our blog post and the full technical paper.

We are excited for what you will discover using syftr!

Libraries and frameworks used

syftr builds on a number of powerful open source projects:

Ray for distributing and scaling search over large clusters of CPUs and GPUs
Optuna for its flexible define-by-run interface (similar to PyTorch’s eager execution) and support for state-of-the-art multi-objective optimization algorithms
LlamaIndex for building sophisticated agentic and non-agentic RAG workflows
HuggingFace Datasets for a fast, collaborative, and uniform dataset interface
Trace for optimizing textual components within workflows, such as prompts

Installation

Please clone the syftr repo and run:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python 3.12.7
source .venv/bin/activate
uv sync --extra dev
uv pip install -e .

Or, to use syftr as a library, install it directly from PyPI:

pip install syftr

NOTE: syftr works as a library, but it still needs access to config.yaml and to the study files you intend to run. The config file should be present as ~/.syftr/config.yaml or in your current working directory. You can download the sample config file to your ~/.syftr directory with these commands:

# Create the directory first; curl -o does not create missing directories
mkdir -p ~/.syftr
curl -L https://raw.githubusercontent.com/datarobot/syftr/main/config.yaml.sample \
     -o ~/.syftr/config.yaml

You also need studies to run syftr. You can write your own, or download our example study to the current working directory with this command:

curl -L https://raw.githubusercontent.com/datarobot/syftr/main/studies/example-dr-docs.yaml > example-dr-docs.yaml

Required Credentials

syftr's examples require the following credentials:

Azure OpenAI API key
Azure OpenAI endpoint URL (api_url)
PostgreSQL server DSN (if no DSN is provided, syftr uses a local SQLite database)

To enter these credentials, copy config.yaml.sample to config.yaml and edit the required portions.

Additional Configuration Options

syftr uses many components including Ray for job scheduling and PostgreSQL for storing results. In this section we describe how to configure them to run syftr successfully.

The main syftr config file is config.yaml. It lets you specify paths, logging, database and Ray parameters, and many other settings. For detailed instructions and examples, refer to config.yaml.sample, which you can rename to config.yaml and fill in according to your infrastructure.
You can also configure syftr with environment variables: export SYFTR_PATHS__ROOT_DIR=/foo/bar
When the configuration is correct, you should be able to run examples/1-welcome.ipynb without any problems.
syftr uses SQLite by default for Optuna storage. The database.dsn configuration field can be used to configure any Optuna-supported relational database storage. We recommend Postgres for distributed workloads.
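
The variable name SYFTR_PATHS__ROOT_DIR suggests a SYFTR_ prefix followed by a double underscore that separates nesting levels, a convention used by settings libraries such as pydantic-settings. The sketch below shows how such a name could map onto a nested configuration key. Whether syftr parses its environment exactly this way is an assumption, and env_to_nested is a hypothetical helper, not a syftr function.

```python
def env_to_nested(environ: dict[str, str], prefix: str = "SYFTR_") -> dict:
    """Turn PREFIX_SECTION__KEY=value pairs into {"section": {"key": value}}."""
    config: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # "PATHS__ROOT_DIR" -> ["paths", "root_dir"]
        keys = name[len(prefix):].lower().split("__")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


example_env = {
    "SYFTR_PATHS__ROOT_DIR": "/foo/bar",
    "HOME": "/home/user",  # ignored: no SYFTR_ prefix
}
print(env_to_nested(example_env))  # {'paths': {'root_dir': '/foo/bar'}}
```

Under this reading, the export above would be equivalent to setting root_dir under the paths section of config.yaml.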

Quickstart

First, run syftr check to validate your credentials and configuration. Most LLM connections will fail if you have not provided configuration for them. Next, try the example Jupyter notebooks in the examples directory. Alternatively, run a syftr study directly from the CLI with syftr run studies/example-dr-docs.yaml --follow, or with the API:

from syftr import api

s = api.Study.from_file("studies/example-dr-docs.yaml")
s.run()

To obtain the results once the study is complete:

s.wait_for_completion()
print(s.pareto_flows)
[{'metrics': {'accuracy': 0.7, 'llm_cost_mean': 0.000258675},
  'params': {'response_synthesizer_llm': 'gpt-4o-mini',
   'rag_mode': 'no_rag',
   'template_name': 'default',
   'enforce_full_evaluation': True}},
   ...
]

LLM Configuration

syftr can be configured to use a wide variety of LLMs from many providers. These are configured in the generative_models section of config.yaml.

Each LLM provider has its own configuration options in addition to a set of common ones. Here is an example using gpt-4.5-preview hosted in Azure OpenAI:

generative_models:
  # azure_openai Provider Example
  azure_gpt_45_preview:
    provider: azure_openai

    temperature: 0.0
    max_retries: 0

    # Provider-specific configurations
    deployment_name: "gpt-4.5-preview"
    api_version: "2024-12-01-preview"
    additional_kwargs:
      user: syftr

    # Cost example - options are the same for all models (required)
    cost:
      type: tokens                      # tokens, characters, or hourly
      input: 75
      output: 150.00
      # rate: 12.00

    # LlamaIndex LLMMetadata Example - keys and defaults are the same for all models
    metadata:
      model_name: gpt-4.5-preview
      context_window: 100000
      num_output: 2048
      is_chat_model: true
      is_function_calling_model: true
      system_role: SYSTEM

Provider-specific options

All LLM configurations defined under generative_models: share a common set of options inherited from the base LLMConfig:

cost: (Object, Required) Defines the cost structure for the LLM.
- type: (String, Required) Type of cost calculation: tokens, characters, or hourly.
- input: (Float, Required) Cost for input (e.g., per million tokens/characters).
- output: (Float, Required if type is tokens or characters) Cost for output.
- rate: (Float, Required if type is hourly) Average cost per hour.
metadata: (Object, Required) Contains essential metadata about the LLM.
- model_name: (String, Required) The specific model identifier (e.g., "gpt-4o-mini", "gemini-1.5-pro-001").
- context_window: (Integer, Optional) The maximum context window size. Defaults to 3900.
- num_output: (Integer, Optional) Default number of output tokens the model is expected to generate. Defaults to 256.
- is_chat_model: (Boolean, Optional) Indicates if the model is a chat-based model. Defaults to false.
- is_function_calling_model: (Boolean, Optional) Indicates if the model supports function calling. Defaults to false.
- system_role: (String, Optional) The expected role name for system prompts (e.g., SYSTEM, USER). Defaults to SYSTEM.
temperature: (Float, Optional) The sampling temperature for generation. Defaults to 0.0.
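
To show how a token-based cost block translates into a per-call cost, here is a small worked example. It assumes the input and output prices are quoted per million tokens, as the option description suggests, and it uses the prices from the gpt-4.5-preview example (75 and 150). The token_cost helper and the token counts are hypothetical; how syftr computes its own cost metrics internally is not covered here.

```python
TOKENS_PER_PRICE_UNIT = 1_000_000  # assumption: prices are per million tokens


def token_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
) -> float:
    """Cost of one call under a token-based cost configuration."""
    return (
        input_tokens / TOKENS_PER_PRICE_UNIT * input_price
        + output_tokens / TOKENS_PER_PRICE_UNIT * output_price
    )


# 2,000 prompt tokens and 500 completion tokens at input=75, output=150.
cost = token_cost(2_000, 500, input_price=75, output_price=150.0)
print(f"{cost:.3f}")  # 0.225
```

The arithmetic is 2,000 / 1,000,000 × 75 = 0.15 for the input plus 500 / 1,000,000 × 150 = 0.075 for the output.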

See LLM provider-specific configuration to configure each supported provider.

Embedding models

You may also enable additional embedding model endpoints:

local_models:
...
  embedding:
    - model_name: "BAAI/bge-small-en-v1.5"
      api_base: "http://vllmhost:8001/v1"
      api_key: "non-default-value"
      additional_kwargs:
        extra_body:
          truncate_prompt_tokens: 512
    - model_name: "thenlper/gte-large"
      api_base: "http://vllmhost:8001/v1"
      additional_kwargs:
        extra_body:
          truncate_prompt_tokens: 512

Models added to config.yaml are automatically added to the default search space, or you can enable them manually for specific flow components.

Adding Custom Datasets

See detailed instructions here.

Adding Custom Flows

To add your own flow class, follow the guide here.

Citation

If you use this code in your research, please cite the following publication:

@inproceedings{syftr2025,
  title={syftr: Pareto-Optimal Generative AI},
  author={Conway, Alexander and Dey, Debadeepta and Hackmann, Stefan and Hausknecht, Matthew and Schmidt, Michael and Steadman, Mark and Volynets, Nick},
  booktitle={Proceedings of the International Conference on Automated Machine Learning (AutoML)},
  year={2025},
}

Contributing

Please read our contributing guide for details on how to contribute to the project. We welcome contributions in the form of bug reports, feature requests, and pull requests.

Please note that we have a code of conduct; please follow it in all your interactions with the project.

Release history

0.4.0: Jan 27, 2026
0.3.0: Aug 18, 2025 (this version)
0.2.2: Jul 1, 2025
0.2.1: Jun 26, 2025
0.2.0: Jun 13, 2025
0.1.3: Jun 13, 2025
0.1.2: Jun 12, 2025
0.1.1: Jun 10, 2025
0.1.0: Jun 6, 2025
0.0.2a4 (pre-release): Jun 4, 2025
0.0.2a3 (pre-release): Jun 2, 2025
0.0.2a2 (pre-release): May 30, 2025
0.0.1: May 23, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

syftr-0.3.0.tar.gz (1.9 MB)

Uploaded Aug 18, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

syftr-0.3.0-py3-none-any.whl (198.0 kB)

Uploaded Aug 18, 2025 Python 3

File details

Details for the file syftr-0.3.0.tar.gz.

File metadata

Download URL: syftr-0.3.0.tar.gz
Upload date: Aug 18, 2025
Size: 1.9 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for syftr-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`c00e40155652ab774095ea7c1d4fccc0209bd78ae5b05dbdc710b775a6be73c7`
MD5	`6b6b7229308f95ea01f044145e64e871`
BLAKE2b-256	`ac8efc1336658e0fa90a4c289446753fca77ec8b6719528c3f35078c7819f845`

See more details on using hashes here.
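
If you download the source distribution manually, you can check it against the SHA256 digest listed above. The sketch below uses only the Python standard library; it assumes the file was saved under its original name in the current working directory, and sha256_of_file is a helper written for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest for syftr-0.3.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "c00e40155652ab774095ea7c1d4fccc0209bd78ae5b05dbdc710b775a6be73c7"

sdist = Path("syftr-0.3.0.tar.gz")  # assumed download location
if sdist.is_file():
    actual = sha256_of_file(sdist)
    print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
else:
    print(f"{sdist} not found; download it first")
```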

Provenance

The following attestation bundles were made for syftr-0.3.0.tar.gz:

Publisher: pypi.yaml on datarobot/syftr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: syftr-0.3.0.tar.gz
- Subject digest: c00e40155652ab774095ea7c1d4fccc0209bd78ae5b05dbdc710b775a6be73c7
- Sigstore transparency entry: 407114929
- Sigstore integration time: Aug 18, 2025
Source repository:
- Permalink: datarobot/syftr@1ee345ee8ef4f2ad8bd10354a5ade1bbdf6f95f0
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/datarobot
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yaml@1ee345ee8ef4f2ad8bd10354a5ade1bbdf6f95f0
- Trigger Event: release

File details

Details for the file syftr-0.3.0-py3-none-any.whl.

File metadata

Download URL: syftr-0.3.0-py3-none-any.whl
Upload date: Aug 18, 2025
Size: 198.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for syftr-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5b16cc124311b6e40d704417d80a7a2c32a06e998b4045413e1449ee81ce61c8`
MD5	`542b60e3e480c3ea81a98ad29c658c98`
BLAKE2b-256	`0ad4517c23a70cb3306f953ae20006f0556a9c5250157083a68ceaf33f65fac9`

See more details on using hashes here.

Provenance

The following attestation bundles were made for syftr-0.3.0-py3-none-any.whl:

Publisher: pypi.yaml on datarobot/syftr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: syftr-0.3.0-py3-none-any.whl
- Subject digest: 5b16cc124311b6e40d704417d80a7a2c32a06e998b4045413e1449ee81ce61c8
- Sigstore transparency entry: 407114945
- Sigstore integration time: Aug 18, 2025
Source repository:
- Permalink: datarobot/syftr@1ee345ee8ef4f2ad8bd10354a5ade1bbdf6f95f0
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/datarobot
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yaml@1ee345ee8ef4f2ad8bd10354a5ade1bbdf6f95f0
- Trigger Event: release
