
A tool for generating synthetic function call datasets for Large Language Models (LLMs).


🛠️ openllm-func-call-synthesizer


Lightweight toolkit to synthesize function-call datasets and convert them to formats compatible with OpenAI-style function-call training and downstream tooling (including Llama Factory compatible exports).


✨ Features

  • 📝 Generate synthetic function call datasets for LLM training and evaluation
  • ⚙️ Flexible configuration via YAML and Hydra
  • 💻 CLI interface powered by Typer & Rich
  • 🔧 Utility functions for dataset manipulation
  • 🔄 Extensible and easy to integrate into your own pipeline
  • 🌐 Supports multiple LLM backends (OpenAI, Google, etc.)
  • 📊 Export formats: JSONL, CSV, Parquet, LlamaFactory-compatible

🛠 Installation

Prerequisites

  • Python 3.12+ (to match the environment used by the project)

  • API credentials for any LLM backend (set via environment variables or .env file)

    • Example: OPENAI_API_KEY
    • See .env.example for reference
  • 🔌 MCP Server (Required)

    This project relies on an MCP server to provide tool/function metadata.

    Before running the synthesizer, you must start an MCP server.

    ▶ Start the example MCP server

    An example MCP server is included in the repository:

    python examples/mcp_example_server/server.py

    This will start a local MCP server that the synthesizer can connect to.

    Make sure your configuration (e.g. mcp_servers.transport) matches the server address.

    ⚠ Important

    • The synthesizer will fail if no MCP server is available.
    • Ensure the server is running before executing:

python -m apps.main

    • If you see connection errors, verify:
      • The server is running
      • The transport URL in your config is correct
      • Network/firewall settings allow local connections
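
The server-address requirement above can be sketched as a minimal Hydra config entry. The field layout here is an assumption based on the `mcp_servers.transport` key mentioned in this section; the authoritative schema lives in examples/conf/synthesizer/default.yaml.

```yaml
# Hypothetical sketch of an mcp_servers block; match the transport
# URL to the address printed by the example MCP server on startup.
mcp_servers:
  - name: example
    transport: http://localhost:8000/mcp
```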

Install from PyPI

pip install openllm-func-call-synthesizer
# or using uv
uv add openllm-func-call-synthesizer

Install from source

git clone https://github.com/diqiuzhuanzhuan/openllm-func-call-synthesizer.git
cd openllm-func-call-synthesizer
uv sync

Don't have uv installed? You can install it with a single command:

curl -LsSf https://astral.sh/uv/install.sh | sh

⚡ Quickstart

Run the synthesizer with default config:

python -m apps.main

Enable only query generation:

python -m apps.main synthesizer.query_generation.enable=True

Enable function-call generation with custom name:

python -m apps.main synthesizer.function_call_generation.enable=True synthesizer.function_call_generation.name=function_call_gpt_4o

Override languages dynamically:

python -m apps.main synthesizer.query_generation.languages=[English,Spanish]

📂 Outputs

  • Generated datasets are written under the data/ directory
  • Each run produces:
    • train.jsonl
    • output.csv
    • output.parquet
  • The llama_factory step creates a LlamaFactory-compatible train.jsonl
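
The JSONL outputs are plain newline-delimited JSON, so they can be inspected with a few lines of Python. The record fields below (query, function_call) are illustrative; the actual columns depend on which pipeline stages are enabled.

```python
import json
from pathlib import Path

# Illustrative record; real rows carry whatever fields your pipeline
# emits (e.g. query, functions, function_call, score).
sample = {
    "query": "What's the weather in Paris?",
    "function_call": {"name": "get_weather", "arguments": {"city": "Paris"}},
}

# Write a one-row JSONL file, then read it back line by line.
path = Path("train.jsonl")
path.write_text(json.dumps(sample) + "\n", encoding="utf-8")

rows = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
print(rows[0]["function_call"]["name"])  # -> get_weather
```

The same pattern works for any of the generated train.jsonl files.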

🧪 Testing

Run the test suite:

pytest -q

📝 Configuration Highlights

Configuration file: examples/conf/synthesizer/default.yaml

  • mcp_servers — MCP server(s) to query for available tools
  • choose_part_tools — filter toolset to a subset
  • query_generation — generate seed queries from function docs
  • function_call_generation — generate function-call pairs from queries
  • critic — optional scoring/critique step
  • llama_factory — export to LlamaFactory-compatible dataset
  • verl — export to a verl-compatible dataset

See docs for full field descriptions.

Default pipeline walk-through

The provided examples/conf/synthesizer/default.yaml wires every stage together:

  • MCP bootstrap: points to a local ugreen_mcp server on http://localhost:8000/mcp; leave it running before launching the synth job or queries will fail.
  • Tool filtering: choose_part_tools: false keeps the full toolset; set it to a list (e.g. ["search_photos"]) to restrict generations to specific tools.
  • Query generation: reads examples/function_docs.json, emits multilingual prompts (English/Chinese/Japanese/German) under data/function_query via parallel OpenAI + Google model pools, each with generous TPM throttles for high-throughput runs.
  • Function-call synthesis: consumes the query dataset, calls gpt-4o through the OpenAI backend, and writes data/function_call_gpt_4o/*.jsonl (set max_num to limit volume or switch output_format).
  • Critic pass: re-scores every call with gpt-5-mini-2025-08-07, expecting query/prompt/function_call/functions/answer fields and emitting a scored dataset named function_call_gpt_4o_critiqued_by_gpt_5_mini_2025_08_07.
  • Downstream exports: both llama_factory and verl blocks draw from the critic output, keep only rows with score >= 8, and materialize ready-to-train JSONL files plus optional train/val splits.
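
The stage wiring described above can be sketched as a trimmed-down YAML fragment. Every key and value here is an assumption reconstructed from the walk-through; consult examples/conf/synthesizer/default.yaml for the authoritative schema and full field set.

```yaml
# Hypothetical outline of the stages discussed above.
query_generation:
  enable: true
  languages: [English, Chinese, Japanese, German]
  output_dir: data/function_query
function_call_generation:
  enable: true
  name: function_call_gpt_4o
  model: gpt-4o
critic:
  enable: true
  model: gpt-5-mini-2025-08-07
llama_factory:
  enable: true
  min_score: 8   # keep only rows the critic scored >= 8
```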

Feel free to copy the default file, tweak model lists or directories, and pass it via python -m apps.main synthesizer=@your_config.yaml for customized runs. For the full set of options, refer to examples/conf/synthesizer/default.yaml.

🐚 Parallel Runner

Helper script: bin/run_pipeline.sh

  • Launch multiple synthesizer runs in parallel
  • Requires .venv virtual environment
  • Example usage:
chmod +x bin/run_pipeline.sh
bin/run_pipeline.sh default other
  • Logs are printed to console; returns non-zero if any run fails
  • Can also run manually using:
python -m apps.main synthesizer=default &
python -m apps.main synthesizer=other &
wait

Contributing

Contributions are welcome! Please refer to CONTRIBUTING.md for details.

License

MIT License. See LICENSE for details.

🌟 Star History

Star History Chart
