Generate tool-calling datasets from OpenAI-compatible tool specs
🛠️ ToolsGen
A modular Python library for synthesizing tool-calling datasets from JSON tool definitions using an LLM-as-a-judge pipeline. Designed for OpenAI-compatible APIs.
⚠️ Development Status: This project is under active development. The API is not yet stable, and breaking changes may occur between versions.
Overview
ToolsGen automates the creation of tool-calling datasets for training and evaluating language models. It generates realistic user requests, produces corresponding tool calls, and evaluates their quality using a multi-dimensional rubric system.
Key Features
- Multi-role LLM Pipeline: Separate models for problem generation, tool calling, and quality evaluation
- Flexible Sampling Strategies: Random, parameter-aware, and semantic clustering approaches
- LLM-as-a-Judge Scoring: Rubric-based evaluation with structured outputs
- OpenAI-Compatible: Works with the OpenAI API and compatible providers (Azure OpenAI, local models via vLLM, etc.)
- Hugging Face Ready: JSONL output format compatible with Hugging Face datasets
- Configurable Quality Control: Adjustable scoring thresholds and retry mechanisms
- Train/Val Splitting: Built-in dataset splitting for model training workflows
- Parallel Generation: Multiprocessing pipeline to accelerate dataset creation on multi-core hosts
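As an illustration of the train/val splitting feature, here is a minimal sketch of a seeded split. It is not ToolsGen's implementation: the helper name `split_records` and the choice to round the train size down are assumptions made for this example.

```python
import random


def split_records(records, train_split=0.9, seed=42):
    """Shuffle records deterministically, then cut them into train/val lists.

    Hypothetical helper: rounding the train size down is an assumption.
    """
    if not 0.0 <= train_split <= 1.0:
        raise ValueError("train_split must be between 0 and 1")
    shuffled = list(records)
    # A local RNG keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_split)
    return shuffled[:cut], shuffled[cut:]


train, val = split_records(range(987), train_split=0.9, seed=42)
print(len(train), len(val))  # 888 99
```

With 987 records and a 0.9 ratio this yields 888 train and 99 validation records, which happens to agree with the counts in the example manifest under Output Format. The library's actual rounding rule may differ.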
Requirements
- Python 3.9+
- OpenAI API key (or compatible API endpoint)
Installation
git clone https://github.com/atasoglu/toolsgen.git
cd toolsgen
pip install .
Usage
CLI Usage
# Check version
toolsgen version

# Set your OpenAI API key
export OPENAI_API_KEY="your-api-key-here"

# Generate dataset with default settings
toolsgen generate \
    --tools tools.json \
    --out output_dir \
    --num 100

# Advanced: Use different models and temperatures for each role
toolsgen generate \
    --tools tools.json \
    --out output_dir \
    --num 1000 \
    --strategy param_aware \
    --seed 42 \
    --train-split 0.9 \
    --workers 4 \
    --worker-batch-size 8 \
    --problem-model gpt-4o-mini --problem-temp 0.9 \
    --caller-model gpt-4o --caller-temp 0.3 \
    --judge-model gpt-4o --judge-temp 0.0

# Parallel generation with 6 workers processing four samples per task
toolsgen generate \
    --tools tools.json \
    --out output_dir \
    --num 500 \
    --workers 6 \
    --worker-batch-size 4

# Generate and push directly to Hugging Face Hub
export HF_TOKEN="your-hf-token-here"
toolsgen generate \
    --tools tools.json \
    --out output_dir \
    --num 100 \
    --push-to-hub \
    --repo-id username/dataset-name
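The commands above read tool definitions from tools.json, which this README does not show. The sketch below writes one such file, assuming it is a JSON array in the OpenAI function-calling `tools` format. That layout is an assumption based on the "OpenAI-compatible tool specs" description, and the `get_weather` definition and its `unit` parameter are illustrative only.

```python
import json
from pathlib import Path

# Assumed layout: a JSON array of OpenAI-style function tool definitions.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool, not shipped with ToolsGen
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

# Write the definitions where the CLI examples expect to find them.
tools_path = Path("tools.json")
tools_path.write_text(json.dumps(tools, indent=2), encoding="utf-8")
```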
Python API Usage
from pathlib import Path

from dotenv import load_dotenv
from toolsgen.core import GenerationConfig, ModelConfig, generate_dataset

load_dotenv()  # Load from .env file

# Configuration
tools_path = Path("tools.json")
output_dir = Path("output")

gen_config = GenerationConfig(
    num_samples=100,
    strategy="random",
    seed=42,
    train_split=0.9,  # 90% train, 10% validation
    batch_size=10,  # optional: iterate tools in batches
    shuffle_tools=True,  # optional: reshuffle tools between batches
    num_workers=4,  # enable multiprocessing
    worker_batch_size=2,  # samples per worker task
)

model_config = ModelConfig(
    model="gpt-4o-mini",
    temperature=0.7,
)

# Generate dataset from file
manifest = generate_dataset(output_dir, gen_config, model_config, tools_path=tools_path)

# Or use tools list directly (alternative to tools_path)
# from toolsgen.schema import ToolSpec
# tools = [ToolSpec(...), ToolSpec(...)]
# manifest = generate_dataset(output_dir, gen_config, model_config, tools=tools)

print(f"Generated {manifest['num_generated']}/{manifest['num_requested']} records")
print(f"Failed: {manifest['num_failed']} attempts")
Push to Hugging Face Hub
from pathlib import Path

from dotenv import load_dotenv
from toolsgen import GenerationConfig, ModelConfig, generate_dataset, push_to_hub

load_dotenv()  # Load from .env file

tools_path = Path("tools.json")
output_dir = Path("output")

gen_config = GenerationConfig(
    num_samples=100,
    strategy="random",
    seed=42,
    train_split=0.9,
)

model_config = ModelConfig(
    model="gpt-4o-mini",
    temperature=0.7,
)

# Generate dataset
manifest = generate_dataset(
    output_dir=output_dir,
    gen_config=gen_config,
    model_config=model_config,
    tools_path=tools_path,
)

# Push to Hub
hub_info = push_to_hub(
    output_dir=output_dir,
    repo_id="username/dataset-name",
    private=False,
)

print(f"Generated: {manifest['num_generated']} records")
print(f"Repository: {hub_info['repo_url']}")
See the examples/ directory for complete working examples.
Note: The examples in examples/ use python-dotenv to load API keys from a .env file. Install it with pip install python-dotenv if you want to use this approach.
Output Format
Dataset Files (JSONL)
Each line in train.jsonl (or val.jsonl) is a JSON record:
{
  "id": "record_000001",
  "language": "english",
  "tools": [...],
  "messages": [
    {"role": "user", "content": "What's the weather in San Francisco?"}
  ],
  "assistant_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"San Francisco, CA\"}"
      }
    }
  ],
  "problem_metadata": {"generated": true, "user_request": "..."},
  "judge": {
    "tool_relevance": 0.4,
    "argument_quality": 0.38,
    "clarity": 0.2,
    "score": 0.98,
    "verdict": "accept",
    "rationale": "Excellent tool selection and argument quality",
    "rubric_version": "0.1.0",
    "model": "gpt-4o",
    "temperature": 0.0
  },
  "quality_tags": [],
  "tools_metadata": {"num_tools": 5}
}
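Note that `arguments` in the record above is a JSON-encoded string, so it needs a second decoding step after the line itself is parsed. The sketch below reads records in this shape and keeps only calls whose judge verdict is "accept" at or above a score threshold. The helper name `iter_accepted_calls` and the 0.8 default are hypothetical, chosen for illustration.

```python
import json


def iter_accepted_calls(lines, min_score=0.8):
    """Yield (record_id, tool_name, arguments) for records the judge accepted."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        judge = record.get("judge") or {}
        if judge.get("verdict") != "accept" or judge.get("score", 0.0) < min_score:
            continue
        for call in record.get("assistant_calls", []):
            fn = call["function"]
            # "arguments" is itself a JSON string, so decode it separately.
            yield record["id"], fn["name"], json.loads(fn["arguments"])


# A one-record stand-in for train.jsonl, trimmed to the fields used here.
sample_line = json.dumps({
    "id": "record_000001",
    "assistant_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": json.dumps({"location": "San Francisco, CA"}),
        },
    }],
    "judge": {"score": 0.98, "verdict": "accept"},
})

calls = list(iter_accepted_calls([sample_line]))
print(calls)  # [('record_000001', 'get_weather', {'location': 'San Francisco, CA'})]
```

To run this over real output, pass an open file handle for train.jsonl in place of the one-element list.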
Manifest File
manifest.json contains generation metadata:
{
  "version": "0.1.0",
  "num_requested": 1000,
  "num_generated": 987,
  "num_failed": 13,
  "strategy": "param_aware",
  "seed": 42,
  "train_split": 0.9,
  "tools_count": 15,
  "models": {
    "problem_generator": "gpt-4o-mini",
    "tool_caller": "gpt-4o",
    "judge": "gpt-4o"
  },
  "splits": {
    "train": 888,
    "val": 99
  }
}
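A quick sanity check on a manifest like the one above can be sketched as follows. The helper `summarize_manifest` is hypothetical. It relies only on the fields shown in the example and does not assume that `num_failed` plus `num_generated` equals `num_requested`, since the usage example reports failures as attempts.

```python
def summarize_manifest(manifest):
    """Compute the yield rate and check that split sizes add up."""
    requested = manifest["num_requested"]
    generated = manifest["num_generated"]
    splits = manifest.get("splits", {})
    return {
        "yield_rate": generated / requested if requested else 0.0,
        "splits_consistent": sum(splits.values()) == generated,
    }


# Values copied from the example manifest.
manifest = {
    "num_requested": 1000,
    "num_generated": 987,
    "num_failed": 13,
    "splits": {"train": 888, "val": 99},
}

summary = summarize_manifest(manifest)
print(f"{summary['yield_rate']:.1%}", summary["splits_consistent"])  # 98.7% True
```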
Testing
# Run all tests with coverage
pytest --cov=src
# Run specific test file
pytest tests/test_generator.py
# Run with verbose output
pytest -v
Development
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests with coverage
pytest --cov=src
# Run code quality checks
ruff check src tests --fix
ruff format src tests
Architecture
For detailed information about the system architecture, pipeline, and core components, see ARCHITECTURE.md.
Roadmap
Planned Features
- Multi-turn conversation support
- Custom prompt template system
- Additional sampling strategies (coverage-based, difficulty-based)
- Support for more LLM providers (Anthropic, Cohere, etc.)
- Web UI for dataset inspection and curation
- Advanced filtering and deduplication
Known Limitations
- Single-turn conversations only
- English-focused prompts (multilingual support is experimental)
- No built-in tool execution or validation
- Limited to OpenAI-compatible APIs
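Because there is no built-in validation, a downstream check of generated calls against the tool definitions may be useful. The following is a minimal sketch, not part of ToolsGen. It assumes each tool is an OpenAI-style function definition with a JSON Schema under `parameters`, and it checks only tool names, required arguments, and unexpected arguments. The helper name `check_call` is hypothetical.

```python
import json


def check_call(call, tools):
    """Return a list of problems found in one assistant call (empty if none)."""
    specs = {t["function"]["name"]: t["function"] for t in tools}
    fn = call["function"]
    spec = specs.get(fn["name"])
    if spec is None:
        return [f"unknown tool: {fn['name']}"]
    try:
        args = json.loads(fn["arguments"])
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc}"]
    if not isinstance(args, dict):
        return ["arguments must decode to a JSON object"]
    params = spec.get("parameters", {})
    allowed = params.get("properties", {})
    problems = [
        f"missing required argument: {name}"
        for name in params.get("required", [])
        if name not in args
    ]
    problems += [f"unexpected argument: {name}" for name in args if name not in allowed]
    return problems


# Illustrative tool definition and calls (assumed shapes, not library output).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]
good = {"function": {"name": "get_weather", "arguments": '{"location": "San Francisco, CA"}'}}
bad = {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}

print(check_call(good, tools))  # []
print(check_call(bad, tools))   # ['missing required argument: location', 'unexpected argument: city']
```

Type and enum checks are left out to keep the sketch short. A full JSON Schema validator would cover those cases.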
Contributing
Contributions are welcome! Please note that the API is still evolving. Before starting major work, please open an issue to discuss your proposed changes.
License
MIT License - see LICENSE for details.
Citation
If you use ToolsGen in your research, please cite:
@software{toolsgen2025,
  title = {ToolsGen: Synthetic Tool-Calling Dataset Generator},
  author = {Ataşoğlu, Ahmet},
  year = {2025},
  url = {https://github.com/atasoglu/toolsgen}
}