library for generating conversation datasets using language models

WizardSData

Generate Synthetic Conversations

A Python library for generating conversation datasets using language models. WizardSData automates the creation of simulated conversations between a client and an advisor, using templates and model APIs.

The library is well suited to generating dialogues for highly regulated sectors such as finance or healthcare, where the use of real customer data is strictly restricted.

The generated dialogues can be used to train or fine-tune large language models.

How does it work?

The library simulates a conversation between two people. One initiates the dialogue and is represented by one of the profiles in the profiles file. The other acts as the respondent and is also generated from information in that profile. If desired, a profile can therefore describe two different people, which turns it into a conversation profile rather than a role profile.

To start generating conversations, the library requires a profiles file and two prompt templates. Each profile in the profiles file defines the personality, or personalities, taking part in the conversation. Several examples can be found in the templates directory.
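The turn-taking described above can be sketched as a simple loop. This is a minimal illustration, not the library's actual implementation: the function `simulate_conversation` and the two stand-in reply functions are hypothetical, and real model calls are replaced by canned strings so the sketch runs offline. The '[END]' marker is the one used in the example templates.

```python
from typing import Callable, Dict, List

END_MARKER = "[END]"  # end-of-conversation marker used in the example templates

Message = Dict[str, str]


def simulate_conversation(
    client_reply: Callable[[List[Message]], str],
    advisor_reply: Callable[[List[Message]], str],
    max_questions: int = 10,
) -> List[Message]:
    """Alternate client and advisor turns until one side emits END_MARKER."""
    history: List[Message] = []
    for _ in range(max_questions):
        for role, reply in (("client", client_reply), ("advisor", advisor_reply)):
            text = reply(history)
            finished = END_MARKER in text
            # Store the message without the marker itself.
            history.append(
                {"role": role, "content": text.replace(END_MARKER, "").strip()}
            )
            if finished:
                return history
    return history


# Hypothetical stand-ins for real model calls.
def fake_client(history: List[Message]) -> str:
    if len(history) >= 2:
        return "Thanks, that answers my question. [END]"
    return "Hi, I'd like some advice about saving."


def fake_advisor(history: List[Message]) -> str:
    return "Happy to help. What are you saving for?"


conversation = simulate_conversation(fake_client, fake_advisor)
for message in conversation:
    print(f"{message['role']}: {message['content']}")
```

In a real run, each reply function would send the rendered prompt plus the conversation history to a language model.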

A basic usage example can be found in usage_examples/basic_usage.py.

Let's analyze one of the examples: financial01.

This example contains several profile files that differ only in the number of profiles they hold. The profiles represent retail banking customers requesting information about products such as home deposit accounts, pension plans, or medium- and long-term investments.

Profile Creation

In the profiles directory, you will find several files containing different numbers of profiles. Let's take a look at financial_sample01_2.json:

{
    "profiles": [
        {
            "id": 1,
            "age": 30,
            "marital_status": "Single",
            "country": "Spain",
            "residence_area": "Urban",
            "profession": "Software Developer",
            "employment_status": "Employed",
            "financial_products": ["Savings account", "Tech stocks"],
            "financial_goal": "Save for house deposit",
            "investment_horizon": "Medium-term",
            "risk_tolerance": "Moderate",
            "financial_knowledge": "Intermediate"
        },
        {
            "id": 2,
            "age": 45,
            "marital_status": "Married",
            "country": "USA",
            "residence_area": "Suburb",
            "profession": "Marketing Manager",
            "employment_status": "Employed",
            "financial_products": ["401k", "Index funds"],
            "financial_goal": "Plan for retirement",
            "investment_horizon": "Long-term",
            "risk_tolerance": "Low",
            "financial_knowledge": "Intermediate"
        }
    ]
}

This JSON file defines two different profiles:

A 30-year-old single Spanish IT professional looking for information on how to save for buying a house.
A 45-year-old marketing manager interested in setting up a pension plan.

The fields to use are completely flexible, meaning the profiles can include any information you consider necessary. This information will be used to create the prompts sent to the language model to simulate the behavior of each of the two roles.
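To illustrate how flexible the fields are, here is a small sketch that parses a shortened profiles document with Python's standard json module. The `favourite_bank` field is a made-up custom field used only for illustration, and the JSON is inlined rather than read from disk so the example is self-contained.

```python
import json

# Inline, shortened profiles document; in practice this would be read from a file.
PROFILES_JSON = """
{
    "profiles": [
        {"id": 1, "age": 30, "profession": "Software Developer",
         "financial_goal": "Save for house deposit"},
        {"id": 2, "age": 45, "profession": "Marketing Manager",
         "financial_goal": "Plan for retirement",
         "favourite_bank": "Example Bank"}
    ]
}
"""

data = json.loads(PROFILES_JSON)
profiles = data["profiles"]

common_fields = {"id", "age", "profession", "financial_goal"}
for profile in profiles:
    # Custom fields are simply extra keys on the profile object.
    extra = sorted(set(profile) - common_fields)
    print(profile["id"], profile["financial_goal"], "extra fields:", extra)
```

Any extra key added this way becomes available to the prompt templates under the same name.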

Prompt Templates

A template must be defined for each prompt. Let's look at the two templates used in this example, both of which can be found in the prompts directory.

Client Prompt Template

You are a {{ profile.age }}-year-old {{ profile.marital_status | lower }} client living in a {{ profile.residence_area | lower }} area of {{ profile.country }}. 
You work as a {{ profile.profession | lower }} and have {{ profile.financial_knowledge | lower }} financial knowledge. 
You currently have {{ profile.financial_products | join(' and ') }}. 
Your main financial goal is to {{ profile.financial_goal | lower }} in the {{ profile.investment_horizon | lower }}. 
You have a {{ profile.risk_tolerance | lower }} risk tolerance and are looking for advice on how to improve your saving and investment strategy.

You are having a conversation with a financial advisor.
- Your first message should be a BRIEF, CASUAL greeting. Don't reveal all your financial details at once.
- For example, just say hi and mention ONE thing like wanting advice about saving or investments.
- Keep your first message under 15-30 words. Let the conversation develop naturally.
- In later messages, respond naturally to the advisor's questions, revealing information gradually.
- Provide ONLY your next message as the client. Do not simulate the advisor's responses.
- Start with a natural greeting if this is your first message.
- Ask relevant questions or express concerns to achieve your goal.
- Respond naturally and concisely to the advisor's previous message.
- Try to conclude the conversation in fewer than {{ max_questions }} exchanges.
- If you feel your questions are resolved, end your message with '[END]'.

Advisor Prompt Template

You are an expert financial advisor specializing in {{ profile.financial_goal | lower }}.

Client Context:
- The client is approximately {{ profile.age }} years old, {{ profile.marital_status | lower }}, and appears to be a {{ profile.profession | lower }} from {{ profile.country }}.
- The client's financial goal is to {{ profile.financial_goal | lower }}.

Instructions for the conversation:
- Start by greeting the client and asking relevant, natural questions to understand their financial situation, preferences, and concerns.
- Guide the conversation by asking about their current financial products, investment experience, and risk tolerance.
- Provide clear, concise, and professional advice tailored to the client's goal and profile as the information is revealed.
- Avoid using complex financial jargon unless necessary, and adapt your language to the client's knowledge level (you'll assess this through conversation).
- Focus on actionable recommendations to help the client achieve their goal.
- Keep the conversation realistic and friendly.
- End the conversation naturally once you believe the client's doubts have been resolved, or explicitly conclude by saying '[END]'

As you can see, the placeholders in both templates are filled in with data from the profile fields.
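The substitution can be reproduced with the Jinja2 package directly. The sketch below renders a shortened version of the client template shown above against a single profile. It assumes the profile is passed to the template as `profile` and the exchange limit as `max_questions`, as the placeholders suggest; how the library performs the rendering internally is not shown here.

```python
from jinja2 import Template

# Shortened version of the client prompt template.
template_text = (
    "You are a {{ profile.age }}-year-old {{ profile.marital_status | lower }} client. "
    "You currently have {{ profile.financial_products | join(' and ') }}. "
    "Try to conclude the conversation in fewer than {{ max_questions }} exchanges."
)

profile = {
    "age": 30,
    "marital_status": "Single",
    "financial_products": ["Savings account", "Tech stocks"],
}

# Jinja2 resolves profile.age against dictionary keys as well as attributes.
prompt = Template(template_text).render(profile=profile, max_questions=10)
print(prompt)
```

The `lower` filter lowercases a value and `join` concatenates the list items with the given separator, which is why list-valued fields such as `financial_products` read naturally in the prompt.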

Installation

Install directly from GitHub:

pip install git+https://github.com/peremartra/WizardSData.git

Features

Generate simulated conversations between clients and advisors
Use templates to create dynamic prompts based on user profiles
Configure model parameters for both client and advisor sides
Save conversations in structured JSON format

Quick Start

import wizardsdata as wsd

# Configure the library
wsd.set_config(
    API_KEY="your-api-key",
    template_client_prompt="path/to/client_template.j2",
    template_advisor_prompt="path/to/advisor_template.j2",
    file_profiles="path/to/profiles.json",
    file_output="path/to/output.json",
    model_client="gpt-4o-mini",
    model_advisor="gpt-4o-mini"
)

# Start generating conversations
success = wsd.start_generation()
if success:
    print("Conversations generated successfully!")
else:
    print("Failed to generate conversations.")

Template Format

The library uses Jinja2 templates for generating client and advisor prompts. Here's an example of a client template:

You are a financial client with the following profile:
- Age: {{ profile.age }}
- Marital status: {{ profile.marital_status }}
- Country: {{ profile.country }}
- Financial goal: {{ profile.financial_goal }}

You want to ask an advisor about {{ profile.financial_goal }}.

Please conduct a conversation with the advisor, asking relevant questions.
If you feel satisfied with the advice, end the conversation with "[END]".
You can ask up to {{ max_questions }} questions.

Profile Format

Profiles should be provided in JSON format:

{
  "profiles": [
    {
      "id": 1,
      "age": 30,
      "marital_status": "Single",
      "country": "Spain",
      "residence_area": "Urban",
      "profession": "Software Developer",
      "employment_status": "Employed",
      "financial_products": ["Savings account", "Tech stocks"],
      "financial_goal": "Save for house deposit",
      "investment_horizon": "Medium-term",
      "risk_tolerance": "Moderate",
      "financial_knowledge": "Intermediate"
    },
    ...
  ]
}

All fields except id are optional, and you can add your own fields.
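Since id is the only required field, a quick check before starting a generation run can catch malformed files early. The helper below, `check_profiles`, is a hypothetical sketch and not part of the library. Rejecting duplicate ids is an extra assumption made here, not a documented requirement.

```python
from typing import Any, Dict, List


def check_profiles(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the list of profiles, raising ValueError if the structure is wrong."""
    profiles = data.get("profiles")
    if not isinstance(profiles, list):
        raise ValueError('expected a top-level "profiles" list')
    seen_ids = set()
    for index, profile in enumerate(profiles):
        if "id" not in profile:
            raise ValueError(f"profile at position {index} has no id")
        # Assumption: ids should be unique so conversations can be traced back.
        if profile["id"] in seen_ids:
            raise ValueError(f"duplicate id {profile['id']}")
        seen_ids.add(profile["id"])
    return profiles


valid = check_profiles(
    {"profiles": [{"id": 1, "age": 30}, {"id": 2, "hobby": "chess"}]}
)
print(len(valid), "profiles OK")
```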

Configuration Parameters

Mandatory Parameters

API_KEY: Your OpenAI API key
template_client_prompt: Path to the client template file
template_advisor_prompt: Path to the advisor template file
file_profiles: Path to the profiles JSON file
file_output: Path to save the output JSON file
model_client: Model to use for the client
model_advisor: Model to use for the advisor

Optional Parameters (with defaults)

Client Configuration

temperature_client: 0.7
top_p_client: 0.95
frequency_penalty_client: 0.3
max_tokens_client: 175
max_recommended_questions: 10

Advisor Configuration

temperature_advisor: 0.5
top_p_advisor: 0.9
frequency_penalty_advisor: 0.1
max_tokens_advisor: 325
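The parameters above can be collected in one place before configuring the library. The following sketch merges user overrides onto the documented defaults and checks that every mandatory parameter is present. The helper `build_settings` is hypothetical and does not call the library. The default values are copied from the lists above.

```python
from typing import Any, Dict

MANDATORY = (
    "API_KEY",
    "template_client_prompt",
    "template_advisor_prompt",
    "file_profiles",
    "file_output",
    "model_client",
    "model_advisor",
)

# Defaults as listed in this section.
DEFAULTS: Dict[str, Any] = {
    "temperature_client": 0.7,
    "top_p_client": 0.95,
    "frequency_penalty_client": 0.3,
    "max_tokens_client": 175,
    "max_recommended_questions": 10,
    "temperature_advisor": 0.5,
    "top_p_advisor": 0.9,
    "frequency_penalty_advisor": 0.1,
    "max_tokens_advisor": 325,
}


def build_settings(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides onto the defaults and check the mandatory keys."""
    settings = {**DEFAULTS, **overrides}
    missing = [name for name in MANDATORY if name not in settings]
    if missing:
        raise ValueError(f"missing mandatory parameters: {', '.join(missing)}")
    return settings


settings = build_settings(
    API_KEY="your-api-key",
    template_client_prompt="path/to/client_template.j2",
    template_advisor_prompt="path/to/advisor_template.j2",
    file_profiles="path/to/profiles.json",
    file_output="path/to/output.json",
    model_client="gpt-4o-mini",
    model_advisor="gpt-4o-mini",
    temperature_client=0.9,  # override one default
)
print(settings["temperature_client"], settings["max_tokens_advisor"])
```

If set_config accepts the optional parameters as keyword arguments in the same way as the mandatory ones, which this description implies but does not show, the result could be passed on as `wsd.set_config(**settings)`.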

Advanced Usage

Saving and Loading Configuration

import wizardsdata as wsd

# Set initial configuration
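wsd.set_config(
    API_KEY="your-api-key",
    template_client_prompt="path/to/client_template.j2",
    template_advisor_prompt="path/to/advisor_template.j2",
    file_profiles="path/to/profiles.json",
    file_output="path/to/output.json",
    model_client="gpt-4o-mini",
    model_advisor="gpt-4o-mini"
)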

# Save configuration for later use
wsd.save_config("config.json")

# Later, load the saved configuration
wsd.load_config("config.json")

# Start generation with loaded config
wsd.start_generation()

License

Apache-2.0 license
