SynthAI: Synthetic Data Generation Framework

SynthAI is a lightweight framework for generating high-quality synthetic data using LLMs with techniques to ensure diversity and reduce bias.

Features

  • 🤖 LLM-Powered Generation: Create realistic synthetic data using language models
  • 🧩 Domain Adapters: Specialized components for different data domains (text, tabular, time-series)
  • 🔄 Diversity Enhancement: Built-in techniques to increase diversity in generated data
  • ⚖️ Bias Reduction: Methods to detect and mitigate bias in synthetic datasets
  • 📊 Quality Evaluation: Tools to measure the quality and utility of generated data
  • 🚀 Resource Efficiency: Designed to work with smaller models and modest compute
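
To make the diversity feature more concrete, here is a minimal sketch of distinct-n, a common diversity measure defined as the ratio of unique n-grams to total n-grams across a set of texts. This README does not say which metric DiversityEvaluator uses, so treat this as an illustrative stand-in rather than the library's implementation. The function names `ngrams` and `distinct_n` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return the contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts: Iterable[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across all texts.

    Returns 0.0 when no n-grams can be formed. This is a generic metric,
    not necessarily what SynthAI's DiversityEvaluator computes.
    """
    counts: Counter = Counter()
    for text in texts:
        # Whitespace tokenization on lowercased text keeps the sketch simple.
        counts.update(ngrams(text.lower().split(), n))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


if __name__ == "__main__":
    samples = [
        "great phone with a great screen",
        "great phone with a weak battery",
        "the laptop keyboard feels cheap",
    ]
    print(f"distinct-1: {distinct_n(samples, 1):.2f}")
    print(f"distinct-2: {distinct_n(samples, 2):.2f}")
```

Values closer to 1.0 mean less repetition across samples. Values near 0.0 suggest the generator is producing near-duplicates.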

Installation

pip install synthai
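
After installing, you can confirm which version is present using only the standard library. Note that the release files on the package index are named synthai_generator-0.1.0, so the installed distribution name may differ from the import name. The candidate names below are assumptions, and `installed_version` is a hypothetical helper, not part of SynthAI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, Optional


def installed_version(candidates: Iterable[str]) -> Optional[str]:
    """Return the version of the first installed distribution, or None."""
    for name in candidates:
        try:
            return version(name)
        except PackageNotFoundError:
            # Not installed under this name; try the next candidate.
            continue
    return None


if __name__ == "__main__":
    # Both names are guesses; adjust to whatever `pip list` shows.
    print(installed_version(["synthai", "synthai-generator"]))
```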

Quick Start

from synthai import SyntheticDataGenerator
from synthai.generators import TextGenerator
from synthai.evaluators import DiversityEvaluator

# Initialize a generator
generator = SyntheticDataGenerator(
    generator_type=TextGenerator(model="distilgpt2"),
    domain="customer_reviews"
)

# Generate synthetic data
synthetic_data = generator.generate(
    num_samples=100,
    prompt_template="Write a {sentiment} review for a {product_type}",
    parameters={
        "sentiment": ["positive", "negative", "neutral"],
        "product_type": ["smartphone", "laptop", "headphones"]
    }
)

# Evaluate diversity of the generated data
evaluator = DiversityEvaluator()
diversity_score = evaluator.evaluate(synthetic_data)
print(f"Diversity score: {diversity_score}")

# Save the generated data
generator.save_data(synthetic_data, "synthetic_reviews.csv")
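
The Quick Start passes a prompt template and a dictionary of parameter values. One plausible way such a template could be expanded is sketched below: take the Cartesian product of the parameter values, shuffle it, and cycle through the combinations so that each one is used about equally often. This is an assumption about the general technique, not a description of what generator.generate does internally. `expand_prompts` and its `seed` argument are hypothetical.

```python
import itertools
import random
from typing import Dict, List


def expand_prompts(template: str, parameters: Dict[str, List[str]],
                   num_samples: int, seed: int = 0) -> List[str]:
    """Fill the template with parameter combinations, covering each evenly."""
    keys = list(parameters)
    # Every combination of one value per parameter.
    combos = [dict(zip(keys, values))
              for values in itertools.product(*(parameters[k] for k in keys))]
    if not combos:
        raise ValueError("every parameter needs at least one value")
    # Shuffle with a fixed seed so the order is varied but reproducible.
    random.Random(seed).shuffle(combos)
    # Cycle through the combinations until num_samples prompts are produced.
    return [template.format(**combos[i % len(combos)])
            for i in range(num_samples)]


if __name__ == "__main__":
    prompts = expand_prompts(
        "Write a {sentiment} review for a {product_type}",
        {
            "sentiment": ["positive", "negative", "neutral"],
            "product_type": ["smartphone", "laptop", "headphones"],
        },
        num_samples=5,
    )
    for prompt in prompts:
        print(prompt)
```

Cycling through the full product, rather than sampling each value independently at random, is one simple way to keep the generated dataset balanced across sentiments and product types.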

Documentation

For full documentation, visit synthai.readthedocs.io.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Special thanks to the open-source community and the advancements in LLM technology that make this library possible.
