A framework for generating synthetic data using LLMs with techniques to ensure diversity and reduce bias
SynthAI: Synthetic Data Generation Framework
SynthAI is a lightweight framework for generating high-quality synthetic data using LLMs with techniques to ensure diversity and reduce bias.
Features
- 🤖 LLM-Powered Generation: Create realistic synthetic data using language models
- 🧩 Domain Adapters: Specialized components for different data domains (text, tabular, time-series)
- 🔄 Diversity Enhancement: Built-in techniques to increase diversity in generated data
- ⚖️ Bias Reduction: Methods to detect and mitigate bias in synthetic datasets
- 📊 Quality Evaluation: Tools to measure the quality and utility of generated data
- 🚀 Resource Efficiency: Optimized to work with lighter models and minimal compute requirements
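The feature list does not say how diversity is measured. As a rough illustration of what a diversity score can capture, the sketch below computes distinct-n, the ratio of unique n-grams to total n-grams across a set of texts. This is a common stand-in metric and not necessarily what SynthAI uses; the `distinct_n` function is a hypothetical helper, not part of the library.

```python
def distinct_n(texts, n=2):
    """Return unique n-grams divided by total n-grams over all texts.

    Hypothetical helper for illustration only; not part of SynthAI.
    Values near 1.0 mean little repetition, values near 0.0 mean heavy repetition.
    """
    total = 0
    unique = set()
    for text in texts:
        # Naive whitespace tokenization, lowercased
        tokens = text.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    # Guard against empty input or texts shorter than n tokens
    return len(unique) / total if total else 0.0


repeated = ["great phone overall", "great phone overall"]
varied = ["great phone overall", "battery drains too fast"]
print(distinct_n(repeated))  # 0.5
print(distinct_n(varied))    # 1.0
```

A score like this only looks at surface wording, so it is best read alongside other quality checks rather than on its own.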
Installation
pip install synthai-generator
Quick Start
from synthai import SyntheticDataGenerator
from synthai.generators import TextGenerator
from synthai.evaluators import DiversityEvaluator
# Initialize a generator
generator = SyntheticDataGenerator(
    generator_type=TextGenerator(model="distilgpt2"),
    domain="customer_reviews"
)
# Generate synthetic data
synthetic_data = generator.generate(
    num_samples=100,
    prompt_template="Write a {sentiment} review for a {product_type}",
    parameters={
        "sentiment": ["positive", "negative", "neutral"],
        "product_type": ["smartphone", "laptop", "headphones"]
    }
)
# Evaluate diversity of the generated data
evaluator = DiversityEvaluator()
diversity_score = evaluator.evaluate(synthetic_data)
print(f"Diversity score: {diversity_score}")
# Save the generated data
generator.save_data(synthetic_data, "synthetic_reviews.csv")
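The Quick Start does not specify how `generate` combines `prompt_template` with `parameters`. One plausible reading is that each placeholder is filled from its list of values. The sketch below shows that idea with the standard library only, enumerating every combination with `itertools.product` and filling the template with `str.format`. The `expand_prompts` helper is hypothetical and is not SynthAI's API; the library may sample combinations differently.

```python
from itertools import product


def expand_prompts(prompt_template, parameters):
    """Fill the template with every combination of parameter values.

    Hypothetical helper for illustration only; not part of SynthAI.
    """
    keys = list(parameters)
    prompts = []
    # product() yields one tuple of values per combination, in key order
    for values in product(*(parameters[key] for key in keys)):
        prompts.append(prompt_template.format(**dict(zip(keys, values))))
    return prompts


prompts = expand_prompts(
    "Write a {sentiment} review for a {product_type}",
    {
        "sentiment": ["positive", "negative", "neutral"],
        "product_type": ["smartphone", "laptop", "headphones"],
    },
)
print(len(prompts))  # 9
print(prompts[0])    # Write a positive review for a smartphone
```

With 3 sentiments and 3 product types there are 9 distinct prompts, so a request for 100 samples would have to reuse each prompt several times under this reading.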
Documentation
For full documentation, visit synthai.readthedocs.io.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
- Biswanath Roul - GitHub
Acknowledgments
Special thanks to the open-source community and the advancements in LLM technology that make this library possible.
Download files
File details
Details for the file synthai_generator-0.1.0.tar.gz.
File metadata
- Download URL: synthai_generator-0.1.0.tar.gz
- Upload date:
- Size: 20.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b65fba5fabaa590b3eb83836872dcc559679b2cac718bf41fb7594b09136d753 |
| MD5 | 9d54dc70cac6110c694e4cfe722e7645 |
| BLAKE2b-256 | be1efabf3db4b0b55573e85a72db7bcec12a5b400e9b7dc0b5df6632bafec2ca |
File details
Details for the file synthai_generator-0.1.0-py3-none-any.whl.
File metadata
- Download URL: synthai_generator-0.1.0-py3-none-any.whl
- Upload date:
- Size: 25.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0005ae3f0dfb8252cb9672ac8a542b8c71959764d388ba361dbcd54dc4cff607 |
| MD5 | c3817f73fd3ac26dd98840d1a4a2f5e4 |
| BLAKE2b-256 | 87f9c0f1faacf480076bc114ad9e70d61cf4d508401ecdb037685b52f2102b0f |