Generate synthetic datasets from natural language using Gemini 1.5 Flash
🧬 Dataset Generator
Generate synthetic datasets from natural language descriptions using Google's Gemini 1.5 Flash model. Ideal for data science, machine learning prototyping, and testing workflows with customizable, structured synthetic data.
✨ Features
- ⚡ Powered by Gemini 1.5 Flash (fast + cost-effective)
- 🧠 Natural language prompt → structured data
- 📦 Output in pandas, CSV, and JSON
- 🧪 Optional edge case injection for testing
- 🧾 Save datasets to local disk
- 🔐 Easy API key setup using .env
📦 Installation
Install from PyPI:
pip install datafaker-ai==0.1.5
Or clone the repository and install its dependencies:
git clone https://github.com/ahsanraza1457/deepfaker_ai.git
cd deepfaker_ai
pip install -r requirements.txt
🔐 Setup
Option 1: Using .env (for GitHub users)
Create a .env file in the root directory:
GEMINI_API_KEY=your_google_generativeai_api_key
With the key in place, call generate_dataset without passing api_key:
from generator import generate_dataset

df = generate_dataset(
    description="Customer name, email, age, and signup date",
    num_samples=50,
)
print(df.head())
Option 2: Pass Directly (for PyPI users)
df = generate_dataset(
    description="user profile data",
    num_samples=10,
    api_key="your_google_generativeai_api_key",
)
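The two options above differ only in where the key comes from. As a minimal sketch of that choice, the helper below prefers an explicitly passed key and falls back to the GEMINI_API_KEY environment variable. The name resolve_api_key is hypothetical and not part of this package; it also assumes the variable is already in the process environment (a .env file has to be loaded first, for example with python-dotenv's load_dotenv()).

```python
import os


def resolve_api_key(explicit_key=None, env_var="GEMINI_API_KEY"):
    """Return explicit_key if given, otherwise the value of env_var.

    Hypothetical helper for illustration; not part of the package.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of a confusing API error later.
        raise RuntimeError(
            f"No API key found: pass api_key=... or set {env_var}."
        )
    return key
```

The result could then be passed as api_key=resolve_api_key() in the call shown in Option 2.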
Save Output as CSV or JSON
generate_dataset(
    description="IoT device logs with timestamp, device_id, temperature",
    num_samples=100,
    save_as='csv',  # Options: 'csv', 'json', 'both'
)
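To make the three save_as options concrete, here is a rough stdlib-only sketch of what writing rows as CSV, JSON, or both can look like. The function save_records, its basename and out_dir parameters, and the output file names are assumptions for illustration; the README does not say how or where the package writes its files.

```python
import csv
import json
from pathlib import Path


def save_records(records, basename, save_as="csv", out_dir="."):
    """Write a list of dict rows as CSV, JSON, or both; return the paths written.

    Hypothetical helper for illustration; not the package's own implementation.
    """
    if save_as not in {"csv", "json", "both"}:
        raise ValueError("save_as must be 'csv', 'json', or 'both'")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    if save_as in {"csv", "both"}:
        path = out / f"{basename}.csv"
        # Column order follows the keys of the first row.
        fieldnames = list(records[0].keys()) if records else []
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)
        written.append(path)
    if save_as in {"json", "both"}:
        path = out / f"{basename}.json"
        path.write_text(json.dumps(records, indent=2), encoding="utf-8")
        written.append(path)
    return written
```

A DataFrame returned by generate_dataset could be fed to this sketch via df.to_dict(orient="records"); pandas' own df.to_csv and df.to_json are the more direct route when working with the DataFrame itself.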
🗂 Project Structure
├── generator/
│ ├── __init__.py
│ ├── generator.py # Main interface
│ ├── formatter.py # Formats model output
│ ├── prompts.py # Builds prompt from description
│ ├── edge_case_handler.py # Injects edge cases
│
├── .env
├── README.md
├── requirements.txt
└── setup.py / pyproject.toml
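The module comments above suggest a three-stage flow: build a prompt from the description, format the model's output into rows, then optionally inject edge cases. The sketch below illustrates that flow with no API call. All three function names, the prompt wording, and the "blank one field to None" edge-case strategy are assumptions for illustration, not the actual contents of prompts.py, formatter.py, or edge_case_handler.py.

```python
import json
import random

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in


def build_prompt(description, num_samples):
    """Turn a plain-language description into an instruction for the model."""
    return (
        f"Generate {num_samples} rows of synthetic data for: {description}. "
        "Respond with a JSON array of objects and nothing else."
    )


def parse_records(model_text):
    """Extract a list of dict rows from model text, tolerating a code fence."""
    text = model_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (for example a "json" tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of row objects")
    return records


def inject_edge_cases(records, rate=0.1, seed=None):
    """Return copies of the rows; roughly `rate` of them get one field set to None."""
    rng = random.Random(seed)
    result = []
    for row in records:
        row = dict(row)  # copy so the caller's rows are left untouched
        if row and rng.random() < rate:
            row[rng.choice(list(row))] = None
        result.append(row)
    return result
```

Keeping the stages separate like this means the parsing and edge-case steps can be tested against canned model output without an API key.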
File details
Details for the file datafaker_ai-0.1.5.tar.gz.
File metadata
- Download URL: datafaker_ai-0.1.5.tar.gz
- Upload date:
- Size: 3.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f3548137b5f4784c1aee8f8fc128475dbb024e86622ed34d00ce8e079c7729ee |
| MD5 | 8b9c6ffaba3950f5081300fe6f5b56a3 |
| BLAKE2b-256 | 625a62381cdd6d03258f483f75f387407b08e06d335eeccdeea21883da5a4240 |
File details
Details for the file datafaker_ai-0.1.5-py3-none-any.whl.
File metadata
- Download URL: datafaker_ai-0.1.5-py3-none-any.whl
- Upload date:
- Size: 4.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7902109552c1a293d39f6533cc0525292d81b1bbc74034adc9aeb95167ce2d4e |
| MD5 | 007196d585dab759af6e43f97c28118e |
| BLAKE2b-256 | 0d72828a5191634c360412ded058ce1993e12aa1141dbf42d16dd85cd64df008 |