Generate synthetic datasets from natural language using Gemini 1.5 Flash

🧬 Gemini Dataset Generator

Generate synthetic datasets from natural language descriptions using Google's Gemini 1.5 Flash model. Ideal for data science, machine learning prototyping, and testing workflows with customizable, structured synthetic data.


✨ Features

  • ⚡ Powered by Gemini 1.5 Flash (fast + cost-effective)
  • 🧠 Natural language prompt → structured data
  • 📦 Output as a pandas DataFrame, CSV, or JSON
  • 🧪 Optional edge case injection for testing
  • 🧾 Save datasets to local disk
  • 🔐 Easy API key setup using .env

📦 Installation

Install from PyPI:

pip install datafaker_ai


Or clone the repository and install its dependencies:


git clone https://github.com/ahsanraza1457/deepfaker_ai.git
cd deepfaker_ai
pip install -r requirements.txt

🔐 Setup
Create a .env file in the root directory of your project and add your API key:
GEMINI_API_KEY=your_google_generativeai_api_key
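
The README does not show how the key is read at runtime. As a minimal sketch, assuming the .env file holds plain KEY=value lines, the helpers below parse it with the standard library only. The names `read_env_file` and `get_gemini_api_key` are hypothetical and not part of this package; many projects use the `python-dotenv` package for this instead.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines from a .env-style file into a dict.

    Hypothetical helper for illustration; not part of this package.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a key/value separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and any surrounding quote characters.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_gemini_api_key(path=".env"):
    """Return GEMINI_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or read_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
    return key
```

Checking for the key before calling `generate_dataset` gives a clear error message when the .env file is missing or incomplete.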


🚀 Usage
from generator import generate_dataset

df = generate_dataset(
    description="Customer name, email, age, and signup date",
    num_samples=50
)

print(df.head())



Save Output as CSV or JSON
generate_dataset(
    description="IoT device logs with timestamp, device_id, temperature",
    num_samples=100,
    save_as='csv'  # Options: 'csv', 'json', 'both'
)
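
The README does not state where the files are written or what they are named. The following is only a sketch of how a `save_as` switch with the values 'csv', 'json', and 'both' could be handled using the standard library. `save_records`, `stem`, `out_dir`, and the output file names are hypothetical and not this package's API.

```python
import csv
import json
from pathlib import Path


def save_records(records, save_as, stem="dataset", out_dir="."):
    """Write a list of dict rows as CSV, JSON, or both; return the paths written.

    Hypothetical helper; the file naming is an assumption for illustration.
    """
    if save_as not in ("csv", "json", "both"):
        raise ValueError("save_as must be 'csv', 'json', or 'both'")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    if save_as in ("csv", "both"):
        csv_path = out / f"{stem}.csv"
        # Use the keys of the first row as the header; assumes uniform rows.
        fieldnames = list(records[0].keys()) if records else []
        with csv_path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)
        written.append(csv_path)
    if save_as in ("json", "both"):
        json_path = out / f"{stem}.json"
        json_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
        written.append(json_path)
    return written
```

Rejecting unknown `save_as` values up front avoids silently writing nothing when the option is misspelled.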



🗂 Project Structure
├── generator/
│   ├── __init__.py
│   ├── generator.py           # Main interface
│   ├── formatter.py           # Formats model output
│   ├── prompts.py             # Builds prompt from description
│   ├── edge_case_handler.py   # Injects edge cases
│
├── .env                      
├── README.md
├── requirements.txt
└── setup.py / pyproject.toml  
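
The structure lists edge_case_handler.py as the module that injects edge cases, but the README does not describe how. As a rough sketch of the general idea, assuming rows are plain dicts, the function below replaces a fraction of values with awkward ones (empty or very long strings, zero, negative or very large numbers, missing values). The name `inject_edge_cases` and its parameters are hypothetical; the package's own module may work differently.

```python
import random


def inject_edge_cases(records, rate=0.1, seed=None):
    """Return a copy of records with some values replaced by edge cases.

    Hypothetical sketch; the package's edge_case_handler.py may work differently.
    """
    rng = random.Random(seed)
    result = []
    for row in records:
        new_row = dict(row)  # copy so the input rows are left untouched
        for key, value in row.items():
            # Each value is replaced with probability `rate`.
            if rng.random() >= rate:
                continue
            if isinstance(value, str):
                # Empty and very long strings are common failure triggers.
                new_row[key] = rng.choice(["", value * 50])
            elif isinstance(value, bool):
                new_row[key] = None
            elif isinstance(value, (int, float)):
                # Zero, a negative value, and a very large magnitude.
                new_row[key] = rng.choice([0, -1, 10**9])
            else:
                new_row[key] = None
        result.append(new_row)
    return result
```

Passing a fixed `seed` makes the injected cases reproducible, which helps when a test failure needs to be replayed.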
