Generate synthetic datasets from natural language using Gemini 1.5 Flash

🧬 Dataset Generator

Generate synthetic datasets from natural language descriptions using Google's Gemini 1.5 Flash model. Ideal for data science, machine learning prototyping, and testing workflows with customizable, structured synthetic data.


✨ Features

  • ⚡ Powered by Gemini 1.5 Flash (fast + cost-effective)
  • 🧠 Natural language prompt → structured data
  • 📦 Output as a pandas DataFrame, CSV, or JSON
  • 🧪 Optional edge case injection for testing
  • 🧾 Save datasets to local disk
  • 🔐 Easy API key setup using .env
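
The edge case injection feature is not described in more detail here. As a rough illustration of the idea only, the sketch below replaces one field in a random subset of otherwise well-formed rows with an awkward value. The function name `inject_edge_cases`, the `rate` and `seed` parameters, and the list of edge values are assumptions made for this example, not the package's API.

```python
import random

# Hypothetical edge values; the package's actual choices are not documented.
EDGE_VALUES = [None, "", " ", -1, 0, "x" * 256]


def inject_edge_cases(records, rate=0.1, seed=None):
    """Return a copy of `records` in which roughly `rate` of the rows
    have one randomly chosen field replaced by an edge value."""
    rng = random.Random(seed)
    result = []
    for row in records:
        row = dict(row)  # copy so the caller's data is left untouched
        if row and rng.random() < rate:
            field = rng.choice(sorted(row))
            row[field] = rng.choice(EDGE_VALUES)
        result.append(row)
    return result


rows = [{"name": f"user{i}", "age": 20 + i} for i in range(100)]
noisy = inject_edge_cases(rows, rate=0.2, seed=42)
changed = sum(1 for before, after in zip(rows, noisy) if before != after)
print(f"{changed} of {len(rows)} rows modified")
```

Passing a fixed `seed` keeps the injected rows reproducible between test runs, which is usually what a test suite wants.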

📦 Installation

Install from PyPI:

pip install datafaker-ai==0.1.5



Or install from source:

git clone https://github.com/ahsanraza1457/deepfaker_ai.git
cd deepfaker_ai
pip install -r requirements.txt

🔐 Setup
Option 1: Using .env (for GitHub users)

Create a .env file in the root directory:

GEMINI_API_KEY=your_google_generativeai_api_key

Then import and call generate_dataset:

from generator import generate_dataset

df = generate_dataset(
    description="Customer name, email, age, and signup date",
    num_samples=50
)

print(df.head())
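
If the call fails with an authentication error, it can help to confirm that the key in .env is well-formed and visible. The following is a minimal standard-library sketch that parses simple `KEY=value` lines. `read_env_file` and `resolve_api_key` are hypothetical helpers written for this example, and the lookup order (explicit argument, then environment, then file) is an assumption; the package itself may load the file differently.

```python
import os
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(env_path=".env", explicit_key=None):
    """Assumed lookup order: explicit key, process environment, then .env."""
    if explicit_key:
        return explicit_key
    if os.environ.get("GEMINI_API_KEY"):
        return os.environ["GEMINI_API_KEY"]
    if Path(env_path).is_file():
        return read_env_file(env_path).get("GEMINI_API_KEY")
    return None


# Demonstrate against a throwaway file rather than a real .env.
with tempfile.TemporaryDirectory() as tmp:
    demo_env = Path(tmp) / ".env"
    demo_env.write_text(
        "# demo file\nGEMINI_API_KEY=your_google_generativeai_api_key\n",
        encoding="utf-8",
    )
    parsed = read_env_file(demo_env)
    print(sorted(parsed))
```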
Option 2: Passing the key directly (for PyPI users)
df = generate_dataset(
    description="user profile data",
    num_samples=10,
    api_key="your_google_generativeai_api_key"
)



Save Output as CSV or JSON
generate_dataset(
    description="IoT device logs with timestamp, device_id, temperature",
    num_samples=100,
    save_as='csv'  # Options: 'csv', 'json', 'both'
)
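
The README does not say where the files are written or what they are named. As a sketch of what the three `save_as` options imply, the hypothetical helper `save_records` below writes a non-empty list of row dictionaries to CSV, JSON, or both using only the standard library. The `basename` parameter, the file names, and the sample rows are invented for this example.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_records(records, basename, save_as="csv"):
    """Write `records` (a non-empty list of dicts) and return the paths written."""
    if save_as not in {"csv", "json", "both"}:
        raise ValueError("save_as must be 'csv', 'json', or 'both'")
    written = []
    if save_as in {"csv", "both"}:
        csv_path = Path(f"{basename}.csv")
        with csv_path.open("w", newline="", encoding="utf-8") as fh:
            # Column order follows the keys of the first row.
            writer = csv.DictWriter(fh, fieldnames=list(records[0]))
            writer.writeheader()
            writer.writerows(records)
        written.append(csv_path)
    if save_as in {"json", "both"}:
        json_path = Path(f"{basename}.json")
        json_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
        written.append(json_path)
    return written


# Illustrative rows only; real rows would come from generate_dataset.
logs = [
    {"timestamp": "2024-01-01T00:00:00", "device_id": "dev-1", "temperature": 21.5},
    {"timestamp": "2024-01-01T00:01:00", "device_id": "dev-2", "temperature": 22.0},
]
with tempfile.TemporaryDirectory() as tmp:
    paths = save_records(logs, Path(tmp) / "iot_logs", save_as="both")
    names = [p.name for p in paths]
    reloaded = json.loads(paths[1].read_text(encoding="utf-8"))
```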



🗂 Project Structure
├── generator/
│   ├── __init__.py
│   ├── generator.py           # Main interface
│   ├── formatter.py           # Formats model output
│   ├── prompts.py             # Builds prompt from description
│   └── edge_case_handler.py   # Injects edge cases
│
├── .env
├── README.md
├── requirements.txt
└── setup.py / pyproject.toml
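
The comments in the tree suggest a pipeline: build a prompt from the description, call the model, then format the output. The sketch below illustrates the first and last stages without calling any API. `build_prompt` and `parse_response` are hypothetical stand-ins and the prompt wording is invented, since the contents of prompts.py and formatter.py are not shown here.

```python
import json

FENCE = "`" * 3  # models often wrap JSON replies in a Markdown code fence


def build_prompt(description, num_samples):
    """Turn a plain-language description into an instruction for the model."""
    return (
        f"Generate {num_samples} rows of synthetic data for: {description}.\n"
        "Respond with a JSON array of objects and nothing else."
    )


def parse_response(text):
    """Extract a list of row dicts from model output, tolerating a code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line, then everything from the closing fence on.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    rows = json.loads(cleaned)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of rows")
    return rows


prompt = build_prompt("Customer name, email, age", 2)
# A canned string stands in for the model's reply in this sketch.
fake_reply = FENCE + 'json\n[{"name": "A", "age": 30}, {"name": "B", "age": 41}]\n' + FENCE
rows = parse_response(fake_reply)
print(len(rows), "rows parsed")
```

Validating that the reply is a JSON array before building a DataFrame keeps malformed model output from surfacing later as a confusing pandas error.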
