Generate synthetic datasets from natural language using Google Gemini

🧬Dataset Generator

Generate synthetic datasets from natural language descriptions using Google's Gemini 1.5 Flash model. Ideal for data science, machine learning prototyping, and testing workflows with customizable, structured synthetic data.


✨ Features

  • ⚡ Powered by Gemini 1.5 Flash (fast + cost-effective)
  • 🧠 Natural language prompt → structured data
  • 📦 Output in pandas, CSV, and JSON
  • 🧪 Optional edge case injection for testing
  • 🧾 Save datasets to local disk
  • 🔐 Easy API key setup using .env
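
The "natural language prompt → structured data" step above can be pictured roughly as follows. This is a minimal sketch and not the package's actual code: `build_prompt` and `parse_records` are hypothetical helpers, and it assumes the model is asked to reply with a JSON array of objects. Only the standard library is used, and a stand-in reply replaces the real API call.

```python
import json


def build_prompt(description: str, num_samples: int) -> str:
    """Hypothetical prompt builder: ask for a JSON array of flat records."""
    return (
        f"Generate {num_samples} rows of synthetic data for: {description}.\n"
        "Reply with a JSON array of objects only, with no commentary."
    )


def parse_records(reply: str) -> list[dict]:
    """Hypothetical parser: extract the JSON array from a model reply."""
    text = reply.strip()
    # Models often wrap JSON in a Markdown fence; keep only the array itself.
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model reply")
    records = json.loads(text[start : end + 1])
    if not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")
    return records


# A stand-in reply, so the sketch runs without an API key or network call.
fence = "`" * 3
fake_reply = fence + 'json\n[{"name": "Ada", "age": 36}, {"name": "Lin", "age": 29}]\n' + fence
rows = parse_records(fake_reply)
print(rows[0]["name"], len(rows))
```

The resulting list of dicts could then be handed to `pandas.DataFrame(rows)` to obtain the DataFrame output listed in the features.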

📦 Installation

Install from PyPI:

pip install datafaker-ai



Or clone the repository from GitHub:

git clone https://github.com/ahsanraza1457/deepfaker_ai.git
cd deepfaker_ai
pip install -r requirements.txt

🔐 Setup

Option 1: Using .env (for GitHub users)

Create a .env file in the root directory:

GEMINI_API_KEY=your_google_generativeai_api_key

Then import generate_dataset and describe the data you need:

from generator import generate_dataset

df = generate_dataset(
    description="Customer name, email, age, and signup date",
    num_samples=50
)

print(df.head())
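
Option 2: Pass Directly (for PyPI users)

If you are not using a .env file, pass the key through the api_key argument:

# Import path as in Option 1; it may differ for the PyPI install.
from generator import generate_dataset

df = generate_dataset(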
    description="user profile data",
    num_samples=10,
    api_key="your_google_generativeai_api_key"
)
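
The two setup options suggest a lookup order for the API key: an explicit `api_key` argument first, then `GEMINI_API_KEY` from the environment or a `.env` file. The sketch below illustrates that order under stated assumptions. `load_env_file` and `resolve_api_key` are hypothetical names, the `.env` parsing is deliberately simplified, and the package itself may rely on a different mechanism (for example a dotenv library).

```python
import os
from pathlib import Path
from typing import Optional


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Do not overwrite variables that are already set in the environment.
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def resolve_api_key(api_key: Optional[str] = None, env_path: str = ".env") -> str:
    """Prefer an explicit key (Option 2), then fall back to GEMINI_API_KEY (Option 1)."""
    if api_key:
        return api_key
    load_env_file(env_path)
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key=... or set GEMINI_API_KEY")
    return key


print(resolve_api_key("your_google_generativeai_api_key"))
```

Checking the explicit argument first means a key passed in code always wins over whatever is stored on disk, which matches how the two options are described.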



Save Output as CSV or JSON

Set save_as to write the generated dataset to local disk:

generate_dataset(
    description="IoT device logs with timestamp, device_id, temperature",
    num_samples=100,
    save_as='csv'  # Options: 'csv', 'json', 'both'
)
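
To make the three `save_as` values concrete, here is a minimal sketch of how records could be written as CSV, JSON, or both. `save_records` and its `stem` parameter are hypothetical and use only the standard library; the package's real file names and writer (likely pandas-based) are not documented here.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_records(records: list[dict], save_as: str, stem: str = "dataset") -> list[Path]:
    """Write records as CSV, JSON, or both; return the paths written (hypothetical helper)."""
    if save_as not in {"csv", "json", "both"}:
        raise ValueError("save_as must be 'csv', 'json', or 'both'")
    if not records:
        raise ValueError("nothing to save")
    written = []
    if save_as in {"csv", "both"}:
        csv_path = Path(f"{stem}.csv")
        with csv_path.open("w", newline="", encoding="utf-8") as f:
            # Column order follows the keys of the first record.
            writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
            writer.writeheader()
            writer.writerows(records)
        written.append(csv_path)
    if save_as in {"json", "both"}:
        json_path = Path(f"{stem}.json")
        json_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
        written.append(json_path)
    return written


# Example rows shaped like the IoT description above (values are made up).
logs = [
    {"timestamp": "2024-01-01T00:00:00", "device_id": "dev-1", "temperature": 21.5},
    {"timestamp": "2024-01-01T00:01:00", "device_id": "dev-2", "temperature": 19.0},
]
with tempfile.TemporaryDirectory() as tmp:
    paths = save_records(logs, "both", stem=str(Path(tmp) / "iot_logs"))
    print([p.name for p in paths])
```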



🗂 Project Structure
├── generator/
│   ├── __init__.py
│   ├── generator.py           # Main interface
│   ├── formatter.py           # Formats model output
│   ├── prompts.py             # Builds prompt from description
│   └── edge_case_handler.py   # Injects edge cases
│
├── .env                      
├── README.md
├── requirements.txt
└── setup.py / pyproject.toml  
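
The role of `edge_case_handler.py` in the layout above is to inject edge cases into generated data. How it does so is not documented, so the following is only an illustrative sketch: `inject_edge_cases`, its `fraction` and `seed` parameters, and the specific replacement values are all assumptions.

```python
import random


def inject_edge_cases(records: list[dict], fraction: float = 0.1, seed: int = 0) -> list[dict]:
    """Return a copy of records with a few values replaced by edge cases (hypothetical)."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    result = [dict(r) for r in records]  # copy so the input is left untouched
    if not result:
        return result
    count = min(len(result), max(1, int(len(result) * fraction)))
    for row in rng.sample(result, count):
        if not row:
            continue
        column = rng.choice(list(row))
        value = row[column]
        if isinstance(value, str):
            # Empty, whitespace-only, or very long strings.
            row[column] = rng.choice(["", " ", value * 50])
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Zero, negative, or implausibly large numbers.
            row[column] = rng.choice([0, -1, 10**12])
        else:
            row[column] = None
    return result


clean = [{"name": f"user{i}", "age": 20 + i} for i in range(10)]
noisy = inject_edge_cases(clean, fraction=0.1)
print(sum(a != b for a, b in zip(clean, noisy)), "row(s) modified")
```

Working on a copy keeps the clean dataset available for comparison, which is useful when the edge cases are meant to exercise validation code.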

Download files

Source Distribution

datafaker_ai-0.1.6.tar.gz (2.3 kB view details)

Uploaded Source

Built Distribution

datafaker_ai-0.1.6-py3-none-any.whl (2.2 kB view details)

Uploaded Python 3

File details

Details for the file datafaker_ai-0.1.6.tar.gz.

File metadata

  • Download URL: datafaker_ai-0.1.6.tar.gz
  • Upload date:
  • Size: 2.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for datafaker_ai-0.1.6.tar.gz
Algorithm Hash digest
SHA256 f005860d71d79d28fd346c5a66a2b37217db0c9cefbf1d4c37f56f7fbf6dd40e
MD5 2126029fea5f6bae4cb20229ba64ce83
BLAKE2b-256 68f9e65f476d836ce66f0813847819a617bcd7b6184129a56a4bc94f4ea92b9e

File details

Details for the file datafaker_ai-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: datafaker_ai-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 2.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for datafaker_ai-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 953b3f763f74d530f3ea74c78632949af146814bee61bce4377c3355d19531f1
MD5 818958d5440f643c8832345a5b282c46
BLAKE2b-256 69f2b4c6a0af763cc22646f55190ef385815abba72a827d9565dbbf24294df79
