aigen-cli
Zero-API, browser-based AI dataset generation.
Turn any web AI (Gemini, ChatGPT, Claude, Perplexity) into a batch generation engine. No API keys. No per-call costs. Just your browser and a config file.
$20/mo subscription → unlimited generations
vs
$0.001–$0.024 per API call → ~$5 for 200 items
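As a rough illustration of the comparison above, the break-even point can be worked out from the two quoted figures: a flat $20/month subscription against $0.001–$0.024 per API call. The function name and the one-call-per-item assumption are illustrative only; the project does not specify either.

```python
def break_even_calls(subscription_usd: float, cost_per_call_usd: float) -> float:
    """Calls per month at which pay-per-call spend equals the flat subscription."""
    return subscription_usd / cost_per_call_usd


SUBSCRIPTION_USD = 20.0       # flat monthly price quoted above
CHEAP_CALL_USD = 0.001        # low end of the quoted per-call range
EXPENSIVE_CALL_USD = 0.024    # high end of the quoted per-call range

# Cost of 200 items at the high end, assuming one API call per item.
cost_200_items = 200 * EXPENSIVE_CALL_USD

high_end = break_even_calls(SUBSCRIPTION_USD, EXPENSIVE_CALL_USD)
low_end = break_even_calls(SUBSCRIPTION_USD, CHEAP_CALL_USD)

print(f"200 items at the high end: ${cost_200_items:.2f}")
print(f"break-even at $0.024/call: {high_end:.0f} calls/month")
print(f"break-even at $0.001/call: {low_end:.0f} calls/month")
```

Under these assumptions the subscription only pays for itself above several hundred calls a month at the top rate, and far more at the bottom rate.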
Install
Via NPM (no Python setup needed)
npx aigen-cli auth gemini
npx aigen-cli run aigen.yaml
# Or install globally:
npm install -g aigen-cli
aigen auth gemini
The NPM wrapper auto-installs the Python package on first run.
Via pip
pip install aigen-cli
# With semantic dedup + quality scoring (quotes keep shells such as zsh from expanding the brackets):
pip install "aigen-cli[ml]"
# With Playwright backend:
pip install "aigen-cli[playwright]"
# Everything:
pip install "aigen-cli[all]"
Quick Start
1. Authenticate once
aigen auth gemini
A Chrome window opens. Sign in, then close it. Your session is saved in encrypted form.
2. Create a config
aigen init
An interactive wizard generates aigen.yaml.
3. Generate
aigen run aigen.yaml
Headless browsers start, and the generated data is written to output/dataset.json.
Commands
| Command | Description |
|---|---|
| `aigen auth <platform>` | One-time login. Saves encrypted browser session. Platforms: gemini, chatgpt, claude, perplexity |
| `aigen run <config.yaml>` | Primary command. Auth-first headless generation from config |
| `aigen generate` | Legacy generation via an already-running Chrome debug session |
| `aigen init` | Interactive wizard to create a generation config |
| `aigen doctor` | System health check (Chrome, Python, keyring, optional deps) |
| `aigen status` | Show generation stats for an output directory |
| `aigen packs` | List built-in domain config packs (medical, legal, ecommerce, code) |
| `aigen push-hf` | Push a completed dataset to HuggingFace Hub |
| `aigen sessions` | List, export, or import browser sessions |
| `aigen schedule` | Add/remove/run cron-scheduled generation jobs |
| `aigen mcp` | Run as an MCP server for AI agent integration |
Run aigen --help or aigen <command> --help for full options.
Config File
project: "My Dataset"
description: "Short description"
target: 100        # Total items to generate
batch_size: 5      # Items per browser request
agents: 2          # Parallel browser tabs
platforms:
  - gemini
  - chatgpt
schema:
  fields: [question_text, expected_answer, difficulty, marks]
  required: [question_text, expected_answer, difficulty]
topic_pool:
  - chapter: "Chapter 1"
    topic: "Algebra"
    question_type: short_answer
    difficulty: easy
    marks: 3
output:
  format: json     # json | csv | jsonl
  path: output
  filename: dataset.json
# Optional: push to HuggingFace after generation
huggingface:
  repo_id: "your-username/dataset-name"
  private: false
# Optional: quality + compliance
quality:
  enabled: true    # requires pip install aigen-cli[ml]
  mode: heuristic
  threshold: 6.0
compliance:
  pii_detection: true
  auto_redact: true
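To make the relationship between target, batch_size, and agents concrete, here is a small back-of-the-envelope sketch. It assumes every browser request returns a full batch and that requests are split evenly across agents; the tool's actual scheduling, retries, and dedup losses are not documented here, and plan_requests is a hypothetical helper, not part of aigen-cli.

```python
import math


def plan_requests(target: int, batch_size: int, agents: int) -> dict:
    """Estimate how many browser requests a run needs (hypothetical helper)."""
    # Each request is assumed to yield exactly `batch_size` usable items.
    total_requests = math.ceil(target / batch_size)
    # Requests are assumed to be spread evenly over the parallel tabs.
    requests_per_agent = math.ceil(total_requests / agents)
    return {
        "total_requests": total_requests,
        "requests_per_agent": requests_per_agent,
    }


# Values taken from the example config above.
plan = plan_requests(target=100, batch_size=5, agents=2)
print(plan)  # {'total_requests': 20, 'requests_per_agent': 10}
```

In practice the real request count is likely to be higher, since invalid or duplicate items would need to be regenerated.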
Domain Packs
Ready-made configs for common use cases:
aigen packs # list available packs
aigen generate --pack medical_mcq # run a built-in pack
Built-in packs: medical_mcq, legal_qa, ecommerce, code_review
Architecture
┌─────────────────────────────────────────────────────────┐
│                        aigen-cli                        │
│                                                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
│  │  Gemini  │  │ ChatGPT  │  │  Claude  │  │Perplexity│ │
│  │   Tab    │  │   Tab    │  │   Tab    │  │   Tab    │ │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘ │
│       └─────────────┴─────────────┴─────────────┘       │
│                            │                            │
│                ┌───────────▼───────────┐                │
│                │      BrowserPool      │                │
│                │  (auth-first agents)  │                │
│                └───────────┬───────────┘                │
│                            │                            │
│                ┌───────────▼───────────┐                │
│                │   GenerationEngine    │                │
│                └───────────┬───────────┘                │
│                            │                            │
│      ┌─────────────────────┼─────────────────────┐      │
│      │                     │                     │      │
│   ┌──▼──┐  ┌───────────┐  ┌▼────────┐   ┌────────▼──┐   │
│   │Parse│  │ Validate  │  │  Dedup  │   │  Output   │   │
│   │JSON │  │ (schema)  │  │ Engine  │   │  Writers  │   │
│   └─────┘  └───────────┘  └─────────┘   └───────────┘   │
└─────────────────────────────────────────────────────────┘
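The four post-processing stages in the diagram (parse JSON, validate against the schema, dedup, write output) can be sketched roughly as follows. This is not the project's actual code: every function name here is hypothetical, the dedup is a simple exact match on normalized text rather than the semantic dedup offered by the ml extra, and the raw response is a made-up example.

```python
import json


def parse_json(raw: str) -> list:
    """Pull a JSON array out of a raw chat response that may contain extra prose."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [item for item in data if isinstance(item, dict)]


def validate(items: list, required: list) -> list:
    """Keep only items where every required field is present and non-empty."""
    return [
        item for item in items
        if all(item.get(field) not in (None, "") for field in required)
    ]


def dedup(items: list, key: str) -> list:
    """Drop items whose key field repeats after case and whitespace normalization."""
    seen, unique = set(), []
    for item in items:
        fingerprint = " ".join(str(item[key]).lower().split())
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(item)
    return unique


def to_jsonl(items: list) -> str:
    """Serialize items as JSON Lines, one object per line."""
    return "\n".join(json.dumps(item, ensure_ascii=False) for item in items)


# Made-up raw response: one good item, one near-duplicate, one incomplete item.
raw_response = """Here are your questions:
[{"question_text": "What is 2+2?", "expected_answer": "4", "difficulty": "easy"},
 {"question_text": "what is  2+2? ", "expected_answer": "4", "difficulty": "easy"},
 {"question_text": "Solve x+1=3"}]"""

required_fields = ["question_text", "expected_answer", "difficulty"]
clean = dedup(validate(parse_json(raw_response), required_fields), key="question_text")
print(to_jsonl(clean))
```

Only the first item survives: the second is a normalized duplicate and the third is missing required fields.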
Why This Exists
API costs scale linearly with usage: generating 1,000 items via API can cost $20–50. The same models are available through their web interfaces for a flat $20/month subscription.
aigen-cli automates the web interface so you get:
- Unlimited generations: throughput is limited by each platform's anti-bot measures, not by your billing
- Zero API keys: just log in once like a normal user
- Full transparency: every raw response is saved for audit
- No vendor lock-in: swap platforms instantly via config
License
MIT