SynGen is a CLI tool that creates high-quality synthetic datasets using the Gemini API

SynGen

SynGen is a tool that creates high-quality synthetic datasets using the Gemini API. It analyzes Markdown documents to generate realistic and diverse examples for machine learning, software testing, and data analysis.

Features

  • Automated Q&A Generation: Generates question-and-answer pairs from Markdown files using the Gemini API.
  • Customizable Question Count: Define the number of questions per document.
  • Multiple Runs: Process each document multiple times to generate varied outputs.
  • JSON Output Format: Saves results in a structured JSON file.

Installation

Install it from PyPI with pip:

pip install sysgen

Set Up Environment Variables

Before running sysgen, set the API key in your terminal:

# Windows (Command Prompt)
set GEMINI_API_KEY=your_gemini_api_key_here

# Linux/Mac
export GEMINI_API_KEY=your_gemini_api_key_here
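
To confirm that the variable is visible before starting a long run, a small pre-flight check can help. The following is a minimal sketch in Python. The helper name `require_api_key` is hypothetical and is not part of sysgen; only the variable name `GEMINI_API_KEY` comes from this README.

```python
import os


def require_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; sysgen itself may check the key differently.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; set it before running sysgen")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("GEMINI_API_KEY is set")
    except RuntimeError as err:
        print(err)
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.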

Usage

Run the script with the following command:

sysgen --md-folder path/to/md --output output.json --num-questions 50 --repeat 1

Arguments

  • --md-folder: Folder containing Markdown files (default: md)
  • --output: Output JSON file (default: output.json)
  • --num-questions: Number of questions per document (default: 100)
  • --repeat: Number of times to process each document (default: 1)
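
The flags and defaults listed above can be mirrored with Python's standard `argparse` module. This is only a sketch of an equivalent parser, written from the argument list in this README; it is an assumption about the interface, not sysgen's actual source, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical parser: flag names and defaults are copied from the
    # argument list in this README, not from sysgen's source code.
    parser = argparse.ArgumentParser(prog="sysgen")
    parser.add_argument("--md-folder", default="md",
                        help="Folder containing Markdown files")
    parser.add_argument("--output", default="output.json",
                        help="Output JSON file")
    parser.add_argument("--num-questions", type=int, default=100,
                        help="Number of questions per document")
    parser.add_argument("--repeat", type=int, default=1,
                        help="Number of times to process each document")
    return parser


if __name__ == "__main__":
    # Flags that are omitted fall back to the documented defaults.
    args = build_parser().parse_args(["--md-folder", "docs", "--num-questions", "50"])
    print(args.md_folder, args.output, args.num_questions, args.repeat)
```

In this sketch, omitting `--output` and `--repeat` leaves them at `output.json` and `1`, matching the defaults documented above.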

Output Format

The generated JSON follows this structure:

[
  {
    "data": [
      {"instruction": "Question here"},
      {"output": "Answer here"}
    ],
    "source_document": "filename.md",
    "run_number": 1
  }
]
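
For downstream use, it is often convenient to flatten this structure into one record per question. The sketch below assumes that each `data` list alternates `instruction` and `output` objects in the order shown in the sample, and that longer lists follow the same pattern; that ordering is an assumption, and the function name `iter_pairs` is hypothetical.

```python
import json


def iter_pairs(entries):
    """Yield (source_document, run_number, instruction, output) tuples.

    Assumes each "output" object answers the most recent "instruction" object.
    """
    for entry in entries:
        instruction = None
        for item in entry.get("data", []):
            if "instruction" in item:
                instruction = item["instruction"]
            if "output" in item and instruction is not None:
                yield (entry.get("source_document"), entry.get("run_number"),
                       instruction, item["output"])
                instruction = None


# The sample from this README, embedded so the sketch runs without a file.
sample = json.loads("""
[
  {
    "data": [
      {"instruction": "Question here"},
      {"output": "Answer here"}
    ],
    "source_document": "filename.md",
    "run_number": 1
  }
]
""")

if __name__ == "__main__":
    for row in iter_pairs(sample):
        print(row)
```

To read a real result, replace `sample` with `json.load(open("output.json", encoding="utf-8"))`, adjusting the path to whatever was passed to `--output`.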

Contributing

If you find a bug or have suggestions for improvement, feel free to open an issue or submit a pull request on GitHub.

How to Contribute

  1. Fork the Repository: Start by forking the project on GitHub.
  2. Clone the Repository: Clone it to your local machine using:
    git clone https://github.com/your-username/sysgen.git
    
  3. Create a Branch: Create a new branch for your changes:
    git checkout -b feature-branch-name
    
  4. Make Changes: Implement your improvements or bug fixes.
  5. Commit Your Changes: Write a clear commit message:
    git commit -m "Added feature XYZ"
    
  6. Push to GitHub: Push your changes:
    git push origin feature-branch-name
    
  7. Submit a Pull Request: Open a PR describing your changes.
  8. Review & Merge: Wait for review and approval before merging.

License

This project is licensed under the MIT License. See LICENSE for details.
