Sysgen

Sysgen is a CLI tool that creates high-quality synthetic datasets using the Gemini API. It analyzes Markdown documents to generate realistic and diverse examples for machine learning, software testing, and data analysis.

Features

  • Automated Q&A Generation: Extracts questions and answers from Markdown files using AI.
  • Customizable Question Count: Define the number of questions per document.
  • Multiple Runs: Process each document multiple times to generate varied outputs.
  • JSON Output Format: Saves results in a structured JSON file.

Installation

Install sysgen from PyPI using pip:

pip install sysgen

Set Up Environment Variables

Before running sysgen, set the API key in your terminal:

# Windows (Command Prompt)
set GEMINI_API_KEY=your_gemini_api_key_here

# Linux/Mac
export GEMINI_API_KEY=your_gemini_api_key_here
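
A script that wraps sysgen may want to confirm the key is present before starting a long run. The following is a minimal sketch in Python, assuming only that the tool reads GEMINI_API_KEY from the environment as described above. The helper name require_api_key is hypothetical and is not part of sysgen.

```python
import os


def require_api_key(env=None):
    """Return the Gemini API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; set it before running sysgen")
    return key
```

Calling require_api_key() at the top of a wrapper script fails fast with a readable message instead of an API error partway through a run.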

Usage

Run the tool with the following command:

sysgen --md-folder path/to/md --output output.json --num-questions 50 --repeat 1

Arguments

  • --md-folder: Folder containing Markdown files (default: md)
  • --output: Output JSON file (default: output.json)
  • --num-questions: Number of questions per document (default: 100)
  • --repeat: Number of times to process each document (default: 1)
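
To drive sysgen from another program, the arguments above can be assembled into a command line. This is a rough sketch in Python that uses only the four documented flags and their documented defaults. The function names build_sysgen_command and run_sysgen are hypothetical helpers, not part of the package, and the sketch assumes the sysgen executable is on PATH.

```python
import subprocess


def build_sysgen_command(md_folder="md", output="output.json",
                         num_questions=100, repeat=1):
    """Build the argument list for a sysgen run from the documented flags."""
    return [
        "sysgen",
        "--md-folder", str(md_folder),
        "--output", str(output),
        "--num-questions", str(num_questions),
        "--repeat", str(repeat),
    ]


def run_sysgen(**kwargs):
    """Run sysgen and raise CalledProcessError on a non-zero exit code."""
    # Assumes `sysgen` is installed and GEMINI_API_KEY is already set.
    return subprocess.run(build_sysgen_command(**kwargs), check=True)
```

For example, run_sysgen(md_folder="path/to/md", num_questions=50) corresponds to the command shown under Usage, with --output and --repeat left at their defaults.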

Output Format

The generated JSON follows this structure:

[
  {
    "data": [
      {"instruction": "Question here"},
      {"output": "Answer here"}
    ],
    "source_document": "filename.md",
    "run_number": 1
  }
]
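
For training or analysis it is often easier to work with one flat record per question. The sketch below, in Python, assumes the file matches the structure shown above exactly: each entry's "data" list holds an {"instruction": ...} object followed by its {"output": ...} object. The function names flatten_records and load_pairs are hypothetical and not part of sysgen.

```python
import json


def flatten_records(records):
    """Turn sysgen-style entries into flat instruction/output dictionaries."""
    pairs = []
    for record in records:
        current = None
        for item in record.get("data", []):
            # An "instruction" object starts a new pair.
            if "instruction" in item:
                current = {
                    "instruction": item["instruction"],
                    "output": None,
                    "source_document": record.get("source_document"),
                    "run_number": record.get("run_number"),
                }
                pairs.append(current)
            # An "output" object completes the most recent pair, if any.
            if "output" in item and current is not None:
                current["output"] = item["output"]
    return pairs


def load_pairs(path):
    """Read a sysgen output file and return its flattened pairs."""
    with open(path, encoding="utf-8") as f:
        return flatten_records(json.load(f))
```

Calling load_pairs("output.json") then gives a list that can be written out as JSON Lines or loaded into a data frame.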

Contributing

If you find a bug or have suggestions for improvement, feel free to open an issue or submit a pull request on GitHub.

How to Contribute

  1. Fork the Repository: Start by forking the project on GitHub.
  2. Clone the Repository: Clone it to your local machine using:
    git clone https://github.com/your-username/sysgen.git
    
  3. Create a Branch: Create a new branch for your changes:
    git checkout -b feature-branch-name
    
  4. Make Changes: Implement your improvements or bug fixes.
  5. Commit Your Changes: Write a clear commit message:
    git commit -m "Added feature XYZ"
    
  6. Push to GitHub: Push your changes:
    git push origin feature-branch-name
    
  7. Submit a Pull Request: Open a PR describing your changes.
  8. Review & Merge: Wait for review and approval before merging.

License

This project is licensed under the MIT License. See LICENSE for details.
