sysgen is a CLI tool that creates high-quality synthetic datasets using the Gemini API
Sysgen
Sysgen is a CLI tool that creates high-quality synthetic datasets using the Gemini API. It analyzes Markdown documents to generate realistic and diverse examples for machine learning, software testing, and data analysis.
Features
- Automated Q&A Generation: Extracts questions and answers from Markdown files using AI.
- Customizable Question Count: Define the number of questions per document.
- Multiple Runs: Process each document multiple times to generate varied outputs.
- JSON Output Format: Saves results in a structured JSON file.
Installation
Install it from PyPI using pip:
pip install sysgen
Set Up Environment Variables
Before running sysgen, set the API key in your terminal:
# Windows (Command Prompt)
set GEMINI_API_KEY=your_gemini_api_key_here
# Linux/Mac
export GEMINI_API_KEY=your_gemini_api_key_here
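A script that wraps sysgen may want to confirm the key is present before starting a long run. The following is a minimal sketch of such a check; the helper name `get_gemini_api_key` is hypothetical and is not part of sysgen itself.

```python
import os


def get_gemini_api_key(env=os.environ):
    """Return the GEMINI_API_KEY value, or raise if it is unset or empty.

    Hypothetical helper for illustration; sysgen's own lookup may differ.
    """
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; set it in your terminal before running sysgen."
        )
    return key
```

Calling `get_gemini_api_key()` with no arguments reads the current process environment, so it reflects whatever was set with `set` or `export` in the same terminal session.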
Usage
Run the script with the following command:
sysgen --md-folder path/to/md --output output.json --num-questions 50 --repeat 1
Arguments
- --md-folder: Folder containing Markdown files (default: md)
- --output: Output JSON file (default: output.json)
- --num-questions: Number of questions per document (default: 100)
- --repeat: Number of times to process each document (default: 1)
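To make the defaults concrete, here is a sketch of how the four documented flags could be declared with Python's standard `argparse` module. It mirrors the argument list above but is an illustration only; sysgen's actual source may define its options differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented sysgen flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="sysgen")
    parser.add_argument("--md-folder", default="md",
                        help="Folder containing Markdown files")
    parser.add_argument("--output", default="output.json",
                        help="Output JSON file")
    parser.add_argument("--num-questions", type=int, default=100,
                        help="Number of questions per document")
    parser.add_argument("--repeat", type=int, default=1,
                        help="Number of times to process each document")
    return parser


# Flags that are omitted fall back to the documented defaults.
args = build_parser().parse_args(["--md-folder", "path/to/md", "--num-questions", "50"])
print(args.md_folder, args.output, args.num_questions, args.repeat)
# prints: path/to/md output.json 50 1
```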
Output Format
The generated JSON follows this structure:
[
  {
    "data": [
      {"instruction": "Question here"},
      {"output": "Answer here"}
    ],
    "source_document": "filename.md",
    "run_number": 1
  }
]
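The structure above can be flattened into question/answer pairs for downstream use. The sketch below assumes that each "data" list alternates an "instruction" object with the "output" object that answers it; the sample shows only one such pair, so this is an assumption. The function name `iter_qa_pairs` is hypothetical.

```python
import json


def iter_qa_pairs(records):
    """Yield one dict per question/answer pair found in sysgen-style records.

    Assumes each "instruction" item is followed by its matching "output" item.
    """
    for record in records:
        pending_question = None
        for item in record.get("data", []):
            if "instruction" in item:
                pending_question = item["instruction"]
            elif "output" in item and pending_question is not None:
                yield {
                    "question": pending_question,
                    "answer": item["output"],
                    "source_document": record.get("source_document"),
                    "run_number": record.get("run_number"),
                }
                pending_question = None


# Sample input copied from the documented output format.
sample = json.loads("""
[
  {
    "data": [
      {"instruction": "Question here"},
      {"output": "Answer here"}
    ],
    "source_document": "filename.md",
    "run_number": 1
  }
]
""")

pairs = list(iter_qa_pairs(sample))
print(pairs[0]["question"], "->", pairs[0]["answer"])
```

For a real run, replace `sample` with the result of `json.load` on the file passed to --output.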
Contributing
If you find a bug or have suggestions for improvement, feel free to open an issue or submit a pull request on GitHub.
How to Contribute
- Fork the Repository: Start by forking the project on GitHub.
- Clone the Repository: Clone it to your local machine using:
git clone https://github.com/your-username/sysgen.git
- Create a Branch: Create a new branch for your changes:
git checkout -b feature-branch-name
- Make Changes: Implement your improvements or bug fixes.
- Commit Your Changes: Write a clear commit message:
git commit -m "Added feature XYZ"
- Push to GitHub: Push your changes:
git push origin feature-branch-name
- Submit a Pull Request: Open a PR describing your changes.
- Review & Merge: Wait for review and approval before merging.
License
This project is licensed under the MIT License. See LICENSE for details.
Contact
- Author: Adhishtanaka
- Email: kulasoooriyaa@gmail.com
File details
Details for the file sysgen-0.1.1.tar.gz.
File metadata
- Download URL: sysgen-0.1.1.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e81c818c591a12bd93679b8d5fa5e4845213a197752d6d29476d6cccba5f5bf |
| MD5 | f734279b47b40803b552ae381a268358 |
| BLAKE2b-256 | d0b36f1dcbef52a5eca824af53575bcac8d18ee138d16d467bbf27fa7b6e18d1 |