Groq QA is a CLI tool and Python library that generates question-answer pairs from text to aid in fine-tuning large language models (LLMs).

These details have not been verified by PyPI

Project description

Groq QA Generator

Groqqy
Q: “Would you tell me, please, which way I ought to go from here?”
A: “That depends a good deal on where you want to get to,” said the Cat.
— Alice's Adventures in Wonderland

Groq QA is a Python library that automates the creation of question-answer pairs from text, designed to aid in fine-tuning large language models (LLMs). Built on the Groq platform, it leverages powerful models like LLaMA 3 with 70 billion parameters, ideal for generating high-quality QA pairs. This tool streamlines the process of preparing custom datasets, helping improve LLM performance on specialized tasks with minimal manual effort. It’s particularly useful for fine-tuning models in research, education, and domain-specific applications.

✨ Features

	✨ Feature	📄 Description
✅	CLI Support	Use as a command-line tool.
✅	Python Library Support	Import directly in Python code.
✅	Automated QA Generation	Generate question-answer pairs from input text.
✅	Prompt Templates	Flexible question generation through prompt templates.
✅	Model Support	Supports advanced models like LLaMA 3.1 70B via the Groq API.
✅	Customizable Configurations	Configure via `config.json` or in Python code.

👨‍💻 About the Developer

Hey there! I’m Jordan, a Canadian network engineer with over a decade of experience, especially from my time in the fast-paced world of California startups. My focus has been on automating test systems aligned with company KPIs, making it easier for teams to make data-driven decisions.

Whether it’s tackling tough challenges, improving codebases, or working on innovative ideas, I’m always up for the task. Let’s connect and make things happen! ✌️

📄 Table of Contents

🐱 Groq QA Generator
✨ Features
👨‍💻 About the Developer
🚀 Quick Start
📦 Upgrading
⚙️ Using groq-qa
🛠 Configuration
- Directory Structure
- Default config.json
🔧 Customizing the Configuration
🐇 Input Data
- Sample Input Data
🤖 Models
🐍 Python Library
- Example
🧠 Technical Overview
🧪 Running Tests
🤝 How to Contribute
❓ FAQ
⚖️ License

🚀 Quick Start

Install the package via pip:
```
pip install groq-qa-generator
```
Set up the API key:
```
export GROQ_API_KEY=your_api_key_here
```
Run the groq-qa command with default settings (~/.groq_qa/config.json):
```
groq-qa
```
View the results in ~/.groq_qa/qa_output.txt.

📦 Upgrading

To ensure that you have the latest features, bug fixes, and improvements, it is recommended to periodically upgrade the groq_qa_generator package.

You can update groq_qa_generator to the latest version by running:

pip install --upgrade groq-qa-generator

⚙️ Using `groq-qa`

Setup the API Key

First, you need to acquire an API key. Sign up at the Groq website and following their instructions for generating a key.

Setting the Environment Variable

Once you have the key, export it as an environment variable using the following commands based on your operating system:

MacOS and Linux

export GROQ_API_KEY=gsk_example_1234567890abcdef

Windows:

set GROQ_API_KEY=gsk_example_1234567890abcdef

Command-Line Interface (CLI)

Once installed, the command groq-qa becomes available. By default, this command reads from default configuration located at ~/.groq_qa/config.json.

Here are a few examples of how to use the groq-qa command:

# Run with default config.json:
groq-qa 

# Output results in JSON format:
groq-qa --json 

 # Run with model, temperature, and json overrides:
groq-qa --model llama3-70b-8192 --temperature 0.7 --json

CLI Options:

--model: The default model to be used for generating QA pairs is defined in config.json. The default is set to llama3-70b-8192.
--temperature: Controls the randomness of the model's output. Lower values like 0.1 will result in more deterministic and focused outputs, while higher values will make the output more random. The default is set to 0.1.
--json: If this flag is included, the output will be saved in a JSON format. By default, the output is stored as a plain text file. The default is set to False.

🛠 Configuration

When you run the groq-qa command for the first time, a user-specific configuration directory (~/.groq_qa/) is automatically created. This directory contains all the necessary configuration files and templates for customizing input, prompts, and output.

Directory Structure

~/.groq_qa/
├── config.json
├── data
│   ├── alices_adventures_in_wonderland.txt
│   ├── sample_input_data.txt
│   └── prompts
│       ├── sample_question.txt
│       └── system_prompt.txt
├── qa_output.json
└── qa_output.txt

Default config.json

{
    "system_prompt": "system_prompt.txt",
    "sample_question": "sample_question.txt",
    "input_data": "sample_input_data.txt",
    "output_file": "qa_output.txt",
    "model": "llama3-70b-8192",
    "chunk_size": 512,
    "tokens_per_question": 60,
    "temperature": 0.1,
    "max_tokens": 1024
}

🔧 Customizing the Configuration

The ~/.groq_qa directory contains essential files that can be customized to suit your specific needs. This directory includes the following components:

config.json: This is the main configuration file where you can set various parameters for the QA generation process. You can customize settings such as:
- system_prompt: Specify the path to your custom system prompt file that defines how the model should behave.
- sample_question: Provide the path to a custom sample question file that helps guide the generation of questions.
- input_data: Set the path to your own text file from which you want to generate question-answer pairs.
- output_file: Define the path where the generated QA pairs will be saved.

Other configurable options include:

model: Select the model to be used for generating QA pairs (e.g., llama3-70b-8192).
chunk_size: Set the number of tokens for each text chunk (e.g., 512).
tokens_per_question: Specify the number of tokens allocated for each question (e.g., 60).
temperature: Control the randomness of the model's output (e.g., 0.1).
max_tokens: Define the maximum number of tokens the model can generate in the response (e.g., 1024).

By adjusting these files and settings, you can create a personalized environment for generating question-answer pairs that align with your specific use case.

🐇 Input Data

This project uses text data from Alice's Adventures in Wonderland by Lewis Carroll, sourced from Project Gutenberg. The full text is available in the data/alices_adventures_in_wonderland.txt file.

Sample Input Data

For demonstration purposes, a smaller sample of the full text is included in the data/sample_input_data.txt file. This file contains a portion of the main text, used to quickly test and generate question-answer pairs without processing the entire book.

🤖 Models

The groq_qa_generator currently supports the following models via the Groq API:

Model Name	Model ID	Developer	Context Window	Description	Documentation Link
LLaMA 70B	llama3-70b-8192	Meta	8,192 tokens	A large language model with 70 billion parameters, suitable for high-quality QA pair generation.	Model Card
LLaMA 3.1 70B	llama-3.1-70b-versatile	Meta	128k tokens	A versatile large language model suitable for diverse applications.	Model Card

Note: For optimal QA pair generation, it is recommended to use a larger model such as 70B, as its capacity helps ensure higher quality output. See Groq's supported models documentation for all options.

🐍 Python Library

In addition to CLI usage, the groq_qa_generator can be used directly in your Python project. Below is an example of how to configure and execute the question-answer generation process using a custom configuration:

Example

# main.py

from groq_qa_generator import groq_qa

# Define a custom configuration
custom_config = {
    "system_prompt": "~/custom_path/custom_system_prompt.txt",
    "sample_question": "~/custom_path/custom_sample_question.txt",
    "input_data": "~/custom_path/custom_input_data.txt",
    "output_file": "~/custom_path/qa_output.txt",
    "model": "llama3-70b-8192",
    "chunk_size": 512,
    "tokens_per_question": 60,
    "temperature": 0.1,
    "max_tokens": 1024
}

if __name__ == "__main__":
    qa_pairs = groq_qa.generate(custom_config)
    print(qa_pairs)

This allows you to integrate the functionality within any Python application easily.

🧠 Technical Overview

🔑 API Interaction:
- API Key Management: The API key is securely retrieved from environment variables to ensure safe access to the Groq API.
- Client Initialization: A Groq API client is initialized to enable communication with the service, providing access to powerful models like LLaMA 70B.
📄 Text Processing:
- Loading Prompts and Questions: The library includes methods to load sample questions and system prompts from specified file paths. These prompts are essential for guiding the Groq API's response.
- Generating Full Prompts: The system prompt and sample question are combined into a complete prompt for the Groq API.
🤖 QA Pair Generation:
- The core process involves taking a list of text chunks and generating question-answer pairs using the Groq API. It:
  - Loads the system prompt and sample question.
  - Iterates through each text chunk, creating a full prompt for the Groq API.
  - Retrieves the completion from the Groq API.
  - Streams the completion response and converts it into question-answer pairs.
  - Writes the generated QA pairs to the specified output file.

This approach enables the generation of question-answer pairs, leveraging the Groq API while maintaining flexibility through configuration settings.

🧪 Running Tests

To run the project's tests, you can use Poetry and pytest. Follow these steps:

📦 Install Poetry: If you haven't already, install Poetry using pip.
```
pip install poetry
```
🔧 Install Dependencies: Navigate to the project directory and install the dependencies.
```
cd groq-qa-generator
poetry install
```
🏃 Run Tests: Use pytest to run the tests.
```
poetry run pytest
```

The tests cover various components, including:

API interaction
Configuration loading
Tokenization
QA generation

🤝 How to Contribute

🍴 Fork the Repository: Click the "Fork" button at the top-right of the repository page to create your copy.
📥 Clone Your Fork: Clone the forked repository to your local machine.
```
git clone https://github.com/your-username/groq-qa-generator.git
```
🌿 Create a Branch: Use a descriptive name for your branch to clearly indicate the purpose of your changes. This helps maintain organization and clarity in the project.
```
git checkout -b feature/your-feature-name
```
🔧 Set Up the Environment: Use Poetry to install the project dependencies.
```
cd groq-qa-generator
poetry install
```
⚙️ Confirm the Environment: Verify that the virtual environment has been correctly set up and activated.
```
poetry shell
```
📦 List Installed Packages: Ensure that the dependencies have been installed correctly by listing the installed packages.
```
poetry show
```

❓ FAQ

📝 How do I use my own input data?

You can customize the input data by editing the input_data field in your config.json file or by passing a custom input file via the CLI. To use a custom file in the configuration:

{
    "input_data": "~/custom_path/my_input_data.txt"
}

📁 Where can I find the generated QA pairs?

The generated QA pairs are saved to the output_file defined in your config.json file. By default, it saves the output in qa_output.txt, located in your home directory’s .groq_qa folder (~/.groq_qa/qa_output.txt).

To change the output file name, edit the output_file field in your config.json file:

{
    "output_file": "~/custom_output_path/qa_custom_output.txt"
}

🛠 Can I modify the sample question or system prompt templates?

Yes, both the sample question and the system prompt can be modified. These templates are located in the prompts directory inside the ~/.groq_qa/ folder:

sample_question.txt: This file defines how sample questions should be structured.
system_prompt.txt: This file defines how the system should behave and guide the generation process. Feel free to edit these templates to suit your needs.

🔄 How do I override default configuration settings?

You can override the default configuration settings in two ways:

Edit the config.json file located in the ~/.groq_qa/ directory.
Pass command-line arguments to override specific settings, for example:

groq-qa --model llama3-70b-8192 --temperature 0.5

🎛 How can I increase the randomness of the output?

Increase the temperature value in the configuration or pass it as a command-line argument (e.g., --temperature 0.9).

🐍 Can I use this tool in a larger Python project?

Yes, groq_qa_generator can be used as a Python library within your project. Simply import the module and configure it to generate QA pairs programmatically.

⚖️ License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

1.2.1

Oct 12, 2024

1.2.0

Oct 12, 2024

1.1.0

Oct 9, 2024

1.0.1

Oct 9, 2024

This version

1.0.0

Oct 7, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

groq_qa_generator-1.0.0.tar.gz (77.7 kB view hashes)

Uploaded Oct 7, 2024 Source

Built Distribution

groq_qa_generator-1.0.0-py3-none-any.whl (77.4 kB view hashes)

Uploaded Oct 7, 2024 Python 3

Hashes for groq_qa_generator-1.0.0.tar.gz

Hashes for groq_qa_generator-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`cb47e83845f9cec809f9e26dd28b371e74b596f32205848a87cda8ca173d724f`
MD5	`de2b3d7e69af0d5e1a5b255db0823e8e`
BLAKE2b-256	`4671d8cf88d9ed714b23e5c20ec4a2ef0e76df63e6065b21a16e1759f51546be`

Hashes for groq_qa_generator-1.0.0-py3-none-any.whl

Hashes for groq_qa_generator-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dde2b0f4717b387873447116ebf91b598d4ab0a8370465af13bcd9af06988a46`
MD5	`f387bb362dfc04307b96a647a7d20891`
BLAKE2b-256	`a9fbcad19c763a331a06d24d148bb91b3a00a3633de4edcac2115164b08222a5`

groq-qa-generator 1.0.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers