# FragenAntwortLLMGPU

FragenAntwortLLMGPU is a Python package for processing PDF documents and generating Question & Answer (Q&A) pairs using LLM backends on CPU or GPU. The generated Q&A pairs can be used for fine-tuning LLMs.
It supports:
- Mistral (GGUF via CTransformers) — local GGUF models
- Qwen (Hugging Face Transformers) — HF models like Qwen2.5 Instruct
## Installation

Install the package:

```bash
pip install FragenAntwortLLMGPU
```
### Notes on PyTorch / GPU

This project uses PyTorch. Install a PyTorch build that matches your system (CPU-only or your CUDA version). If you already have a suitable PyTorch build installed, you can skip this step.

PyTorch install guide: https://pytorch.org/get-started/locally/
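As a quick sanity check before running the package, a small helper like the following can report which PyTorch build is installed and whether CUDA is usable (`torch_status` is a hypothetical helper for illustration, not part of the package):

```python
import importlib.util


def torch_status() -> str:
    """Describe the local PyTorch install: version and CUDA availability."""
    if importlib.util.find_spec("torch") is None:
        return "not installed"
    import torch  # imported lazily so the check works without PyTorch too

    if torch.cuda.is_available():
        return f"{torch.__version__} (CUDA {torch.version.cuda})"
    return f"{torch.__version__} (CPU only)"


print(torch_status())
```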
### Optional backend dependencies

Qwen (Transformers backend):

```bash
pip install transformers accelerate
```

Mistral GGUF (CTransformers backend):

```bash
pip install ctransformers
```
## Usage

```python
from FragenAntwortLLMGPU import DocumentProcessor

processor = DocumentProcessor(
    book_path="/path/to/your/book/",   # directory containing the PDF
    temp_folder="/path/to/temp/folder",
    output_file="/path/to/output/QA.jsonl",
    book_name="example.pdf",
    start_page=9,
    end_page=77,
    gpu_layers=100,
    number_Q_A="five",                 # written number: "one", "two", ...
    target_information="foods and locations",
    max_new_tokens=1000,
    temperature=0.1,
    context_length=2100,
    max_tokens_chunk=800,
    arbitrary_prompt="",
    model="mistral",                   # default backend
)

processor.process_book()
processor.generate_prompts()
processor.save_to_jsonl()
```
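After `save_to_jsonl()` completes, the output file can be inspected with a plain JSONL reader. A minimal sketch, assuming one JSON object per line; the `question`/`answer` key names below are assumptions for illustration, not verified against the package:

```python
import json
import os
import tempfile


def load_jsonl(path):
    """Load a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo with a synthetic record standing in for real output; the key names
# "question"/"answer" are assumptions about the JSONL schema.
sample = {"question": "What foods are mentioned?", "answer": "Bread and cheese."}
fd, path = tempfile.mkstemp(suffix=".jsonl")
with os.fdopen(fd, "w", encoding="utf-8") as fh:
    fh.write(json.dumps(sample) + "\n")

print(load_jsonl(path))
os.remove(path)
```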
## Model selection

### Default: Mistral GGUF via CTransformers

The default backend loads a local GGUF model via CTransformers.

Example model source (GGUF):

```python
processor = DocumentProcessor(..., model="mistral")
```
### Qwen (Hugging Face Transformers)

```python
processor = DocumentProcessor(
    ...,
    model="qwen",
    # optionally override the HF model id
    hf_model_id="Qwen/Qwen2.5-7B-Instruct",
)
```
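The `model` argument routes between the two backends. A minimal sketch of that dispatch pattern (illustrative only; `select_backend` and the mapping below are hypothetical, not the package's internals):

```python
def select_backend(model: str) -> str:
    """Map a model name to a backend description, rejecting unknown names."""
    backends = {
        "mistral": "ctransformers (local GGUF model)",
        "qwen": "transformers (Hugging Face model hub)",
    }
    try:
        return backends[model]
    except KeyError:
        raise ValueError(f"unknown model backend: {model!r}")


print(select_backend("mistral"))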
## Features
- Extracts text from PDF documents
- Splits text into manageable chunks for processing
- Generates efficient Q&A pairs based on specific target information
- Supports custom prompts for question generation
- Runs on CPU and GPU (depending on backend and installation)
- Multilingual input: accepts PDF books in French, German, or English and generates Q&A pairs in English
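The chunking step above can be sketched as a token-budget splitter. This is an illustrative version based on whitespace tokens; `chunk_text` is hypothetical and not the package's actual algorithm (which the `max_tokens_chunk` parameter configures):

```python
def chunk_text(text: str, max_tokens: int = 800) -> list[str]:
    """Split text into chunks of at most `max_tokens` whitespace-separated
    words, breaking only on word boundaries."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


print(chunk_text("a b c d e", max_tokens=2))
```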
## Contributing
Contributions are welcome! Please fork the repository and submit pull requests.
## License
This project is licensed under the MIT License. See the LICENSE file for details.
## Authors
Mehrdad Almasi, Lars Wieneke, and Demival Vasques Filho
## File details

### fragenantwortllmgpu-0.1.15.tar.gz (source distribution)

- Size: 9.2 kB
- Upload date:
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0, CPython/3.9.13

| Algorithm | Hash digest |
|---|---|
| SHA256 | 6da82e6299363e01ff00dfe626271b05f228a3e22e097d94b519e62a6892ea2c |
| MD5 | a5c226bca494f05ef7ff2edb628127e0 |
| BLAKE2b-256 | 958acca842a61e881fb0de31a15dd9ae6952a53e76c5553a432213f5c709f119 |
### fragenantwortllmgpu-0.1.15-py3-none-any.whl (built distribution, Python 3)

- Size: 10.1 kB
- Upload date:
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0, CPython/3.9.13

| Algorithm | Hash digest |
|---|---|
| SHA256 | 29dc040be293b85e837eba89ab51fc5a396815a9fc43bfd0d975a167eb8e624d |
| MD5 | a1ab325329d66bfaf4074696faaf6ae4 |
| BLAKE2b-256 | 3e7be84fa4c6b9fe17f388d0122b14f090ad96b4914a6159baca17c8da04f1a7 |