# gguf_llama

Wrapper for simplified use of Llama 2 GGUF quantized models.

Provides a `LlamaAI` class with a Python interface for generating text using Llama models.
## Features

- Load Llama models and tokenizers automatically from a GGUF file
- Generate text completions for prompts
- Automatically grow the model's context to fit longer prompts, up to a configurable limit
- Convenient methods for tokenizing and untokenizing text
- Fix text formatting issues before generating
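The tokenize/untokenize helpers convert between text and lists of token ids. As a rough illustration of the round-trip contract they provide, here is a toy byte-level scheme; it stands in for the model's real GGUF vocabulary and is not the package's implementation:

```python
# Toy stand-in tokenizer illustrating the tokenize/untokenize round trip.
# The real LlamaAI methods use the model's GGUF vocabulary, not raw bytes.
def tokenize(text: str) -> list[int]:
    # Map each UTF-8 byte of the text to an integer "token" id.
    return list(text.encode("utf-8"))

def untokenize(tokens: list[int]) -> str:
    # Invert the mapping: reassemble the bytes and decode back to text.
    return bytes(tokens).decode("utf-8")

# Untokenizing the tokens of a string should return the original string.
assert untokenize(tokenize("Once upon a time")) == "Once upon a time"
```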
## Usage

```python
from llama_ai import LlamaAI

ai = LlamaAI("my_model.gguf", max_tokens=500, max_input_tokens=100)
```

Generate text by calling `infer()`:

```python
text = ai.infer("Once upon a time")
print(text)
```

Adjust model tokens to fit longer prompts:

```python
big_prompt = "..."  # prompt longer than max input tokens
text = ai.infer(big_prompt, max_tokens_if_needed=2000)
```
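The library handles this resizing internally; a minimal sketch of the policy such an auto-adjustment might follow (the function name and exact rule here are assumptions for illustration, not the package's API):

```python
def pick_context_size(prompt_tokens: int, max_tokens: int, cap: int) -> int:
    """Hypothetical resize rule: keep the configured size if the prompt
    fits, otherwise grow just enough to hold it, never exceeding cap."""
    if prompt_tokens <= max_tokens:
        return max_tokens
    return min(prompt_tokens, cap)

# A 1500-token prompt with max_tokens=500 and a 2000-token cap
# grows the context to 1500; a 3000-token prompt is clamped at 2000.
```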
## Installation

```shell
pip install gguf_llama
```
## Documentation

See the API documentation for full details on classes and methods.
## Contributing

Contributions are welcome! Open an issue or PR to improve gguf_llama.
## Download files
### Source Distribution

gguf_llama-0.0.12.tar.gz (4.5 kB)
### Built Distribution
Hashes for gguf_llama-0.0.12-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3fc1aef960ce89fd95fc12cbdcfdbc6ce078dd85858e66897fee84790cf2b064 |
| MD5 | 5564689ac33659c40188a7afc431888c |
| BLAKE2b-256 | 45543a1d96782c1e7f66e5efdc986aa8057145c876c1e231905c9b2ea53357bf |