A simple and efficient Python library for fast inference of GGUF large language models.
ALLM
ALLM is a Python library for fast inference of large language models (LLMs) stored in GGUF, the model file format used by llama.cpp, on both CPU and GPU. It provides a convenient interface for loading pre-trained GGUF models and running inference with them. It suits applications where quick response times matter, such as chatbots and text generation.
Features
- Efficient Inference: ALLM loads models in the GGUF format for fast inference.
- CPU and GPU Support: The library is optimized for both CPU and GPU, allowing you to choose the best hardware for your application.
- Simple Interface: Through the command-line interface, you can load a model and run inference with a single command.
- Flexible Configuration: Customize inference settings such as temperature and model path to suit your needs.
Installation
You can install ALLM using pip:
pip install allm
Usage
You can start inference with the allm-run command. It takes the model name or path as its required argument, plus optional arguments for temperature, max new tokens, and additional model kwargs.
allm-run --name model_name_or_path
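If you want to start the same command from a Python script, a minimal sketch might look like the following. It uses only the `--name` flag shown above; the helper names `build_allm_run_command` and `run_allm` are hypothetical and are not part of ALLM, and the flags for the optional arguments are left out because their exact spelling is not given here.

```python
import shutil
import subprocess


def build_allm_run_command(model_name_or_path: str) -> list[str]:
    """Return the argv list for `allm-run --name <model>`.

    Hypothetical helper: only the --name flag is used, because it is
    the one flag this README shows.
    """
    if not model_name_or_path:
        raise ValueError("model_name_or_path must be a non-empty string")
    return ["allm-run", "--name", model_name_or_path]


def run_allm(model_name_or_path: str) -> int:
    """Run allm-run in the foreground and return its exit code."""
    if shutil.which("allm-run") is None:
        raise FileNotFoundError("allm-run not found on PATH; try `pip install allm`")
    # Output is not captured, so the command uses this process's terminal.
    completed = subprocess.run(build_allm_run_command(model_name_or_path))
    return completed.returncode
```

Calling `run_allm("Mistral")` would then behave like typing `allm-run --name Mistral` in a shell. Passing the arguments as a list avoids shell quoting problems when the model path contains spaces.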
API
You can start the inference API with the allm-serve command. It launches the API server on the default host and port, 127.0.0.1:5000. To run the server on a different host or port, edit the apiconfig.txt file in your model directory.
allm-serve
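Because the server may take a moment to load a model, a script that talks to it may want to wait until the port accepts connections. The sketch below checks only for an open TCP port at the default address stated above; it makes no assumption about the API's routes or payloads, which are not documented here. The function names are hypothetical and not part of ALLM.

```python
import socket
import time

DEFAULT_HOST = "127.0.0.1"  # default host stated in this README
DEFAULT_PORT = 5000         # default port stated in this README


def is_listening(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def wait_until_listening(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT,
                         attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll host:port up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_listening(host, port):
            return True
        time.sleep(delay)
    return False
```

An open port only shows that some process is listening at that address, not that it is the ALLM server, so treat a `True` result as a readiness hint. If you changed the host or port in apiconfig.txt, pass the new values explicitly.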
==========================================================================================================================================
ALLM RAG
Local RAG Inference
To initiate local RAG inference, begin by ingesting your documents into the vector database using the allm-createagent command:
allm-createagent --doc "document_path"
After successfully ingesting the document, you can start the local RAG inference with the allm-agentchat command:
allm-agentchat --name 'model_name_or_path'
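The two local steps above can be chained from Python so that the chat only starts if ingestion succeeds. This is a sketch under stated assumptions: the helper names are hypothetical, only the `--doc` and `--name` flags shown above are used, and it assumes allm-createagent exits with a non-zero status when ingestion fails, which this README does not confirm.

```python
import subprocess
from pathlib import Path


def build_rag_commands(document_path: str, model_name_or_path: str) -> list[list[str]]:
    """Return the ingest and chat commands, in the order they should run."""
    return [
        ["allm-createagent", "--doc", document_path],
        ["allm-agentchat", "--name", model_name_or_path],
    ]


def run_local_rag(document_path: str, model_name_or_path: str) -> None:
    """Ingest a document, then start the local RAG chat."""
    if not Path(document_path).exists():
        raise FileNotFoundError(f"document not found: {document_path}")
    ingest, chat = build_rag_commands(document_path, model_name_or_path)
    # check=True raises CalledProcessError on a non-zero exit status.
    # Assumption: allm-createagent signals a failed ingestion that way.
    subprocess.run(ingest, check=True)
    subprocess.run(chat, check=True)
```

For example, `run_local_rag("notes.pdf", "Mistral")` would run the ingestion command first and the chat command second. Checking that the document exists before calling the CLI gives a clearer error than a failure partway through ingestion.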
Alternatively, you can run RAG inference on the API server using the allm-agentapi command:
allm-agentapi
Supported Model names
Llama2, llama, llama2_chat, Llama_chat, Mistral, Mistral_instruct
File details
Details for the file ALLMDEV-1.2.4-py3-none-any.whl.
File metadata
- Download URL: ALLMDEV-1.2.4-py3-none-any.whl
- Upload date:
- Size: 9.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 36ba736eb75cd2d7d3439ef530fcfdfe38a2255aae8937e3107089be07b56384
MD5 | 353b0e6eb74cb0516866a2bcbcc0902a
BLAKE2b-256 | edc2a798fbf4434cd80fa976070aa87bb03048471cce11583155b4082e18c3f4