
A simple and efficient Python library for fast inference of GGUF Large Language Models.

ALLM

ALLM is a Python library designed for fast inference of Large Language Models (LLMs) stored in the GGUF model file format, on both CPU and GPU. It provides a convenient interface for loading pre-trained GGUF models and running inference with them. This library is suited to applications where quick response times are crucial, such as chatbots and text generation.

Features

  • Efficient Inference: ALLM leverages the power of GGUF models to provide fast and accurate inference.
  • CPU and GPU Support: The library is optimized for both CPU and GPU, allowing you to choose the best hardware for your application.
  • Simple Interface: With a straightforward command-line interface, you can load a model and perform inference with a single command.
  • Flexible Configuration: Customize inference settings such as temperature and model path to suit your needs.

Installation

You can install ALLM using pip:

pip install allm

Usage

You can start inference with the 'allm-run' command. It takes a model name or path and, optionally, a temperature, a maximum number of new tokens, and additional model kwargs.

allm-run --name model_name_or_path
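For scripted use, the command above can be wrapped from Python. The following is a minimal sketch and not part of ALLM itself: build_run_command and run_model are hypothetical helper names, and only the --name flag shown above is used, because the flag names for the optional arguments are not given here.

```python
import shutil
import subprocess


def build_run_command(model_name_or_path):
    """Return the argument list for `allm-run --name <model>` (hypothetical helper)."""
    if not model_name_or_path:
        raise ValueError("a model name or path is required")
    return ["allm-run", "--name", str(model_name_or_path)]


def run_model(model_name_or_path):
    """Launch allm-run if it is on PATH; return its exit code, or None if missing."""
    command = build_run_command(model_name_or_path)
    if shutil.which(command[0]) is None:
        # allm-run was not found, for example because ALLM is not installed.
        return None
    return subprocess.run(command).returncode
```

Passing the arguments as a list avoids shell quoting problems when the model path contains spaces.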

API

You can start the inference API with the 'allm-serve' command. This command launches the API server on the default address, 127.0.0.1:5000. To run the API server on a different host or port, edit the apiconfig.txt file in your model directory.

allm-serve
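To confirm that the server has come up, a plain TCP connection check against the default address is often enough. The sketch below assumes only the host and port stated above; is_server_listening is a hypothetical helper name, and no ALLM endpoint paths are assumed because none are documented here.

```python
import socket


def is_server_listening(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If you changed the host or port in apiconfig.txt, pass the same values to the helper.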

ALLM AGENTS

Local Agent Inference

To create a local agent, begin by loading your knowledge documents into the database using the allm-newagent command and specifying the agent name:

allm-newagent --doc "document_path" --agent agent_name

or

allm-newagent --dir "directory containing files to be ingested" --agent agent_name
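When agents are created from a script, the choice between --doc and --dir can be made from the path itself. This is a minimal sketch under that assumption: build_newagent_command is a hypothetical helper, and it uses only the allm-newagent flags shown above.

```python
from pathlib import Path


def build_newagent_command(source, agent_name):
    """Build an allm-newagent argument list, using --dir for directories
    and --doc for single files (hypothetical helper)."""
    path = Path(source)
    if path.is_dir():
        flag = "--dir"
    elif path.is_file():
        flag = "--doc"
    else:
        raise FileNotFoundError(f"no such file or directory: {source}")
    return ["allm-newagent", flag, str(path), "--agent", agent_name]
```

The returned list can be passed directly to subprocess.run once ALLM is installed.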

After the agent has been created from your knowledge documents, you can start a local agent chat with the allm-agentchat command:

allm-agentchat --agent agent_name

Once your agents are created, you can also start an agent-specific API server using the allm-agentapi command:

allm-agentapi --agent agent_name

You can add more documents to an existing agent using the allm-updateagent command:

allm-updateagent --doc "document_path" --agent agent_name

Supported Cloud Models

ALLM supports generative LLMs on Vertex AI, including Gemini-1.5 Pro, as well as Azure OpenAI models. You can start local inference of cloud-based models using the following command:

allm-run-vertex --projectid Id_of_your_GCP_project --region location_of_your_cloud_server

or

allm-run-azure --key key --version version --endpoint https://{your_endpoint}.openai.azure.com --model model_name

ALLM also supports config-based inference for Vertex AI and Azure OpenAI models. You can create the JSON config file manually, or let ALLM create one for you, and then start local inference of the cloud-based model with the shortened command:

allm-run-vertex

Note that for the above command to work, the config file must have all the necessary parameters set. You can achieve this by running the full command, including the CLI arguments, once and then using the shortened command.

The same procedure applies to Azure:

allm-run-azure
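The full-then-shortened pattern can be captured in a small wrapper that emits the full command when parameters are supplied and the shortened one otherwise. This is a sketch only: build_vertex_command is a hypothetical helper, and requiring both values together is an assumption made here for safety, not documented ALLM behavior.

```python
def build_vertex_command(project_id=None, region=None):
    """Return the full allm-run-vertex command when both values are given,
    or the shortened, config-based command when neither is given."""
    if project_id is None and region is None:
        # Relies on a config file written by an earlier full invocation.
        return ["allm-run-vertex"]
    if project_id is None or region is None:
        # Assumption: a partial set of arguments is treated as a mistake.
        raise ValueError("pass both project_id and region, or neither")
    return ["allm-run-vertex", "--projectid", project_id, "--region", region]
```

Run the full form once so the config file is populated, then call the helper without arguments on later runs.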

You can also use a custom agent with your cloud-deployed model using the following command. The agent must first be created with the commands in the ALLM AGENTS section above.

allm-agentchat-vertex --projectid Id_of_your_GCP_project --region location_of_your_cloud_server --agent agent_name

or

allm-agentchat-azure --key key --version version --endpoint https://{your_endpoint}.openai.azure.com --model model_name --agent agent_name

model_name is an optional parameter for both Vertex AI and Azure. If it is omitted, inference defaults to gemini-1.0-pro-002 for Vertex AI and gpt-35-turbo for Azure OpenAI.
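The fallback described above amounts to a lookup with a per-provider default. The sketch below mirrors it for use in wrapper scripts; resolve_model and the provider keys "vertex" and "azure" are hypothetical names, and only the two default model names come from this document.

```python
# Default models as stated in this document; the dictionary keys are illustrative.
DEFAULT_MODELS = {
    "vertex": "gemini-1.0-pro-002",
    "azure": "gpt-35-turbo",
}


def resolve_model(provider, model_name=None):
    """Return model_name if given, otherwise the documented default for provider."""
    if model_name:
        return model_name
    try:
        return DEFAULT_MODELS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
```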

If you already have an API config file, you can use the shortened commands:

allm-agentchat-vertex --agent agent_name

and

allm-agentchat-azure --agent agent_name

ALLM also supports serving cloud-model-based agents through an API:

allm-agentapi-vertex --projectid Id_of_your_GCP_project --region location_of_your_cloud_server --agent agent_name

or

allm-agentapi-vertex --agent agent_name

For Azure:

allm-run-azure --key key --version version --endpoint https://{your_endpoint}.openai.azure.com --model model_name --agent agent_name

or

allm-agentapi-vertex --agent agent_name

ALLM-Enterprise

You can launch the UI with the following command:

allm-launch

Supported Model names

Llama3, Llama2, llama, llama2_chat, Llama_chat, Mistral, Mistral_instruct
