A simple and efficient Python library for fast inference of GGUF Large Language Models.

ALLM

ALLM is a Python library for fast inference with Large Language Models (LLMs) stored in GGUF, the model file format used by llama.cpp and other GGML-based runtimes, on both CPU and GPU. It provides a convenient interface for loading pre-trained GGUF models and running inference with them. The library is aimed at applications where quick response times matter, such as chatbots and text generation.

Features

  • Efficient Inference: ALLM is built for fast, accurate inference with LLMs.
  • CPU and GPU Support: The library is optimized for both CPU and GPU, allowing you to choose the best hardware for your application.
  • Simple Interface: With straightforward command-line support, you can load a model and run inference with a single command.
  • Flexible Configuration: Customize inference settings such as temperature and model path to suit your needs.
  • Automated Hosting Configuration: Models are swiftly downloaded and configured in your environment, enabling them to be operational within minutes.

Operating System Compatibility

This table outlines the compatibility of different operating systems with their respective providers:

OS Type | Windows | Linux | MacOS
Status | Supported | Supported | Coming Soon
Dependencies | Python, VS Tools | Python; sudo apt-get install build-essential -y | Coming Soon
Models Support | Local Models, VertexAI, Azure OpenAI | Local Models, VertexAI, Azure OpenAI | Coming Soon

Supported Models

LLM Family | Hosting | Supported LLMs
ALLM | Self Hosted (gguf) | Mistral, Mistral_instruct, Llama2; the full list is available in the Supported Model names section.
Azure | Azure OpenAI | gpt-35-turbo, gpt-4, gpt-4-turbo or any.
Google LLMs | VertexAI deployment | gemini-pro, text-bison@001 or any.
Llama2 | Azure deployment | llama2-7b, llama2-13b, llama2-70b
Mistral | Azure deployment | Mistral-7b, Mixtral-8x7b

Installation

You can install ALLM using pip:

pip install allm

Usage

0.1 LocalModel Generic Prompt

You can start inference with the 'allm-run' command. The command takes a model name or path, plus optional temperature, max new tokens, and additional model kwargs as arguments.

If no model name is provided when you run allm-run, the default Mistral model is downloaded to your system and configured automatically.

allm-run --name model_name_or_path
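
As a minimal sketch, the command above can also be assembled and launched from Python. Only the `allm-run` command and its `--name` flag come from this README; the helper names and the choice of `Mistral_instruct` (taken from the Supported Model names list and assumed here to be a valid value for `--name`) are illustrative assumptions.

```python
import shlex
import shutil


def build_run_command(name=None):
    """Return the argv list for allm-run, adding --name only when given."""
    cmd = ["allm-run"]
    if name:
        cmd += ["--name", name]
    return cmd


def is_installed():
    """Report whether the allm-run executable is on PATH."""
    return shutil.which("allm-run") is not None


# Assumed model name, used purely for illustration.
cmd = build_run_command("Mistral_instruct")
print(shlex.join(cmd))

# To actually launch it (interactive, so left commented out here):
# import subprocess
# if is_installed():
#     subprocess.run(cmd, check=True)
```

Calling `build_run_command()` with no argument yields the bare `allm-run` command, which falls back to the default Mistral model as described above.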

API

0.2 LocalModel Generic API

You can start the inference API with the 'allm-serve' command. This command launches the API server at the default address, 127.0.0.1:5000. To run the API server on a different host or port, edit the apiconfig.txt file in your model directory.

allm-serve
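
Before sending requests, it can help to confirm that something is listening on the default address. The sketch below only tests TCP reachability of 127.0.0.1:5000 using the standard library; it makes no assumption about the API's routes or payloads, which this README does not document. The function name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if is_port_open():
    print("Something is listening on 127.0.0.1:5000")
else:
    print("Nothing is listening on 127.0.0.1:5000; is allm-serve running?")
```

If you changed the host or port in apiconfig.txt, pass the same values to `is_port_open`.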

ALLM Agents

1.1 New Agent Creation

To create a local agent, begin by loading your knowledge documents into the database using the allm-newagent command and specifying the agent name:

allm-newagent --doc "document_path" --agent agent_name

or

allm-newagent --dir "directory containing files to be ingested" --agent agent_name
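
The choice between `--doc` and `--dir` can be automated. The following is a sketch under the assumption that a single file maps to `--doc` and a directory maps to `--dir`, as the two commands above suggest; the helper name and the agent name `demo_agent` are hypothetical.

```python
import shlex
from pathlib import Path


def build_newagent_command(source, agent):
    """Pick --doc for a file and --dir for a directory, then add --agent."""
    path = Path(source)
    if path.is_dir():
        flag = "--dir"
    elif path.is_file():
        flag = "--doc"
    else:
        raise FileNotFoundError(f"No such file or directory: {source}")
    return ["allm-newagent", flag, str(path), "--agent", agent]


# "." always exists, so this demonstrates the directory branch.
print(shlex.join(build_newagent_command(".", "demo_agent")))
```

The returned list can be passed to `subprocess.run` once the `allm-newagent` command is installed.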

1.2 Agent Chat

After the agent has been created successfully from your knowledge documents, you can start the local agent chat with the allm-agentchat command:

allm-agentchat --agent agent_name

1.3 Agent API

After your agents are created, you can also start an agent-specific API server using the allm-agentapi command:

allm-agentapi --agent agent_name

After your agents are created, you can also update the knowledge of an existing agent by adding documents with the allm-updateagent command:

allm-updateagent --agent agent_name

Supported Cloud Models

ALLM supports generative LLMs hosted on Azure OpenAI and VertexAI, including GPT and Gemini Pro models. You can run inference against cloud-hosted models from your local machine using the following commands:

2.1 VertexAI Generic Prompt

allm-run-vertex --projectid Id_of_your_GCP_project --region location_of_your_cloud_server

or

allm-run-vertex

2.2 AzureOpenAI Generic Prompt

allm-run-azure --key key --version version --endpoint https://{your_endpoint}.openai.azure.com --model model_name

or

allm-run-azure
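
To keep the API key out of scripts, the Azure command from 2.2 can be assembled from environment variables. This is a sketch only: the flags `--key`, `--version`, `--endpoint`, and `--model` come from this README, while the environment variable names, the helper names, and the placeholder values are assumptions made for illustration.

```python
import os
import shlex


def build_azure_command(env=os.environ):
    """Build the allm-run-azure argv from environment variables.

    The variable names are hypothetical; only the CLI flags are taken
    from the README.
    """
    cmd = [
        "allm-run-azure",
        "--key", env["AZURE_OPENAI_KEY"],
        "--version", env["AZURE_OPENAI_VERSION"],
        "--endpoint", env["AZURE_OPENAI_ENDPOINT"],
    ]
    model = env.get("AZURE_OPENAI_MODEL")  # optional; omitted when unset
    if model:
        cmd += ["--model", model]
    return cmd


def redact(cmd):
    """Return a copy of cmd with the value after --key masked for logging."""
    out = list(cmd)
    if "--key" in out:
        out[out.index("--key") + 1] = "***"
    return out


# Placeholder values; substitute your own deployment details.
example_env = {
    "AZURE_OPENAI_KEY": "placeholder-key",
    "AZURE_OPENAI_VERSION": "placeholder-version",
    "AZURE_OPENAI_ENDPOINT": "https://example.openai.azure.com",
}
print(shlex.join(redact(build_azure_command(example_env))))
```

Printing the redacted form lets you log the invocation without exposing the key; pass the unredacted list to `subprocess.run` to launch it.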

You can also run a custom agent against your cloud-deployed model using the following commands. The agent must first be created as described in section 1.1 New Agent Creation.

2.3 VertexAI AgentChat

allm-agentchat-vertex --projectid Id_of_your_GCP_project --region location_of_your_cloud_server --agent agent_name

To use the shorter command below, first configure model\vertex-config.json so that the project ID and region are read from it.

allm-agentchat-vertex --agent agent_name

2.4 AzureOpenAI AgentChat

allm-agentchat-azure --key key --version version --endpoint https://{your_endpoint}.openai.azure.com --model model_name --agent agentname

To use the shorter command below, first configure model\azure-config.json so that the endpoint, model name, and other settings are read from it.

allm-agentchat-azure --agent agentname

model_name is an optional parameter for both VertexAI and Azure. If it is omitted, inference defaults to gemini-1.0-pro-002 for VertexAI and gpt-35-turbo for Azure OpenAI.

2.5 AzureOpenAI AgentChat API

allm-agentapi-azure --agent agentname

2.6 VertexAI AgentChat API

allm-agentapi-vertex --agent agent_name

Supported Model Names

  • Llama3
  • Llama2
  • Llama
  • Llama2_chat
  • Llama_chat
  • Mistral
  • Mistral_instruct
