
LLMOps tool designed to simplify the deployment and management of large language model (LLM) applications


Welcome to Paka


Get your LLM applications to the cloud with ease. Paka handles failure recovery, autoscaling, and monitoring, freeing you to concentrate on crafting your applications.

🚀 Bring LLM models to the cloud in minutes

💰 Cut costs by 50% with spot instances, backed by on-demand instances for reliable service quality.

| Model | Parameters | Quantization | GPU | On-Demand ($/hr) | Spot ($/hr) | AWS Node (us-west-2) |
| --- | --- | --- | --- | --- | --- | --- |
| Llama 3 | 70B | BF16 | A10G x 8 | $16.2880 | $4.8169 | g5.48xlarge |
| Llama 3 | 70B | GPTQ 4bit | T4 x 4 | $3.9120 | $1.6790 | g4dn.12xlarge |
| Llama 3 | 8B | BF16 | L4 x 1 | $0.8048 | $0.1100 | g6.xlarge |
| Llama 2 | 7B | GPTQ 4bit | T4 x 1 | $0.526 | $0.2584 | g4dn.xlarge |
| Mistral | 7B | BF16 | T4 x 1 | $0.526 | $0.2584 | g4dn.xlarge |
| Phi3 Mini | 3.8B | BF16 | T4 x 1 | $0.526 | $0.2584 | g4dn.xlarge |

Note: Prices are for the us-west-2 region and are in USD per hour. Spot prices change frequently. See Launch Templates for more details.

🏃 Effortlessly Launch RAG Applications

You only need to take care of the application code. Build your RAG application with your favorite language (Python, TypeScript) and framework (LangChain, LlamaIndex), and let Paka handle the rest.

Support for Vector Store

  • A fast vector store (qdrant) for storing embeddings.
  • Tunable for performance and cost.

Serverless Deployment

  • Deploy your application as a serverless container.
  • Autoscaling and monitoring built-in.

📈 Monitoring

Paka comes with built-in support for monitoring and tracing. Metrics are collected via Prometheus. Users can also enable Prometheus Alertmanager for alerting.
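
For reference, metrics collection is toggled in the same cluster.yaml shown below. This is a minimal sketch; the alertmanager field is only an assumption to show where alerting would be configured, so check the Paka docs for the exact schema:

aws:
  prometheus:
    enabled: true          # collect cluster and model-group metrics via Prometheus
    # alertmanager: true   # hypothetical field -- consult the docs for the real key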

⚙️ Architecture

📜 Roadmap

  • (Multi-cloud) AWS support
  • (Backend) vLLM
  • (Backend) llama.cpp
  • (Platform) Windows support
  • (Accelerator) Nvidia GPU support
  • (Multi-cloud) GCP support
  • (Backend) TGI
  • (Accelerator) AMD GPU support
  • (Accelerator) Inferentia support

🎬 Getting Started

Dependencies

  • Docker daemon and CLI
  • AWS CLI
# Ensure your AWS credentials are correctly configured.
aws configure

Install Paka

pip install paka

Provisioning the cluster

Create a cluster.yaml file with the following content:

version: "1.2"
aws:
  cluster:
    name: my-awesome-cluster
    region: us-west-2
    namespace: default
    nodeType: t3a.medium
    minNodes: 2
    maxNodes: 4
  prometheus:
    enabled: true
  modelGroups:
    - name: llama2-7b-chat
      nodeType: g4dn.xlarge
      isPublic: true
      minInstances: 1
      maxInstances: 1
      runtime:
        image: vllm/vllm-openai:v0.4.2
      model:
        hfRepoId: TheBloke/Llama-2-7B-Chat-GPTQ
        useModelStore: false
      gpu:
        enabled: true
        diskSize: 50

Bring up the cluster with the following command:

paka cluster up -f cluster.yaml

Code up the application

Use your favorite language and framework to build the application. Here is an example of a Python application built with LangChain:

invoice_extraction
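
Below is a minimal sketch of what such an application could look like. It assumes the llama2-7b-chat model group from cluster.yaml is reachable inside the cluster at an OpenAI-compatible endpoint; the URL, route, and served model id are assumptions (vLLM typically serves the model under its Hugging Face repo id), so adapt them to your setup:

# Invoice-extraction sketch -- endpoint details below are assumptions.
from fastapi import FastAPI
from openai import OpenAI

app = FastAPI()

# vLLM's OpenAI-compatible server accepts any placeholder API key.
client = OpenAI(base_url="http://llama2-7b-chat/v1", api_key="not-needed")

@app.post("/extract")
def extract_invoice(invoice_text: str) -> dict:
    """Ask the hosted model to pull structured fields out of invoice text."""
    resp = client.chat.completions.create(
        model="TheBloke/Llama-2-7B-Chat-GPTQ",  # must match the served model id
        messages=[
            {"role": "system",
             "content": "Extract the vendor, date, and total from this invoice."},
            {"role": "user", "content": invoice_text},
        ],
    )
    return {"fields": resp.choices[0].message.content}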

With Paka, you can effortlessly build your source code and deploy it as a serverless function, no Dockerfile needed. Just ensure the following:

  • Procfile: Defines the entrypoint for your application. See Procfile.
  • .cnignore file: Excludes any files that shouldn't be included in the build. See .cnignore.
  • runtime.txt: Pins the version of the runtime your application uses. See runtime.txt.
  • requirements.txt or package.json: Lists all necessary packages for your application.
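
For illustration, a minimal sketch of these files might look like the following. The serve entrypoint name matches the deploy command below; the commands, version pin, and package list are assumptions to adapt to your own application:

Procfile:
serve: python -m uvicorn main:app --host 0.0.0.0 --port 8080

runtime.txt (exact version string may differ; check the Paka docs):
python-3.11.*

requirements.txt:
fastapi
uvicorn
langchain
openai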

Deploy the App

paka function deploy --name invoice-extraction --source . --entrypoint serve

📖 Documentation

Contributing

  • Make your code changes.
  • Run make check-all.
  • Open a PR.

