Llama serve

Warning: This package is now deprecated. Use londonaicentre-mesa-local instead of londonaicentre-llama-serve in pip commands.

Serve llama models locally.

  • ⬇️ Downloads weights from S3

  • 📦 Unpacks the downloaded weights

  • 🚀 Serves the model via a local OpenAI-compatible server

Prerequisites

Software

  • Python 3.12

Hardware

  • A GPU with at least 24 GB of VRAM (tested on an NVIDIA A30)
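To check these prerequisites before installing, here is a minimal sketch that is not part of this package. It compares the interpreter version with 3.12 and reads total GPU memory from `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, so it assumes an NVIDIA GPU with the driver utilities installed. The helper names and the `MIN_VRAM_MIB` threshold are hypothetical; cards sold as 24 GB can report slightly less than 24 × 1024 MiB, so the sketch leaves some headroom.

```python
import shutil
import subprocess
import sys

# Hypothetical threshold: a card sold as "24 GB" may report slightly less than
# 24 * 1024 MiB in nvidia-smi, so leave some headroom below 24576 MiB.
MIN_VRAM_MIB = 23_000


def python_ok(version_info=sys.version_info) -> bool:
    """True for Python 3.12 or newer (assumption: newer versions are acceptable)."""
    return tuple(version_info[:2]) >= (3, 12)


def parse_vram_mib(nvidia_smi_output: str) -> list[int]:
    """Return memory.total in MiB per GPU line; non-numeric lines such as [N/A] are skipped."""
    return [
        int(line.strip())
        for line in nvidia_smi_output.splitlines()
        if line.strip().isdigit()
    ]


def gpu_ok(min_mib: int = MIN_VRAM_MIB) -> bool:
    """True if nvidia-smi runs and at least one GPU reports min_mib or more."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver utilities on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return False
    return any(mib >= min_mib for mib in parse_vram_mib(result.stdout))
```

On a suitable machine, `python_ok() and gpu_ok()` should return `True`; `gpu_ok()` returns `False` rather than raising when `nvidia-smi` is absent.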

Configuration

  • Create a file called .env in the directory from which you intend to run this package, and populate it with the values you have been provided, in the following format:

    MODEL_NAME=
    WEIGHTS_ID=
    WEIGHTS_KEY=
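The README does not say how the package reads `.env`. As an illustration only, here is a standard-library sketch that parses a file in the format above and lists any of the three keys that are missing or empty, which can catch an incomplete file before the server is started. `load_env`, `missing_keys` and `REQUIRED_KEYS` are hypothetical names; dedicated loaders such as python-dotenv handle more syntax (escapes, `export` prefixes, multi-line values) than this sketch does.

```python
from __future__ import annotations

from pathlib import Path

# The three keys listed in the configuration format above.
REQUIRED_KEYS = ("MODEL_NAME", "WEIGHTS_ID", "WEIGHTS_KEY")


def load_env(path: str | Path = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. KEY="value".
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or left empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(load_env(".env"))` returns an empty list once all three values are filled in.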

Installation

  1. (Recommended) Create a virtual environment and activate it:

    python -m venv .venv
    source .venv/bin/activate
    
  2. Install this package:

    pip install londonaicentre-llama-serve

Usage

CLI

  1. The following command-line arguments are available:

    Argument        Description
    -v, --verbose   Enable debug output (optional)

  2. Start the server:

    llamaserve [args]
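Once `llamaserve` is running, a client script may need to wait until the server accepts requests, since the weights are downloaded and unpacked first (assumption: the server only starts listening after those steps). This sketch polls `GET /v1/models`, the standard OpenAI-compatible model-listing route, using only the standard library. Whether this server implements that route is an assumption, the default address is taken from the OpenAI client example in this README, and `wait_for_server` and its 600-second default timeout are hypothetical.

```python
import time
import urllib.request


def wait_for_server(base_url: str = "http://localhost:5000/v1",
                    timeout: float = 600.0,
                    interval: float = 2.0) -> bool:
    """Poll GET {base_url}/models until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers connection refused, socket timeouts and HTTP errors
            # (urllib's URLError and HTTPError both subclass OSError).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, start `llamaserve -v` in one terminal and call `wait_for_server()` from a script before sending requests.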

Clients

OpenAI (example)

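  1. Interact with the server using the OpenAI Python client (pip install openai):

    from openai import OpenAI
    
    # Point the client at the local OpenAI-compatible server.
    client = OpenAI(
        base_url="http://localhost:5000/v1",
        api_key="blank",  # placeholder: the OpenAI client requires an api_key to be set
    )
    
    response = client.chat.completions.create(
        model="<MODEL_NAME>",  # replace with the MODEL_NAME value from your .env
        messages=[
            {"role": "system", "content": "You are an LLM named gpt-4o"},
            {"role": "user", "content": "Hello"},
        ],
    )
    
    # Print the assistant's reply.
    print(response.choices[0].message.content)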
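Because the server is described as OpenAI-compatible, the same request can in principle be sent without the `openai` package. The sketch below posts a chat payload to `/v1/chat/completions` with the standard library; it assumes the server follows the OpenAI chat-completions request and response shape (`choices[0].message.content`), and `chat` is a hypothetical helper name. The bearer token mirrors the placeholder key in the example above.

```python
import json
import urllib.request


def chat(prompt: str,
         model: str,
         base_url: str = "http://localhost:5000/v1",
         api_key: str = "blank",
         timeout: float = 120.0) -> str:
    """Send one chat-completions request and return the first reply's text."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key, as in the OpenAI client example above.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, `print(chat("Hello", model="<MODEL_NAME>"))`, with `<MODEL_NAME>` replaced by the value from your .env.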
    

License

This project uses a proprietary license (see LICENSE).

Download files

Source Distribution

londonaicentre_llama_serve-1.2.0.tar.gz (20.0 kB)

Uploaded Source

Built Distribution

londonaicentre_llama_serve-1.2.0-py3-none-any.whl (19.1 kB)

Uploaded Python 3

File details

Details for the file londonaicentre_llama_serve-1.2.0.tar.gz.

File metadata

  • Download URL: londonaicentre_llama_serve-1.2.0.tar.gz
  • Upload date:
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Amazon Linux","version":"2023","id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for londonaicentre_llama_serve-1.2.0.tar.gz
Algorithm     Hash digest
SHA256        4511e2b646f1ddccf55bdd66005ad55b68dcd6ffbbafdf8878eddfa571c08665
MD5           383b14a94917aeae4f9a20d2fd0ec9f7
BLAKE2b-256   9029b732b6c820b8d873dabe5d9d54fa96e17601c4cb0d560f6e45b9468674d8

File details

Details for the file londonaicentre_llama_serve-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: londonaicentre_llama_serve-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 19.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Amazon Linux","version":"2023","id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for londonaicentre_llama_serve-1.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        546187f60b35f016e9e6232c19451aafac74f59ae1b34be0dd5b6807a6d6e494
MD5           5e3509d01539028e1f4a372069ef4a4d
BLAKE2b-256   663a85553467575e2ae37e55201c01df951cbcea8e4307fc6366a91f438c7078
