
Locallm


An API to query local language models using different backends. Supported backends:

  • Local: a GGUF model file loaded from a local directory
  • Koboldcpp: a running Koboldcpp server
  • Ollama: a running Ollama server

Quickstart

pip install locallm

Local

from locallm import LocalLm, InferenceParams, LmParams

lm = LocalLm(
    LmParams(
        models_dir="/home/me/my/models/dir"
    )
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template=template,
        temperature=0.2,
        stream=True,
        max_tokens=512,
    ),
)

Koboldcpp

from locallm import KoboldcppLm, LmParams, InferenceParams

lm = KoboldcppLm(
    LmParams(is_verbose=True)
)
lm.load_model("", 8192) # sets the context window size to 8196 tokens
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template=template,
        stream=True,
        max_tokens=512,
    ),
)

Ollama

from locallm import OllamaLm, LmParams, InferenceParams

lm = OllamaLm(
    LmParams(is_verbose=True)
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        stream=True,
        template=template,
        temperature=0.5,
    ),
)

Examples

Providers:

  • Examples for each supported backend: Local, Koboldcpp, Ollama

Other:

  • Cli: a Python terminal client
  • Autodoc: generate docstrings from code

API

LmProvider

An abstract base class describing a language model provider. All the providers implement this API.

Attributes

  • llm Optional[Llama]: the language model.
  • models_dir str: the directory where the models are stored.
  • api_key str: the API key for the language model.
  • server_url str: the URL of the language model server.
  • is_verbose bool: whether to print more information.
  • threads Optional[int]: the number of threads to use.
  • gpu_layers Optional[int]: the number of layers to offload to the GPU.
  • embedding Optional[bool]: whether to use embeddings.
  • on_token OnTokenType: the function to be called when a token is generated. Default: outputs the token to the terminal.
  • on_start_emit OnStartEmitType: the function to be called when the model starts emitting tokens.

Example

lm = OllamaLm(LmParams(is_verbose=True))
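
A custom on_token callback can replace the default terminal output. A minimal sketch, using only the documented Callable[[str], None] signature (the token-collecting logic is illustrative, not part of the library):

from locallm import OllamaLm, LmParams

# collect streamed tokens in a list instead of printing them to the terminal
tokens = []

def collect_token(token: str) -> None:
    tokens.append(token)

lm = OllamaLm(LmParams(on_token=collect_token))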

Methods:

__init__

Constructs all the necessary attributes for the LmProvider object.

Parameters

  • params LmParams: the parameters for the language model.

Example

lm = KoboldcppLm(LmParams())

load_model

Loads a language model.

Parameters

  • model_name str: The name of the model to load.
  • ctx int: The context window size for the model.
  • gpu_layers Optional[int]: The number of layers to offload to the GPU for the model.

Example

lm.load_model("my_model.gguf", 2048, 32)

infer

Run an inference query.

Parameters

  • prompt str: the prompt to generate text from.
  • params InferenceParams: the parameters for the inference query.

Returns

  • result InferenceResult: the generated text and stats.

Example

>>> lm.infer("<s>[INST] List the planets in the solar system [/INST>")
The planets in the solar system are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.
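
The result can also be captured rather than only streamed to the terminal; a minimal sketch using only the parameters documented above (lm is a provider created as in the Quickstart):

# lm is any LmProvider instance with a model already loaded
result = lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template="<s>[INST] {prompt} [/INST]",
        stream=False,
        max_tokens=512,
    ),
)
# result is an InferenceResult holding the generated text and stats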

Types

InferenceParams

Parameters for inference.

Args

  • stream bool, Optional: Whether to stream the output.
  • template str, Optional: The template to use for the inference.
  • threads int, Optional: The number of threads to use for the inference.
  • max_tokens int, Optional: The maximum number of tokens to generate.
  • temperature float, Optional: The temperature for the model.
  • top_p float, Optional: The cumulative probability cutoff for nucleus sampling.
  • top_k int, Optional: The number of highest probability tokens to sample from.
  • min_p float, Optional: The minimum probability for a token to be considered.
  • stop List[str], Optional: A list of words to stop the model from generating.
  • frequency_penalty float, Optional: The frequency penalty for the model.
  • presence_penalty float, Optional: The presence penalty for the model.
  • repeat_penalty float, Optional: The repeat penalty for the model.
  • tfs float, Optional: The tail free sampling parameter.
  • grammar str, Optional: A GBNF grammar to constrain the model's output.

Example

InferenceParams(stream=True, template="<s>[INST] {prompt} [/INST]")
{
    "stream": True,
    "template": "<s>[INST] {prompt} [/INST]"
}
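
A fuller example combining several of the parameters above (the values are illustrative, not recommended defaults):

InferenceParams(
    template="<s>[INST] {prompt} [/INST]",
    temperature=0.2,
    top_k=40,
    top_p=0.9,
    max_tokens=256,
    stop=["</s>"],
)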

LmParams

Parameters for language model.

Args

  • models_dir str, Optional: The directory containing the language models.
  • api_key str, Optional: The API key for the language model.
  • server_url str, Optional: The server URL for the language model.
  • is_verbose bool, Optional: Whether to enable verbose output.
  • on_token Callable[[str], None], Optional: A callback function to be called on each token generated. If not provided, the default outputs tokens to the command line as they arrive.
  • on_start_emit Callable[[Optional[Any]], None], Optional: A callback function to be called on the start of the emission.

Example

LmParams(
    models_dir="/home/me/models",
    api_key="abc123",
)
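
For the server backends, server_url points the provider at a running instance; a minimal sketch (the URL is illustrative and depends on how the backend server is started):

LmParams(
    server_url="http://localhost:5001",  # illustrative: wherever the backend server is listening
    is_verbose=True,
)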

Tests

To configure the tests, create a tests/localconf.py file containing some local config info to run the tests:

# absolute path to your models dir
MODELS_DIR = "/home/me/my/models/dir"
# the model to use in the tests
MODEL = "q5_1-gguf-mamba-gpt-3B_v4.gguf"
# the context window size for the tests
CTX = 2048

Be sure to have the corresponding backend up before running a test.
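
With the backend up and tests/localconf.py in place, the suite can then be run from the repository root; this assumes a pytest-based setup, which is not specified above:

python -m pytest tests/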
