Locallm
An API to query local language models using different backends. Supported backends:
- Llama.cpp Python: the local Python bindings for Llama.cpp
- Kobold.cpp: the Koboldcpp API server
- Ollama: the Ollama API server
Quickstart
pip install locallm
Local
from locallm import LocalLm, InferenceParams, LmParams
lm = LocalLm(
LmParams(
models_dir="/home/me/my/models/dir"
)
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
"list the planets in the solar system",
InferenceParams(
template=template,
temperature=0.2,
stream=True,
max_tokens=512,
),
)
Koboldcpp
from locallm import KoboldcppLm, LmParams, InferenceParams
lm = KoboldcppLm(
LmParams(is_verbose=True)
)
lm.load_model("", 8192) # sets the context window size to 8196 tokens
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
"list the planets in the solar system",
InferenceParams(
template=template,
stream=True,
max_tokens=512,
),
)
Ollama
from locallm import OllamaLm, LmParams, InferenceParams
lm = OllamaLm(
LmParams(is_verbose=True)
)
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
template = "<s>[INST] {prompt} [/INST]"
lm.infer(
"list the planets in the solar system",
InferenceParams(
stream=True,
template=template,
temperature=0.5,
),
)
Examples
Providers:
- Llama.cpp Python provider
- Kobold.cpp provider
- Ollama provider
API
LmProvider
An abstract base class describing a language model provider. All providers implement this API.
Attributes
- llm (Optional[Llama]): the language model.
- models_dir (str): the directory where the models are stored.
- api_key (str): the API key for the language model.
- server_url (str): the URL of the language model server.
- is_verbose (bool): whether to print more information.
- threads (Optional[int]): the number of threads to use.
- gpu_layers (Optional[int]): the number of layers to offload to the GPU.
- embedding (Optional[bool]): whether to use embeddings.
- on_token (OnTokenType): the function called when a token is generated. Default: print the token to the terminal.
- on_start_emit (OnStartEmitType): the function called when the model starts emitting tokens.
Example
lm = OllamaLm(LmParams(is_verbose=True))
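Since every provider implements the same interface, swapping backends only changes the constructor. A minimal sketch composed from the examples above (note that the Koboldcpp example passes an empty model name to load_model):
from locallm import InferenceParams, LmParams, OllamaLm

# any provider can be substituted here, e.g. KoboldcppLm or LocalLm
lm = OllamaLm(LmParams(is_verbose=True))
lm.load_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf", 8192)
lm.infer(
    "list the planets in the solar system",
    InferenceParams(
        template="<s>[INST] {prompt} [/INST]",
        stream=True,
        max_tokens=512,
    ),
)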
Methods:
__init__
Constructs all the necessary attributes for the LmProvider object.
Parameters
- params (LmParams): the parameters for the language model.
Example
lm = KoboldcppLm(LmParams())
load_model
Loads a language model.
Parameters
- model_name (str): the name of the model to load.
- ctx (int): the context window size for the model.
- gpu_layers (Optional[int]): the number of layers to offload to the GPU.
Example
lm.load_model("my_model.gguf", 2048, 32)
infer
Run an inference query.
Parameters
- prompt (str): the prompt to generate text from.
- params (InferenceParams): the parameters for the inference query.
Returns
- result (InferenceResult): the generated text and stats.
Example
>>> lm.infer("<s>[INST] List the planets in the solar system [/INST>")
The planets in the solar system are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.
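The result can also be captured when not streaming. A minimal sketch, assuming the generated text is exposed under a "text" key of the result (the field name is an assumption; check InferenceResult for the actual shape):
result = lm.infer(
    "<s>[INST] List the planets in the solar system [/INST]",
    InferenceParams(stream=False, max_tokens=256),
)
print(result["text"])  # "text" is an assumed field name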
Types
InferenceParams
Parameters for inference.
Args
- stream (bool, Optional): whether to stream the output.
- template (str, Optional): the template to use for the inference.
- threads (int, Optional): the number of threads to use for the inference.
- max_tokens (int, Optional): the maximum number of tokens to generate.
- temperature (float, Optional): the sampling temperature for the model.
- top_p (float, Optional): the cumulative probability cutoff for nucleus sampling.
- top_k (int, Optional): the number of highest-probability tokens to sample from.
- min_p (float, Optional): the minimum probability for a token to be considered.
- stop (List[str], Optional): a list of words that stop the generation.
- frequency_penalty (float, Optional): the frequency penalty for the model.
- presence_penalty (float, Optional): the presence penalty for the model.
- repeat_penalty (float, Optional): the repeat penalty for the model.
- tfs (float, Optional): the tail-free sampling parameter.
- grammar (str, Optional): a GBNF grammar to constrain the model's output.
Example
InferenceParams(stream=True, template="<s>[INST] {prompt} [/INST]")
{
"stream": True,
"template": "<s>[INST] {prompt} [/INST]"
}
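A hedged sketch of the grammar parameter, constraining the output with a small GBNF grammar (the grammar string is illustrative and backend support may vary):
# force the model to answer exactly "yes" or "no"
yes_no_grammar = 'root ::= "yes" | "no"'
lm.infer(
    "Is Pluto a planet?",
    InferenceParams(
        template="<s>[INST] {prompt} [/INST]",
        grammar=yes_no_grammar,
        max_tokens=8,
    ),
)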
LmParams
Parameters for language model.
Args
- models_dir (str, Optional): the directory containing the models.
- api_key (str, Optional): the API key for the language model.
- server_url (str, Optional): the server URL for the language model.
- is_verbose (bool, Optional): whether to enable verbose output.
- on_token (Callable[[str], None], Optional): a callback called on each generated token. If not provided, the default prints tokens to the command line as they arrive.
- on_start_emit (Callable[[Optional[Any]], None], Optional): a callback called when the model starts emitting tokens.
Example
LmParams(
models_dir="/home/me/models",
api_key="abc123",
)
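A minimal sketch of wiring custom callbacks, following the Callable signatures documented above (the callback bodies are illustrative):
from typing import Any, List, Optional

from locallm import KoboldcppLm, LmParams

collected: List[str] = []

def on_token(token: str) -> None:
    # collect tokens instead of printing them as they arrive
    collected.append(token)

def on_start_emit(data: Optional[Any] = None) -> None:
    print("inference starting")

lm = KoboldcppLm(
    LmParams(
        on_token=on_token,
        on_start_emit=on_start_emit,
    )
)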
Tests
To configure the tests, create a tests/localconf.py file containing some local config used to run the tests:
# absolute path to your models dir
MODELS_DIR = "/home/me/my/models/dir"
# the model to use in the tests
MODEL = "q5_1-gguf-mamba-gpt-3B_v4.gguf"
# the context window size for the tests
CTX = 2048
Be sure to have the corresponding backend up before running a test.
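For reference, a hedged sketch of a test consuming this config (the test body is illustrative and not taken from the package's test suite):
# tests/test_local.py: illustrative only
from locallm import InferenceParams, LmParams, LocalLm
from localconf import CTX, MODEL, MODELS_DIR

def test_infer() -> None:
    lm = LocalLm(LmParams(models_dir=MODELS_DIR))
    lm.load_model(MODEL, CTX)
    result = lm.infer("Say hello", InferenceParams(max_tokens=16))
    assert result is not None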