
llm-gguf


Run models distributed as GGUF files using LLM

Installation

Install this plugin in the same environment as LLM:

llm install llm-gguf

Usage

This plugin runs models that have been distributed as GGUF files.

You can either ask the plugin to download these directly, or you can register models you have already downloaded.

To download the LM Studio GGUF of Llama 3.1 8B Instruct, run the following command:

llm gguf download-model \
  https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --alias llama-3.1-8b-instruct --alias l31i

The --alias options set aliases for that model; you can omit them if you don't want to set any.

This command will download the 4.92GB file and store it in the directory revealed by running llm gguf models-dir - on macOS this will be ~/Library/Application Support/io.datasette.llm/gguf/models.
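Hugging Face publishes a SHA-256 checksum for each file on the model's "Files" page, so after a large download finishes you can optionally verify it before use. This is an extra step, not something the plugin does for you; a minimal stdlib-only sketch, where the file path and expected checksum are placeholders you substitute yourself:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1MB chunks so a multi-GB GGUF never needs to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage - substitute your own downloaded file and the
# checksum shown on Hugging Face:
# path = Path("Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf")
# assert sha256_of(path) == "<checksum copied from the Files page>"
```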

Run llm models to confirm that the model has been installed.

You can then run prompts through that model like this:

llm -m gguf/Meta-Llama-3.1-8B-Instruct-Q4_K_M 'Five great names for a pet lemur'

Or using one of the aliases that you set like this:

llm -m l31i 'Five great names for a pet lemur'

You can start a persistent chat session with the model using llm chat - this will avoid having to load the model into memory for each prompt:

llm chat -m l31i
Chatting with gguf/Meta-Llama-3.1-8B-Instruct-Q4_K_M
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> tell me a joke about a walrus, a pelican and a lemur getting lunch
Here's one: Why did the walrus, the pelican, and the lemur go to the cafeteria for lunch? ...

If you have downloaded the model already you can register it with the plugin while keeping the file in its current location like this:

llm gguf register-model \
  ~/Downloads/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --alias llama-3.1-8b-instruct --alias l31i
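GGUF files begin with the four-byte magic number `GGUF`, so if you want a quick sanity check that a previously downloaded file really is a GGUF before registering it, you can inspect its header. A small illustrative sketch (not part of the plugin):

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # four-byte magic at the start of every GGUF file


def looks_like_gguf(path: Path) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with path.open("rb") as f:
        return f.read(4) == GGUF_MAGIC
```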

This plugin currently only works with chat models - these are usually distributed in files with a suffix such as -Instruct or -Chat.

For non-chat models you may have better luck with the older llm-llama-cpp plugin.

Embedding models

This plugin also supports embedding models that are distributed as GGUFs.

These are managed using the llm gguf embed-models, llm gguf download-embed-model and llm gguf register-embed-model commands.

For example, to start using the excellent and tiny mxbai-embed-xsmall-v1 model you can download the 30.8MB GGUF version like this:

llm gguf download-embed-model \
  https://huggingface.co/mixedbread-ai/mxbai-embed-xsmall-v1/resolve/main/gguf/mxbai-embed-xsmall-v1-q8_0.gguf

This will store the model in the directory shown if you run llm gguf models-dir.

Confirm that the new model is available by running this:

llm embed-models

You should see gguf/mxbai-embed-xsmall-v1-q8_0 in the list.

Then try that model out like this:

llm embed -m gguf/mxbai-embed-xsmall-v1-q8_0 -c 'hello'

This will output a 384 element floating point JSON array.
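Once you have two of those JSON arrays you can compare them - cosine similarity is the usual metric for embeddings like these. A self-contained sketch using only the standard library (neither LLM nor the plugin is required to run it); the two short vectors are stand-ins for real 384-element ones:

```python
import json
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# The llm embed command prints a JSON array, so its output parses directly:
vec_a = json.loads("[0.1, 0.2, 0.3]")    # stand-in for a real 384-element vector
vec_b = json.loads("[0.1, 0.25, 0.28]")
print(cosine_similarity(vec_a, vec_b))
```

Identical vectors score 1.0 and unrelated ones drift toward 0, which is how you would rank documents against a query embedding.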

Consult the LLM documentation for more information on how to use these embeddings.

Development

To set up this plugin locally, first checkout the code. Then create a new virtual environment:

cd llm-gguf
python3 -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

llm install -e '.[test]'

To run the tests:

pytest
