Unofficial Python bindings for llm-rs. 🐍❤️🦀

llm-rs-python: Python Bindings for Rust's llm Library

Welcome to llm-rs, an unofficial Python interface for the Rust-based llm library, made possible through PyO3. Our package combines the convenience of Python with the performance of Rust to offer an efficient tool for your machine learning projects. 🐍❤️🦀

With llm-rs, you can run a variety of Large Language Models (LLMs), including LLaMA and GPT-NeoX, directly on your CPU or GPU.

For a detailed overview of all the supported architectures, visit the llm project page.

Integrations: 🦜️🔗 LangChain and 🌾🔱 Haystack (see the usage sections below).

Installation

Simply install it via pip: pip install llm-rs

Installation with GPU Acceleration Support

llm-rs supports several GPU-accelerated backends for faster inference. To enable GPU acceleration, set the use_gpu parameter of your SessionConfig to True (see the sketch after the installation options below). The llm documentation lists the model architectures that are currently GPU-accelerated. We distribute prebuilt binaries for the following operating systems and graphics APIs:

macOS (Using Metal)

For macOS users, the Metal-enabled version of llm-rs can be installed via pip:

pip install llm-rs-metal

Windows/Linux (Using CUDA for Nvidia GPUs)

Due to their large file size, the CUDA-enabled packages cannot be uploaded to PyPI directly. To install them, download the appropriate *.whl file from the latest Release and install it with pip as follows:

pip install [wheelname].whl

Windows/Linux (Using OpenCL for All GPUs)

For universal GPU support on Windows and Linux, we offer an OpenCL-supported version. It can be installed via pip:

pip install llm-rs-opencl
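
As a minimal sketch of enabling GPU acceleration (assuming SessionConfig is importable from llm_rs and that from_pretrained accepts a session_config keyword; check the documentation for your installed version):

from llm_rs import AutoModel, KnownModels, SessionConfig

# request GPU acceleration for this session
session_config = SessionConfig(use_gpu=True)

model = AutoModel.from_pretrained("path/to/model.bin",
                                  model_type=KnownModels.Llama,
                                  session_config=session_config)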

Usage

Running local GGML models

Models can be loaded via the AutoModel interface.

from llm_rs import AutoModel, KnownModels

# load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# generate text
print(model.generate("The meaning of life is"))
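
Generation can also be tuned per call. The following is a hedged sketch, assuming llm_rs exports a GenerationConfig with common sampling parameters (the exact field names may differ; see the documentation):

from llm_rs import AutoModel, GenerationConfig, KnownModels

model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# assumed sampling settings, for illustration only
generation_config = GenerationConfig(temperature=0.75, top_p=0.9, max_new_tokens=256)

print(model.generate("The meaning of life is", generation_config=generation_config))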

Streaming Text

Text can be yielded from a generator via the stream function:

from llm_rs import AutoModel, KnownModels

# load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# stream the generated tokens
for token in model.stream("The meaning of life is"):
    print(token)

Running GGML models from the Hugging Face Hub

GGML-converted models can be downloaded and run directly from the Hugging Face Hub.

from llm_rs import AutoModel

model = AutoModel.from_pretrained("rustformers/mpt-7b-ggml",model_file="mpt-7b-q4_0-ggjt.bin")

If a repository contains multiple models, the model_file has to be specified. If you want to load repositories that were not created through this library, you also have to specify the model_type parameter, as the metadata files needed to infer the architecture are missing.
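
For example, loading a third-party GGML repository could look like this (the repository and file names here are hypothetical, for illustration only):

from llm_rs import AutoModel, KnownModels

# repo created outside this library: no metadata, so the architecture must be given explicitly
model = AutoModel.from_pretrained("some-user/some-llama-ggml",
                                  model_file="model-q4_0.bin",
                                  model_type=KnownModels.Llama)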

Running PyTorch Transformer models from the Hugging Face Hub

llm-rs supports automatic conversion of all supported transformer architectures on the Hugging Face Hub.

To run conversions, additional dependencies are needed, which can be installed via pip install llm-rs[convert].

The models can then be loaded and automatically converted via the from_pretrained function.

from llm_rs import AutoModel

model = AutoModel.from_pretrained("mosaicml/mpt-7b")

Convert Hugging Face Hub Models

The following example shows how a Pythia model can be converted, quantized, and run.

from llm_rs.convert import AutoConverter
from llm_rs import AutoModel, AutoQuantizer
import sys

# define the model to convert and an output directory
export_directory = "path/to/directory"
base_model = "EleutherAI/pythia-410m"

# convert the model to GGML
converted_model = AutoConverter.convert(base_model, export_directory)

# quantize the model (this step is optional)
quantized_model = AutoQuantizer.quantize(converted_model)

# load the quantized model
model = AutoModel.load(quantized_model, verbose=True)

# stream generated text via a callback
def callback(text):
    print(text, end="")
    sys.stdout.flush()

model.generate("The meaning of life is", callback=callback)

🦜️🔗 LangChain Usage

Using llm-rs-python with LangChain requires additional dependencies, which you can install via pip install llm-rs[langchain]. Once installed, the RustformersLLM model is available through the llm_rs.langchain module; it supports text generation and embeddings.

Consider the example below, demonstrating a straightforward LLMChain implementation with MPT-Instruct:

from llm_rs.langchain import RustformersLLM
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template="""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
Answer:"""

prompt = PromptTemplate(input_variables=["instruction"], template=template)

llm = RustformersLLM(model_path_or_repo_id="rustformers/mpt-7b-ggml",
                     model_file="mpt-7b-instruct-q5_1-ggjt.bin",
                     callbacks=[StreamingStdOutCallbackHandler()])

chain = LLMChain(llm=llm, prompt=prompt)

chain.run("Write a short post congratulating rustformers on their new release of their langchain integration.")

🌾🔱 Haystack Usage

Using llm-rs-python with Haystack requires additional dependencies, which you can install via pip install llm-rs[haystack]. Once installed, the RustformersInvocationLayer is available through the llm_rs.haystack module; it supports text generation.

Consider the example below, demonstrating a straightforward Haystack pipeline with OpenLLaMA-3B:

from haystack.nodes import PromptNode, PromptModel
from llm_rs.haystack import RustformersInvocationLayer

model = PromptModel("rustformers/open-llama-ggml",
                    max_length=1024,
                    invocation_layer_class=RustformersInvocationLayer,
                    model_kwargs={"model_file": "open_llama_3b-q5_1-ggjt.bin"})

pn = PromptNode(
    model,
    max_length=1024
)

pn("Write me a short story about a lama riding a crab.",stream=True)

Documentation

For in-depth information on customizing the loading and generation processes, refer to our detailed documentation.
