llm-rs-python: Python Bindings for Rust's llm Library
Welcome to llm-rs, an unofficial Python interface for the Rust-based llm library, made possible through PyO3. Our package combines the convenience of Python with the performance of Rust to offer an efficient tool for your machine learning projects. 🐍❤️🦀
With llm-rs, you can run a variety of Large Language Models (LLMs), including LLaMA and GPT-NeoX, directly on your CPU or GPU. For a detailed overview of all the supported architectures, visit the llm project page.
Installation
Simply install it via pip:

```
pip install llm-rs
```
Installation with GPU Acceleration Support
llm-rs incorporates support for various GPU-accelerated backends for faster inference. To enable GPU acceleration, the use_gpu parameter of your SessionConfig must be set to True (see the configuration sketch after the install instructions below). The llm documentation lists all model architectures which are currently accelerated. We distribute prebuilt binaries for the following operating systems and graphics APIs:
MacOS (Using Metal)
For MacOS users, the Metal-supported version of llm-rs can be easily installed via pip:

```
pip install llm-rs-metal
```
Windows/Linux (Using CUDA for Nvidia GPUs)
Due to the significant file size, CUDA-supported packages cannot be directly uploaded to pip. To install them, download the appropriate *.whl file from the latest release and install it using pip as follows:

```
pip install [wheelname].whl
```
Windows/Linux (Using OpenCL for All GPUs)
For universal GPU support on Windows and Linux, we offer an OpenCL-supported version. It can be installed via pip:
```
pip install llm-rs-opencl
```
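With one of the GPU-enabled builds installed, acceleration is switched on through the session configuration. The following is a minimal sketch, assuming from_pretrained accepts the session configuration via a session_config keyword; verify the exact parameter name in the documentation:

```python
from llm_rs import AutoModel, KnownModels, SessionConfig

# Request GPU acceleration for this session.
session_config = SessionConfig(use_gpu=True)

model = AutoModel.from_pretrained(
    "path/to/model.bin",
    model_type=KnownModels.Llama,
    session_config=session_config,  # assumed keyword argument
)

print(model.generate("The meaning of life is"))
```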
Usage
Running local GGML models:
Models can be loaded via the AutoModel interface.
```python
from llm_rs import AutoModel, KnownModels

# load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# generate
print(model.generate("The meaning of life is"))
```
Streaming Text
Text can be yielded from a generator via the stream function:
```python
from llm_rs import AutoModel, KnownModels

# load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# generate token by token
for token in model.stream("The meaning of life is"):
    print(token)
```
Running GGML models from the Hugging Face Hub
GGML-converted models can be downloaded and run directly from the hub.
```python
from llm_rs import AutoModel

model = AutoModel.from_pretrained("rustformers/mpt-7b-ggml", model_file="mpt-7b-q4_0-ggjt.bin")
```
If a repository contains multiple models, the model_file has to be specified. If you want to load repositories which were not created through this library, you have to specify the model_type parameter, as the metadata files needed to infer the architecture are missing.
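For example, loading such a third-party repository could look like the following sketch; the repository id and file name below are placeholders, not real artifacts:

```python
from llm_rs import AutoModel, KnownModels

# Hypothetical repository without llm-rs metadata: the architecture
# has to be stated explicitly via model_type.
model = AutoModel.from_pretrained(
    "some-user/some-llama-ggml",   # placeholder repository id
    model_file="model-q4_0.bin",   # placeholder file name
    model_type=KnownModels.Llama,
)
```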
Running PyTorch Transformer models from the Hugging Face Hub
llm-rs supports automatic conversion of all supported transformer architectures on the Hugging Face Hub. Running conversions requires additional dependencies, which can be installed via pip install llm-rs[convert]. The models can then be loaded and automatically converted via the from_pretrained function.
```python
from llm_rs import AutoModel

model = AutoModel.from_pretrained("mosaicml/mpt-7b")
```
Convert Hugging Face Hub Models
The following example shows how a Pythia model can be converted, quantized and run.
```python
from llm_rs.convert import AutoConverter
from llm_rs import AutoModel, AutoQuantizer
import sys

# define the model which should be converted and an output directory
export_directory = "path/to/directory"
base_model = "EleutherAI/pythia-410m"

# convert the model
converted_model = AutoConverter.convert(base_model, export_directory)

# quantize the model (this step is optional)
quantized_model = AutoQuantizer.quantize(converted_model)

# load the quantized model
model = AutoModel.load(quantized_model, verbose=True)

# generate text, streaming tokens to stdout via a callback
def callback(text):
    print(text, end="")
    sys.stdout.flush()

model.generate("The meaning of life is", callback=callback)
```
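The target quantization format can presumably be selected as well. Treat the following as a sketch only: the quantization keyword and the QuantizationType enum are assumptions inferred from the GGML file names used in this document, so verify them against the documentation before relying on them:

```python
from llm_rs import AutoQuantizer
from llm_rs import QuantizationType  # assumed export, may differ

# Hypothetical: request an explicit format instead of the default.
quantized_model = AutoQuantizer.quantize(
    "path/to/converted-model.bin",
    quantization=QuantizationType.Q4_0,  # assumed keyword and enum member
)
```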
🦜️🔗 LangChain Usage
Utilizing llm-rs-python through LangChain requires additional dependencies. You can install these using pip install llm-rs[langchain]. Once installed, you gain access to the RustformersLLM model through the llm_rs.langchain module. This particular model offers features for text generation and embeddings.
Consider the example below, demonstrating a straightforward LLMChain implementation with MPT-Instruct:
```python
from llm_rs.langchain import RustformersLLM
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
Answer:"""

prompt = PromptTemplate(input_variables=["instruction"], template=template)

llm = RustformersLLM(
    model_path_or_repo_id="rustformers/mpt-7b-ggml",
    model_file="mpt-7b-instruct-q5_1-ggjt.bin",
    callbacks=[StreamingStdOutCallbackHandler()],
)

chain = LLMChain(llm=llm, prompt=prompt)
chain.run("Write a short post congratulating rustformers on their new release of their langchain integration.")
```
🌾🔱 Haystack Usage
Utilizing llm-rs-python through Haystack requires additional dependencies. You can install these using pip install llm-rs[haystack]. Once installed, you gain access to the RustformersInvocationLayer through the llm_rs.haystack module, which offers features for text generation.
Consider the example below, demonstrating a straightforward Haystack pipeline implementation with OpenLLaMA-3B:
```python
from haystack.nodes import PromptNode, PromptModel
from llm_rs.haystack import RustformersInvocationLayer

model = PromptModel(
    "rustformers/open-llama-ggml",
    max_length=1024,
    invocation_layer_class=RustformersInvocationLayer,
    model_kwargs={"model_file": "open_llama_3b-q5_1-ggjt.bin"},
)

pn = PromptNode(model, max_length=1024)

pn("Write me a short story about a llama riding a crab.", stream=True)
```
Documentation
For in-depth information on customizing the loading and generation processes, refer to our detailed documentation.
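As a taste of what can be customized, the sketch below tweaks session and sampling parameters. The field names used here (threads, temperature, top_p, max_new_tokens) and the generation_config keyword are assumptions to be checked against the documentation:

```python
from llm_rs import AutoModel, KnownModels, SessionConfig, GenerationConfig

# Assumed field names; verify against the documentation.
session_config = SessionConfig(threads=8)
generation_config = GenerationConfig(temperature=0.8, top_p=0.9, max_new_tokens=256)

model = AutoModel.from_pretrained(
    "path/to/model.bin",
    model_type=KnownModels.Llama,
    session_config=session_config,  # assumed keyword
)

print(model.generate("The meaning of life is", generation_config=generation_config))
```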