Unofficial Python bindings for llm-rs. 🐍❤️🦀
llm-rs-python: Python Bindings for Rust's llm Library
Welcome to llm-rs, an unofficial Python interface for the Rust-based llm library, made possible through PyO3. Our package combines the convenience of Python with the performance of Rust to offer an efficient tool for your machine learning projects. 🐍❤️🦀
With llm-rs, you can run a variety of Large Language Models (LLMs), including LLaMA and GPT-NeoX, directly on your CPU or GPU. For a detailed overview of all supported architectures, visit the llm project page.
Installation
Simply install it via pip:
pip install llm-rs
Installation with GPU Acceleration Support
llm-rs incorporates support for various GPU-accelerated backends to speed up inference. To enable GPU acceleration, the use_gpu parameter of your SessionConfig must be set to True (a sketch follows the platform-specific instructions below). The llm documentation lists all model architectures that are currently accelerated. We distribute prebuilt binaries for the following operating systems and graphics APIs:
macOS (Using Metal)
For macOS users, the Metal-supported version of llm-rs can be installed via pip:
pip install llm-rs-metal
Windows/Linux (Using CUDA for Nvidia GPUs)
Due to their significant file size, CUDA-enabled packages cannot be uploaded to PyPI directly. To install them, download the appropriate *.whl file from the latest release and install it with pip:
pip install [wheelname].whl
Windows/Linux (Using OpenCL for All GPUs)
For universal GPU support on Windows and Linux, we offer an OpenCL-supported version. It can be installed via pip:
pip install llm-rs-opencl
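With one of the accelerated packages installed, GPU acceleration is enabled through the SessionConfig mentioned above. A minimal sketch, assuming from_pretrained accepts the session configuration via a session_config keyword (an assumption on our part; consult the documentation for the exact parameter name):
from llm_rs import AutoModel, KnownModels, SessionConfig

# Enable GPU acceleration for this session
session_config = SessionConfig(use_gpu=True)

model = AutoModel.from_pretrained(
    "path/to/model.bin",
    model_type=KnownModels.Llama,
    session_config=session_config,  # assumed keyword name; see the documentation
)

print(model.generate("The meaning of life is"))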
Usage
Running local GGML models:
Models can be loaded via the AutoModel interface.
from llm_rs import AutoModel, KnownModels

# Load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# Generate text
print(model.generate("The meaning of life is"))
Streaming Text
Text can be yielded token by token from a generator via the stream function:
from llm_rs import AutoModel, KnownModels

# Load the model
model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# Generate and print tokens as they arrive
for token in model.stream("The meaning of life is"):
    print(token)
Running GGML models from the Hugging Face Hub
GGML-converted models can be downloaded and run directly from the Hub.
from llm_rs import AutoModel
model = AutoModel.from_pretrained("rustformers/mpt-7b-ggml", model_file="mpt-7b-q4_0-ggjt.bin")
If a repository contains multiple model files, model_file has to be specified. If you want to load repositories that were not created through this library, you also have to specify the model_type parameter, as the metadata files needed to infer the architecture are missing.
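For example, loading a hypothetical third-party GGML repository (the repository id and file name below are placeholders, not real artifacts):
from llm_rs import AutoModel, KnownModels

# Hypothetical repo created outside this library: no metadata files,
# so the architecture has to be passed explicitly via model_type.
model = AutoModel.from_pretrained(
    "some-user/some-llama-ggml",   # placeholder repository id
    model_file="model-q4_0.bin",   # placeholder file name
    model_type=KnownModels.Llama,
)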
Running PyTorch Transformer models from the Hugging Face Hub
llm-rs supports automatic conversion of all supported transformer architectures on the Hugging Face Hub. To run conversions, additional dependencies are needed, which can be installed via pip install llm-rs[convert]. The models can then be loaded and automatically converted via the from_pretrained function.
from llm_rs import AutoModel
model = AutoModel.from_pretrained("mosaicml/mpt-7b")
Convert Hugging Face Hub Models
The following example shows how a Pythia model can be converted, quantized, and run.
from llm_rs.convert import AutoConverter
from llm_rs import AutoModel, AutoQuantizer
import sys

# Define the model that should be converted and an output directory
export_directory = "path/to/directory"
base_model = "EleutherAI/pythia-410m"

# Convert the model
converted_model = AutoConverter.convert(base_model, export_directory)

# Quantize the model (this step is optional)
quantized_model = AutoQuantizer.quantize(converted_model)

# Load the quantized model
model = AutoModel.load(quantized_model, verbose=True)

# Generate text, streaming each chunk to stdout via the callback
def callback(text):
    print(text, end="")
    sys.stdout.flush()

model.generate("The meaning of life is", callback=callback)
🦜️🔗 LangChain Usage
Utilizing llm-rs-python through LangChain requires additional dependencies, which you can install using pip install llm-rs[langchain]. Once installed, you gain access to the RustformersLLM model through the llm_rs.langchain module. This model offers features for text generation and embeddings.
Consider the example below, demonstrating a straightforward LLMChain implementation with MPT-Instruct:
from llm_rs.langchain import RustformersLLM
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
Answer:"""

prompt = PromptTemplate(input_variables=["instruction"], template=template)

llm = RustformersLLM(
    model_path_or_repo_id="rustformers/mpt-7b-ggml",
    model_file="mpt-7b-instruct-q5_1-ggjt.bin",
    callbacks=[StreamingStdOutCallbackHandler()],
)

chain = LLMChain(llm=llm, prompt=prompt)
chain.run("Write a short post congratulating rustformers on their new release of their langchain integration.")
🌾🔱 Haystack Usage
Utilizing llm-rs-python through Haystack requires additional dependencies, which you can install using pip install llm-rs[haystack]. Once installed, you gain access to the RustformersInvocationLayer through the llm_rs.haystack module, which offers text generation.
Consider the example below, demonstrating a straightforward Haystack pipeline implementation with OpenLLaMA-3B:
from haystack.nodes import PromptNode, PromptModel
from llm_rs.haystack import RustformersInvocationLayer

# Wrap the GGML model in a Haystack PromptModel via the custom invocation layer
model = PromptModel(
    "rustformers/open-llama-ggml",
    max_length=1024,
    invocation_layer_class=RustformersInvocationLayer,
    model_kwargs={"model_file": "open_llama_3b-q5_1-ggjt.bin"},
)

pn = PromptNode(model, max_length=1024)

# Stream the generated story to stdout
pn("Write me a short story about a lama riding a crab.", stream=True)
Documentation
For in-depth information on customizing the loading and generation processes, refer to our detailed documentation.
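As one taste of that customization, sampling can be tuned per call. A minimal sketch, assuming a GenerationConfig with common sampling fields such as temperature, top_p, and max_new_tokens (the field names are our assumption; the documentation is authoritative):
from llm_rs import AutoModel, KnownModels, GenerationConfig

model = AutoModel.from_pretrained("path/to/model.bin", model_type=KnownModels.Llama)

# Assumed field names for common sampling knobs; see the documentation
generation_config = GenerationConfig(temperature=0.65, top_p=0.9, max_new_tokens=256)

print(model.generate("The meaning of life is", generation_config=generation_config))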