Tools built on Ollama and PyTorch that make working with local LLMs easier
llmtoolset
llmtoolset is a Python library that provides tools for working with large language models (LLMs) and embeddings without relying on hosted APIs such as OpenAI or on Hugging Face tokens. It focuses on flexibility and performance, using Ollama and PyTorch for embeddings. The toolkit streamlines common operations such as sentence encoding, similarity computation, and clustering, and provides utility functions for LLM-driven workflows.
Installation Requirements
Before installing llmtoolset, make sure a suitable PyTorch build is installed. On CUDA-enabled systems, install a CUDA build for best performance; the command below installs the build for CUDA 11.8 (`cu118`). To also get torchvision and torchaudio for broader PyTorch functionality, list them after `torch` in the same command:
```bash
pip install torch --index-url https://download.pytorch.org/whl/cu118
```
If PyTorch is not already installed, llmtoolset will install a default version of PyTorch automatically. Then install the package itself with `pip install llmtoolset`.
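To confirm which PyTorch build is actually installed (for example, whether the CUDA build from the command above took effect), a small check like the following can help. `describe_torch_install` is a hypothetical helper, not part of llmtoolset; it relies only on `importlib.util.find_spec`, `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_install() -> str:
    """Hypothetical helper: summarize the local PyTorch install, if any."""
    # find_spec looks for the package without importing it.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch  # imported lazily so the check also works without torch

    # torch.version.cuda is None on CPU-only builds.
    cuda_build = torch.version.cuda or "none (CPU-only build)"
    return (
        f"torch {torch.__version__}, CUDA build: {cuda_build}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch_install())
```

A CPU-only build reports no CUDA version, which is a quick sign that the default wheel was installed instead of the CUDA one.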
Features
- Sentence Encoding: Encode text into vector representations using SentenceTransformer.
- Cosine Similarity: Calculate similarity between vectors or batches for clustering, ranking, and comparison.
- Embedding Storage: Save and load embeddings in `.npy` format.
- Nearest Neighbors: Find similar embeddings with customizable similarity thresholds and top-k results.
- Clustering: Group embeddings based on similarity thresholds.
- Tag Extraction: Utility functions to extract and process tags or lists from text.
- Stream Management: Real-time handling of LLM streams, such as printing output as it arrives.
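The Nearest Neighbors feature combines a similarity threshold with a top-k cutoff. Its function name and signature are not shown on this page, so the following is a hypothetical, self-contained NumPy sketch (`nearest_neighbors` is an illustrative name, not the library's API): it ranks rows by cosine similarity to a query, drops rows below the threshold, and keeps at most `top_k`.

```python
import numpy as np


def nearest_neighbors(query, embeddings, top_k=3, threshold=0.0):
    """Hypothetical helper: return up to `top_k` (index, similarity) pairs for
    rows of `embeddings` whose cosine similarity to `query` is >= threshold."""
    q = np.asarray(query, dtype=float)
    E = np.asarray(embeddings, dtype=float)
    # Normalize to unit length so a dot product equals cosine similarity.
    # (Zero vectors are not handled and would produce NaN values.)
    q = q / np.linalg.norm(q)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = E @ q
    # Visit rows from most to least similar, keeping those above the threshold.
    order = np.argsort(-sims)
    hits = [(int(i), float(sims[i])) for i in order if sims[i] >= threshold]
    return hits[:top_k]


vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
result = nearest_neighbors([1.0, 0.0], vectors, top_k=2, threshold=0.5)
print(result)  # the two rows pointing roughly along the x-axis
```

Applying the threshold before the top-k cutoff means a query can return fewer than `top_k` results when few rows are similar enough.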
Examples
Encoding Sentences
```python
from llmtoolset.embeddings import SentenceEncoder

encoder = SentenceEncoder()
embeddings = encoder.encode(["This is a test sentence.", "Another sentence to encode."])
print(embeddings)  # Outputs a NumPy array of encoded vectors
```
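Because `encode` returns a NumPy array, the Embedding Storage feature's `.npy` format maps directly onto NumPy's own file functions. llmtoolset's save/load helpers are not shown on this page, so this sketch uses `np.save` and `np.load` directly; the random array is only a stand-in for real encoder output.

```python
import os
import tempfile

import numpy as np

# Stand-in for encoder.encode(...) output: 2 sentences x 4 dimensions.
# A real SentenceTransformer model produces many more dimensions per vector.
embeddings = np.random.default_rng(0).random((2, 4)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.npy")
    np.save(path, embeddings)  # write the array in .npy format
    loaded = np.load(path)     # read it back into memory, shape and dtype intact

print(loaded.shape, loaded.dtype)
```

Storing embeddings this way avoids re-encoding the same text on every run.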
Calculating Cosine Similarity
```python
from llmtoolset.embeddings import SentenceEncoder, cosine_similarity

# Initialize the encoder
encoder = SentenceEncoder()
# Define the animals
animals = ["cat", "tiger", "fish"]
# Encode the animal names
embeddings = encoder.encode(animals)
# Calculate cosine similarity between the animals
similarity_cat_tiger = cosine_similarity(embeddings[0], embeddings[1])
similarity_cat_fish = cosine_similarity(embeddings[0], embeddings[2])
similarity_tiger_fish = cosine_similarity(embeddings[1], embeddings[2])
print(f"Cosine Similarity between 'cat' and 'tiger': {similarity_cat_tiger}")
print(f"Cosine Similarity between 'cat' and 'fish': {similarity_cat_fish}")
print(f"Cosine Similarity between 'tiger' and 'fish': {similarity_tiger_fish}")
```
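Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it ranges from -1 to 1 and ignores vector magnitude. The sketch below is a plain NumPy version for single vectors and for batches; it is not llmtoolset's own implementation, and the names `cosine_sim` and `cosine_sim_matrix` are illustrative, chosen so they don't shadow the library's `cosine_similarity`.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity of two 1-D vectors: (a . b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_sim_matrix(A, B):
    """Pairwise cosine similarities: entry [i, j] compares row A[i] with row B[j]."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Normalize every row to unit length; one matrix product then gives all cosines.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T


same = cosine_sim([1, 0], [2, 0])        # same direction, different length
orthogonal = cosine_sim([1, 0], [0, 1])  # perpendicular vectors
matrix = cosine_sim_matrix([[1, 0], [0, 1]], [[1, 0], [1, 1]])
print(same, orthogonal)
print(matrix.round(3))
```

The batch form is what makes ranking and clustering cheap: one matrix product compares every pair instead of looping over them.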
Generating Tags from Text
```python
from llmtoolset import make_tags

text = "This text discusses machine learning and artificial intelligence."
tags = make_tags(text)
print(tags)  # Example output: ['machine learning', 'artificial intelligence']
```
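Extracting tags from an LLM typically involves cleaning up a reply that arrives as a bulleted, numbered, or comma-separated list. The internals of `make_tags` are not shown here, so this is a hypothetical, standard-library-only sketch of that list-processing step (`parse_tags` is an illustrative name): split the reply into items, strip list markers and quotes, lowercase, and drop duplicates.

```python
import re


def parse_tags(raw: str) -> list[str]:
    """Hypothetical helper: turn a list-style LLM reply into unique lowercase tags."""
    tags: list[str] = []
    # Items may be separated by newlines or commas.
    for part in re.split(r"[\n,]", raw):
        # Remove a leading bullet ("-", "*") or number ("1.", "2)").
        tag = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", part)
        # Trim whitespace and surrounding quotes, then normalize case.
        tag = tag.strip().strip("\"'").lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


reply = "1. Machine learning\n2. Artificial intelligence\n3. machine learning"
print(parse_tags(reply))  # ['machine learning', 'artificial intelligence']
```

Splitting on commas keeps the sketch simple but would break tags that themselves contain commas.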
Clustering Similar Embeddings
```python
from llmtoolset.embeddings import SentenceEncoder, group_similar_embeddings

# Initialize the encoder
encoder = SentenceEncoder()
# Define the animals
animals = ["cat", "tiger", "lion", "dog", "wolf", "fish", "shark", "whale"]
# Encode the animal names
embeddings = encoder.encode(animals)
# Group similar embeddings
clusters = group_similar_embeddings(embeddings, similarity_threshold=0.5)
# Print each cluster; item[0] is the index of the embedding in `animals`
for cluster in clusters:
    print("[Cluster]:")
    for item in cluster:
        print(f" {animals[item[0]]}")
# Output observed when testing this code (the threshold may need tuning for cleaner groups):
# [Cluster]:
#  cat
#  tiger
#  lion
#  dog
# [Cluster]:
#  wolf
# [Cluster]:
#  fish
#  shark
#  whale
```
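`group_similar_embeddings` groups embeddings by a similarity threshold, but its exact algorithm is not described on this page. One simple approach with the same inputs is greedy "leader" clustering, sketched below in NumPy as a hypothetical `greedy_threshold_clusters` function (not the library's implementation): each embedding joins the first cluster whose first member is at least `similarity_threshold` similar to it, and otherwise starts a new cluster.

```python
import numpy as np


def greedy_threshold_clusters(embeddings, similarity_threshold=0.5):
    """Hypothetical helper: greedy leader clustering by cosine similarity.
    Returns a list of clusters, each a list of row indices."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit rows: dot = cosine
    clusters = []  # clusters[k][0] is the leader of cluster k
    for i, vec in enumerate(X):
        for cluster in clusters:
            if X[cluster[0]] @ vec >= similarity_threshold:
                cluster.append(i)
                break
        else:  # no existing leader was similar enough
            clusters.append([i])
    return clusters


points = np.array([[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]])
groups = greedy_threshold_clusters(points, similarity_threshold=0.8)
print(groups)  # [[0, 1], [2, 3]]
```

Because each embedding is compared only with cluster leaders, the result depends on input order as well as on the threshold, which is one reason groupings like the animal example above can need tuning.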
Stream Interaction
```python
from llmtoolset import activate_stream_printing, deactivate_stream_printing

# Enable real-time stream printing
activate_stream_printing()
# Disable it when no longer needed
deactivate_stream_printing()
```
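Conceptually, printing a stream in real time means writing each chunk as soon as it arrives and flushing the output. How `activate_stream_printing` does this internally is not documented here, so the following is only a sketch with a hypothetical `print_stream` helper and a fake generator standing in for a model. With the `ollama` Python client, for instance, `chat(..., stream=True)` returns an iterator of chunks whose text is in `chunk['message']['content']`, which could be fed into a function like this.

```python
import sys
from typing import Iterable


def print_stream(chunks: Iterable[str], out=sys.stdout) -> str:
    """Hypothetical helper: print text chunks as they arrive and return the full text."""
    pieces = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show each chunk immediately instead of waiting for a newline
        pieces.append(chunk)
    out.write("\n")
    return "".join(pieces)


def fake_llm_stream():
    # Hypothetical stand-in for a real streaming model response.
    yield from ["Local ", "LLMs ", "stream ", "tokens."]


full_text = print_stream(fake_llm_stream())
```

Returning the concatenated text lets the caller keep the complete response after it has been displayed.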
Download files
Source Distribution
Built Distribution
File details
Details for the file llmtoolset-0.4.tar.gz.
File metadata
- Download URL: llmtoolset-0.4.tar.gz
- Upload date:
- Size: 8.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01d72084902704cd9624d34e3486ec9abf7d90dbe945a27939ac16976e7474ea |
| MD5 | 5f749299bea119ba9d111aa2d26827c0 |
| BLAKE2b-256 | 79fac7dfda7d923dac89984db523e8784cb242909ad19fe210969705f19f393c |
File details
Details for the file llmtoolset-0.4-py3-none-any.whl.
File metadata
- Download URL: llmtoolset-0.4-py3-none-any.whl
- Upload date:
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a327a7ef66d7a0ffabf54f53479944dca0543c53ecaeaab5d93434fa06f0f832 |
| MD5 | accd862b341f3f373e22d5df8bca6dcd |
| BLAKE2b-256 | c669f351afb5c2a55e9b0e68b5d6b0a3eda37e9c81f2f55a688a71cdf865594b |