Convenient tools built on Ollama and PyTorch for working with local LLMs

llmtoolset

llmtoolset is a Python library of tools for working with large language models (LLMs) and embeddings locally, without relying on hosted APIs such as OpenAI or on Hugging Face access tokens. It uses Ollama for LLM interaction and PyTorch for embeddings, with a focus on flexibility and performance. The toolkit streamlines common operations such as sentence encoding, similarity computation, clustering, and utility functions for LLM-driven workflows.

Installation Requirements

Before installing llmtoolset, install the PyTorch build that matches your hardware. On CUDA-enabled systems, the command below installs a CUDA 11.8 build (the cu118 index) for best performance. Optionally, add torchvision and torchaudio to the same command for broader PyTorch functionality:

pip install torch --index-url https://download.pytorch.org/whl/cu118

If PyTorch is not already installed, installing llmtoolset will automatically pull in a default version of PyTorch.
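To confirm which PyTorch build ended up in the environment, a quick check along the following lines can help. This is a minimal sketch and not part of llmtoolset; the helper name `describe_torch_install` is made up for illustration.

```python
import importlib.util


def describe_torch_install() -> str:
    """Report whether PyTorch is importable and which device it would use."""
    # find_spec avoids an ImportError when PyTorch is missing.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    # torch.cuda.is_available() is False for CPU-only builds
    # or when no usable GPU/driver is present.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"PyTorch {torch.__version__} will run on: {device}"


print(describe_torch_install())
```

If the report says `cpu` on a machine with an NVIDIA GPU, the installed wheel is probably a CPU-only build and may need reinstalling from the CUDA index shown above.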

Features

  • Sentence Encoding: Encode text into vector representations using SentenceTransformer.
  • Cosine Similarity: Calculate similarity between vectors or batches for clustering, ranking, and comparison.
  • Embedding Storage: Save and load embeddings seamlessly in .npy format.
  • Nearest Neighbors: Find similar embeddings with customizable thresholds and top-k results.
  • Clustering: Group embeddings based on similarity thresholds.
  • Tag Extraction: Utility functions to extract and process tags or lists from text.
  • Stream Management: Real-time interaction support for LLM streams.
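The nearest-neighbor feature has no example in this README, so the idea behind "thresholds and top-k results" is sketched here in plain Python. This is an illustration under assumptions, not llmtoolset's implementation: the names `cosine` and `nearest_neighbors` and their parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    top_k: int = 3,
    threshold: float = 0.0,
) -> List[Tuple[int, float]]:
    """Return up to top_k (index, similarity) pairs with similarity >= threshold."""
    scored = [(i, cosine(query, c)) for i, c in enumerate(candidates)]
    kept = [(i, s) for i, s in scored if s >= threshold]
    # Highest similarity first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
hits = nearest_neighbors([1.0, 0.0], vectors, top_k=2, threshold=0.5)
print(hits)  # indices 0 and 1 pass the threshold; 2 and 3 are filtered out
```

The threshold removes weak matches before the top-k cut, so a query with no close neighbors returns fewer than `top_k` results rather than padding with poor ones.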

Examples

Encoding Sentences

from llmtoolset.embeddings import SentenceEncoder

encoder = SentenceEncoder()
embeddings = encoder.encode(["This is a test sentence.", "Another sentence to encode."])
print(embeddings)  # Outputs a NumPy array of encoded vectors
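The feature list mentions saving and loading embeddings in .npy format, but the library's own function names for this are not shown here. As a hedged sketch, the same round trip can be done with NumPy directly; the random array below stands in for the output of `encoder.encode(...)`.

```python
import os
import tempfile

import numpy as np

# Stand-in for encoded sentences: 2 vectors of 4 dimensions (shape is arbitrary).
embeddings = np.random.default_rng(0).random((2, 4), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "embeddings.npy")
    np.save(path, embeddings)   # writes the array with its shape and dtype
    restored = np.load(path)    # reads it back as an identical ndarray

print(restored.shape, restored.dtype)
```

Because .npy stores shape and dtype alongside the data, the restored array can be passed straight back into similarity or clustering code without re-encoding the text.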

Calculating Cosine Similarity

from llmtoolset.embeddings import SentenceEncoder, cosine_similarity

# Initialize the encoder
encoder = SentenceEncoder()

# Define the animals
animals = ["cat", "tiger", "fish"]

# Encode the animal names
embeddings = encoder.encode(animals)

# Calculate cosine similarity between the animals
similarity_cat_tiger = cosine_similarity(embeddings[0], embeddings[1])
similarity_cat_fish = cosine_similarity(embeddings[0], embeddings[2])
similarity_tiger_fish = cosine_similarity(embeddings[1], embeddings[2])

print(f"Cosine Similarity between 'cat' and 'tiger': {similarity_cat_tiger}")
print(f"Cosine Similarity between 'cat' and 'fish': {similarity_cat_fish}")
print(f"Cosine Similarity between 'tiger' and 'fish': {similarity_tiger_fish}")
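For batches, cosine similarity is usually computed for all pairs at once: each row is scaled to unit length, and the matrix product of the result with its transpose gives every pairwise score. The sketch below shows that calculation in NumPy. The function name `pairwise_cosine` is hypothetical and this is not necessarily how llmtoolset computes it.

```python
import numpy as np


def pairwise_cosine(matrix: np.ndarray) -> np.ndarray:
    """Return an (n, n) matrix of cosine similarities between the rows of matrix."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Clip to avoid division by zero for all-zero rows.
    unit = matrix / np.clip(norms, 1e-12, None)
    return unit @ unit.T


vectors = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
sims = pairwise_cosine(vectors)
print(np.round(sims, 3))
```

The diagonal is 1.0 (each vector compared with itself), and the orthogonal first and third rows score 0.0 regardless of their lengths, since cosine similarity depends only on direction.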

Generating Tags from Text

from llmtoolset import make_tags

text = "This text discusses machine learning and artificial intelligence."
tags = make_tags(text)
print(tags)  # Example output: ['machine learning', 'artificial intelligence']
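How `make_tags` works internally is not documented here. For the related task of pulling a list out of free-form model output, one possible approach is sketched below. The helper `extract_list` is hypothetical and not part of llmtoolset; it assumes the reply contains a Python-style bracketed list.

```python
import ast
import re
from typing import List


def extract_list(text: str) -> List[str]:
    """Return the first bracketed list literal found in text, or [] if none parses."""
    # Match non-nested [...] spans and try each one in order.
    for match in re.finditer(r"\[[^\[\]]*\]", text):
        try:
            # literal_eval only accepts literals, so it never executes code.
            value = ast.literal_eval(match.group(0))
        except (ValueError, SyntaxError):
            continue
        if isinstance(value, list):
            return [str(item).strip() for item in value]
    return []


reply = "Sure! Here are the tags: ['machine learning', 'artificial intelligence']"
print(extract_list(reply))
```

Using `ast.literal_eval` rather than `eval` keeps the parsing safe when the text comes from a model and cannot be trusted.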

Clustering Similar Embeddings

from llmtoolset.embeddings import SentenceEncoder, group_similar_embeddings

# Initialize the encoder
encoder = SentenceEncoder()

# Define the animals
animals = ["cat", "tiger", "lion", "dog", "wolf", "fish", "shark", "whale"]

# Encode the animal names
embeddings = encoder.encode(animals)

# Group similar embeddings
clusters = group_similar_embeddings(embeddings, similarity_threshold=0.5)

# Print each cluster; item[0] is the index of the member in the input list
for cluster in clusters:
    print("[Cluster]:")
    for item in cluster:
        print(f"   {animals[item[0]]}")

# Output from a test run (the threshold may need tuning for cleaner groups)
# [Cluster]:
#    cat
#    tiger
#    lion
#    dog
# [Cluster]:
#    wolf
# [Cluster]:
#    fish
#    shark
#    whale
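To build intuition for how a similarity threshold shapes the groups, here is a small greedy grouping sketch in plain Python. It is an assumption-laden illustration, not the algorithm used by `group_similar_embeddings`: the function `group_by_threshold` is hypothetical, and it returns (index, similarity) pairs only to mirror the `item[0]` indexing seen in the example above.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_threshold(
    vectors: Sequence[Sequence[float]], threshold: float
) -> List[List[Tuple[int, float]]]:
    """Greedily group vectors: each unassigned vector seeds a cluster and
    collects every later unassigned vector at least `threshold` similar to it."""
    clusters: List[List[Tuple[int, float]]] = []
    assigned = set()
    for seed in range(len(vectors)):
        if seed in assigned:
            continue
        # By convention the seed is listed with similarity 1.0 to itself.
        cluster = [(seed, 1.0)]
        assigned.add(seed)
        for other in range(seed + 1, len(vectors)):
            if other in assigned:
                continue
            score = cosine(vectors[seed], vectors[other])
            if score >= threshold:
                cluster.append((other, score))
                assigned.add(other)
        clusters.append(cluster)
    return clusters


points = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9]]
for cluster in group_by_threshold(points, threshold=0.8):
    print([index for index, _ in cluster])
```

In a greedy scheme like this, results depend on input order and on the threshold: a lower value merges more items into the first cluster that accepts them, which is one way an item such as "dog" could end up grouped with the cats.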

Stream Interaction

from llmtoolset import activate_stream_printing, deactivate_stream_printing

# Enable real-time stream printing
activate_stream_printing()

# Disable it when no longer needed
deactivate_stream_printing()

