Semantic Dictionary
A dictionary-like class that uses semantic similarity for key matching instead of exact matches.
Features
- Drop-in replacement for dict: Implements the complete standard dictionary interface
- Semantic matching: Find keys based on semantic similarity, not just exact matches
- Flexible embedding: Works with various embedding models (sentence-transformers, Hugging Face, OpenAI)
- Type annotations: Fully typed with generics support
Installation
You can install the package from PyPI:
pip install semantic-dictionary
For additional functionality, you can install optional dependencies:
# For sentence-transformers support
pip install semantic-dictionary[sentence-transformers]
# For Hugging Face support
pip install semantic-dictionary[huggingface]
# For OpenAI support
pip install semantic-dictionary[openai]
# For all optional dependencies
pip install semantic-dictionary[all]
Quick Start Guide
Get up and running with SemanticDictionary in just a few steps:
- Install the package with your preferred embedding model:
pip install semantic-dictionary[sentence-transformers]
- Create a simple dictionary:
from semantic_dictionary import SemanticDictionary, SentenceTransformerAdapter
from sentence_transformers import SentenceTransformer
# Initialize the embedding model and adapter
model = SentenceTransformer('all-MiniLM-L6-v2')
adapter = SentenceTransformerAdapter(model)
# Create the semantic dictionary with a similarity threshold
sd = SemanticDictionary(adapter, similarity_threshold=0.75)
# Add some items
sd["customer support"] = "Help desk contact information"
sd["product pricing"] = "Current product price list"
sd["shipping policy"] = "Information about shipping options"
- Use semantic lookups:
# These will work even though the keys don't exactly match
print(sd["customer help"]) # Returns "Help desk contact information"
print(sd["price list"]) # Returns "Current product price list"
print(sd["delivery info"]) # Returns "Information about shipping options"
# Check if semantically similar keys exist
if "refund policy" in sd:
    print("Refund information found!")
else:
    print("No refund information available")
- Adjust the similarity threshold as needed for your use case:
# More strict matching (closer to 1.0)
strict_sd = SemanticDictionary(adapter, similarity_threshold=0.9)
# More lenient matching (closer to 0.0)
lenient_sd = SemanticDictionary(adapter, similarity_threshold=0.6)
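To make the effect of the threshold concrete, here is a minimal sketch of threshold-based matching. It assumes cosine similarity as the measure and treats a score equal to the threshold as a match; the README does not confirm either detail. The helper names and the 2-D vectors are hypothetical and are not part of the package.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query_vec, stored, threshold):
    """Return the stored key most similar to query_vec, or None if below threshold."""
    best_key, best_score = None, -1.0
    for key, vec in stored.items():
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_key, best_score = key, score
    # Assumption: a score equal to the threshold counts as a match
    return best_key if best_score >= threshold else None

# Hand-made 2-D "embeddings" (hypothetical values, for illustration only)
stored = {"shipping policy": [1.0, 0.0], "product pricing": [0.0, 1.0]}
query = [0.8, 0.6]  # about 0.8 similar to "shipping policy", 0.6 to "product pricing"

print(best_match(query, stored, threshold=0.75))  # shipping policy
print(best_match(query, stored, threshold=0.9))   # None
```

The same query matches at 0.75 and misses at 0.9, which is the trade-off to tune: a higher threshold gives fewer false matches but more KeyError results.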
Documentation
- Examples: check out various examples, including:
  - Basic Example - Simple demonstration of core functionality
  - Sentence Transformers Example - Using with sentence-transformers
  - Advanced Example - Real-world use cases like FAQ systems, command routing, and more
Error Handling
SemanticDictionary provides exceptions representing common issues:
Key Exceptions
- KeyError: Raised when no semantically similar key is found
- ZeroVectorError: Raised when a zero vector is encountered during similarity calculation
- EmbeddingError: Raised when there's a problem with embedding generation
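As background on why a zero vector is an error case: if the similarity measure is cosine similarity (an assumption; the README does not name the measure), the computation divides by each vector's norm, which is undefined when a norm is zero. The sketch below uses a local stand-in exception class, not the one exported by the package.

```python
import math

class ZeroVectorError(ValueError):
    """Stand-in for illustration only; not the library's own class."""

def cosine_similarity(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Dividing by a zero norm is undefined, so fail with a clear error
        raise ZeroVectorError("cannot compute cosine similarity with a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)

try:
    cosine_similarity([0.0, 0.0], [1.0, 2.0])
except ZeroVectorError as e:
    print(f"Zero vector issue: {e}")
```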
Handling Errors
from semantic_dictionary import SemanticDictionary, ZeroVectorError, EmbeddingError
# Create your semantic dictionary
# (embedding_model is any embedding model or adapter, e.g. one from the Usage section)
sd = SemanticDictionary(embedding_model)
try:
    value = sd["some_key"]
except KeyError:
    print("No similar key found")
except ZeroVectorError as e:
    print(f"Zero vector issue: {e}")
except EmbeddingError as e:
    print(f"Embedding error: {e}")
Usage
Basic Usage
from semantic_dictionary import SemanticDictionary, DummyEmbeddingModel
# Create a dummy embedding model for demonstration
embedding_model = DummyEmbeddingModel(dimension=768, seed=42)
# Create a semantic dictionary with a similarity threshold of 0.7
sd = SemanticDictionary(embedding_model, similarity_threshold=0.7)
# Add items to the dictionary
sd["apple"] = "A fruit"
sd["banana"] = "A yellow fruit"
sd["orange"] = "A citrus fruit"
# Retrieve items using semantically similar keys
print(sd["apples"]) # Output: "A fruit"
print(sd["citrus"]) # Output: "A citrus fruit"
print(sd["yellow"]) # Raises KeyError if no key reaches the similarity threshold
# Check if a key exists
print("apples" in sd) # Output: True (if similarity is above threshold)
print("car" in sd) # Output: False
# Get a value with a default
print(sd.get("apples", "Not found")) # Output: "A fruit"
print(sd.get("car", "Not found")) # Output: "Not found"
Semantic vs. Standard Dictionary Behavior
It's important to understand when semantic similarity is used versus standard dictionary behavior:
Operations Using Semantic Similarity
- Key lookups: sd[key], sd.get(key), key in sd
- Key deletion: del sd[key], sd.pop(key)
- Key checking: key in sd
- Key finding: sd.setdefault(key, default)
Operations Using Standard Dictionary Behavior
- Dictionary merging: sd | other_dict, sd |= other_dict, sd.update(other_dict)
- Dictionary comparison: sd == other_dict, sd != other_dict, sd < other_dict, etc.
- Dictionary iteration: sd.keys(), sd.values(), sd.items(), iter(sd)
This distinction determines how keys are matched: operations in the first group resolve a key to its most similar stored key, while operations in the second group treat keys exactly as given, as a plain dict would.
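The split between the two groups can be illustrated with a toy class. This is a sketch under stated assumptions, not the package's implementation: ToySemanticDict, embed, and similarity are hypothetical names, and the "embedding" is a character-bigram count rather than a real model. Lookups and membership tests go through similarity matching, while update and iteration fall through to plain dict behavior.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: character-bigram counts (a stand-in for a real model)."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def similarity(a, b):
    """Cosine similarity between two bigram-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToySemanticDict(dict):
    def __init__(self, threshold=0.7):
        super().__init__()
        self.threshold = threshold

    def _closest(self, query):
        """Return the stored key most similar to query, or None below threshold."""
        q = embed(query)
        best_key, best_score = None, 0.0
        for key in dict.keys(self):
            score = similarity(q, embed(key))
            if score > best_score:
                best_key, best_score = key, score
        return best_key if best_score >= self.threshold else None

    def __getitem__(self, query):   # semantic lookup
        key = self._closest(query)
        if key is None:
            raise KeyError(query)
        return dict.__getitem__(self, key)

    def __contains__(self, query):  # semantic membership test
        return self._closest(query) is not None

d = ToySemanticDict()
d["apple"] = "A fruit"
print(d["apples"])    # A fruit (resolved to the stored key "apple")
print("car" in d)     # False
d.update({"apples": "Plural form"})  # inherited dict.update: keys taken exactly
print(list(d))        # ['apple', 'apples']
```

Note that the update call adds "apples" as a separate key even though it would have matched "apple" in a lookup, which is the kind of surprise the two lists above are meant to prevent.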
Using with Sentence Transformers
from semantic_dictionary import SemanticDictionary, SentenceTransformerAdapter
from sentence_transformers import SentenceTransformer
# Load a sentence-transformers model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Create an adapter for the model
adapter = SentenceTransformerAdapter(model)
# Create a semantic dictionary with the adapter
sd = SemanticDictionary(adapter, similarity_threshold=0.7)
# Use the dictionary as before
sd["apple"] = "A fruit"
sd["banana"] = "A yellow fruit"
sd["orange"] = "A citrus fruit"
print(sd["fruit"]) # Will match the most similar key
Using with Hugging Face Transformers
from semantic_dictionary import SemanticDictionary, HuggingFaceAdapter
from transformers import AutoTokenizer, AutoModel
# Load a Hugging Face model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')
# Create an adapter for the model
adapter = HuggingFaceAdapter(tokenizer, model, pooling_strategy='mean')
# Create a semantic dictionary with the adapter
sd = SemanticDictionary(adapter, similarity_threshold=0.7)
# Use the dictionary as before
sd["apple"] = "A fruit"
sd["banana"] = "A yellow fruit"
sd["orange"] = "A citrus fruit"
print(sd["fruit"]) # Will match the most similar key
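For readers unfamiliar with the pooling_strategy='mean' argument: models such as BERT produce one vector per token, so some pooling step is needed to get a single vector per key. The sketch below shows what mean pooling usually means (averaging the token vectors that the attention mask marks as real tokens). It is written in plain Python for clarity, and it is an assumption that the adapter does something equivalent; the function name mean_pool is hypothetical.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in total]

tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row stands for a padding token
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```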
Using with OpenAI
from semantic_dictionary import SemanticDictionary, OpenAIAdapter
from openai import OpenAI
# Create an OpenAI client
client = OpenAI(api_key="your-api-key")
# Create an adapter for OpenAI
adapter = OpenAIAdapter(client, model="text-embedding-3-small")
# Create a semantic dictionary with the adapter
sd = SemanticDictionary(adapter, similarity_threshold=0.7)
# Use the dictionary as before
sd["apple"] = "A fruit"
sd["banana"] = "A yellow fruit"
sd["orange"] = "A citrus fruit"
print(sd["fruit"]) # Will match the most similar key
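Rather than hardcoding the API key as in the example above, you may prefer to read it from the environment. The helper below is a small sketch; load_api_key is a hypothetical name and not part of semantic-dictionary or the openai package.

```python
import os

def load_api_key(var_name="OPENAI_API_KEY"):
    """Read the API key from an environment variable, failing early if it is unset."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before creating the client")
    return key

# Usage with the example above:
#   client = OpenAI(api_key=load_api_key())
```

The OpenAI client also reads OPENAI_API_KEY on its own when api_key is omitted, so the helper mainly gives a clearer error message at startup.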
Getting Help
If you encounter issues, please open an issue on the GitHub repository with a minimal reproducible example.
Development
Setting Up Development Environment
- Clone the repository:
git clone https://github.com/eu90h/semantic-dictionary.git
cd semantic-dictionary
- Install development dependencies:
pip install -e ".[all]"
pip install -r requirements-dev.txt
Running Tests
pytest
Code Formatting
black semantic_dictionary tests examples
isort semantic_dictionary tests examples
License
This project is licensed under the MIT License - see the LICENSE file for details.