Package for creating rank-based semantic and contextual embeddings.
semantic-embeddings
semantic-embeddings is the official implementation of the GRaCE algorithm for generating rank-based semantic and contextual embeddings from top-K similarity lists.
This library implements the RaDE and GRaCE algorithms, which use graph-based measures to create interpretable, effective, and unsupervised embeddings for retrieval, clustering, classification, and visualization.
🔍 Overview
Unlike traditional embedding techniques that require raw features or supervised training, this package builds representations entirely from ranked similarity lists (e.g., from a kNN graph or retrieval system). Each embedding dimension corresponds to a "leader" (reference node).
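To make "one dimension per leader" concrete, here is a minimal sketch that assumes a simple linear rank weighting: dimension j of item i is large when leader j appears near the top of item i's ranked list, and zero when it does not appear in the first `top_k` positions. The function `leader_embedding` and its weighting scheme are hypothetical illustrations, not the computation performed by RaDE or GRaCE.

```python
import numpy as np

def leader_embedding(ranked_lists, leaders, top_k):
    """Hypothetical illustration: one dimension per leader, filled with rank-based affinities.

    ranked_lists: list of lists of item IDs, most similar first.
    leaders: item IDs used as reference nodes (one embedding dimension each).
    top_k: only the first top_k positions of each ranked list are considered.
    """
    leader_pos = {leader: j for j, leader in enumerate(leaders)}
    emb = np.zeros((len(ranked_lists), len(leaders)))
    for i, ranked in enumerate(ranked_lists):
        for rank, item in enumerate(ranked[:top_k]):
            j = leader_pos.get(item)
            if j is not None:
                # Linear decay: rank 0 -> 1.0, rank top_k - 1 -> 1 / top_k
                emb[i, j] = (top_k - rank) / top_k
    return emb

ranked_lists = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
emb = leader_embedding(ranked_lists, leaders=[0, 2], top_k=3)
print(emb.round(3))
```

Items whose ranked lists share leaders near the top end up with similar vectors, which is what makes each dimension interpretable as "closeness to this reference node".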
Key benefits:
- Unsupervised: No labels or ground truth needed.
- Explainable: Embedding dimensions are semantically grounded.
- Versatile: Works for text, images, graphs, or any other domain that provides top-K similarities.
📦 Installation
pip install semantic-embeddings
Dependencies:
numpy, tqdm
Requires Python ≥ 3.7.
🧠 Algorithms
RaDE (Rank-based Diffusion Embedding)
- Selects leaders by propagating rank-based affinities through a diffusion process.
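As a rough intuition for diffusion-based leader selection, the sketch below builds a rank-weighted affinity matrix from the first `L` positions of each list, row-normalizes it into a transition matrix, diffuses it for `t` steps, and picks the nodes that receive the most mass. This is a generic random-walk diffusion chosen for illustration; the names `diffusion_leader_scores` and `select_leaders` are hypothetical, the parameters only loosely mirror `rks_size_L`, `t`, and `num_leaders`, and the actual RaDE formulation may differ. It assumes item IDs are 0 to n-1, with line i being the list for item i.

```python
import numpy as np

def diffusion_leader_scores(ranked_lists, L, t):
    """Hypothetical sketch: score nodes by t-step diffusion of rank-based affinities."""
    n = len(ranked_lists)
    W = np.zeros((n, n))
    for i, ranked in enumerate(ranked_lists):
        for rank, item in enumerate(ranked[:L]):
            W[i, item] = (L - rank) / L  # better-ranked items get higher affinity
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # guard against empty lists
    P = W / row_sums                       # row-stochastic transition matrix
    Pt = np.linalg.matrix_power(P, t)      # t-step diffusion
    return Pt.sum(axis=0)                  # total mass arriving at each node

def select_leaders(ranked_lists, L, t, num_leaders):
    """Return the IDs of the num_leaders highest-scoring nodes."""
    scores = diffusion_leader_scores(ranked_lists, L, t)
    return np.argsort(-scores, kind="stable")[:num_leaders]

ranked_lists = [[0, 1, 2], [1, 0, 2], [2, 0, 1]]
print(select_leaders(ranked_lists, L=2, t=2, num_leaders=2))
```

Node 0 appears near the top of every list, so diffusion concentrates mass on it and it is chosen first; nodes that are rarely ranked highly by others receive little mass.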
GRaCE (Graph and Rank-based Contextual Embeddings)
- Extends RaDE with unsupervised effectiveness estimation (e.g., Reciprocal Density, Accumulated JacMax) and rank correlation measures (e.g., Reciprocal Distance, JacMax).
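The measures named above operate on the overlap between ranked lists. The sketch below shows two simpler, generic relatives under that assumption: the plain Jaccard index between two top-k sets (a basic rank-correlation signal) and the fraction of reciprocal neighbors of a query (a basic effectiveness estimate). These are stand-ins for illustration, not the package's JacMax, Reciprocal Distance, Reciprocal Density, or Accumulated JacMax definitions, and the function names are hypothetical.

```python
def jaccard_topk(ranked_lists, a, b, k):
    """Plain Jaccard index between the top-k sets of items a and b."""
    sa, sb = set(ranked_lists[a][:k]), set(ranked_lists[b][:k])
    return len(sa & sb) / len(sa | sb)

def reciprocal_fraction(ranked_lists, q, k):
    """Fraction of q's top-k neighbors that also rank q in their own top-k.

    If each list starts with the query itself, q counts as its own reciprocal neighbor.
    """
    neighbors = ranked_lists[q][:k]
    hits = sum(1 for n in neighbors if q in ranked_lists[n][:k])
    return hits / len(neighbors)

ranked_lists = [[0, 1, 2, 3], [1, 0, 2, 3], [2, 3, 0, 1], [3, 2, 1, 0]]
print(jaccard_topk(ranked_lists, 0, 1, k=2))       # identical top-2 sets
print(reciprocal_fraction(ranked_lists, 3, k=3))   # item 1 does not rank 3 in its top-3
```

Intuitively, a high reciprocal fraction suggests a query sits in a dense, consistent neighborhood, so its ranked list is more likely to be reliable.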
🛠 Usage
Input Format
Your input must be a .txt file with one ranked list per line (space-separated item IDs):
15 3 8 22 7 9 ...
3 2 11 5 6 ...
...
Each line holds the ranked list of one query, and each number is the ID of a retrieved item, ordered from most to least similar.
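If you start from feature vectors rather than an existing retrieval system, you can produce a file in this format yourself. The sketch below is one way to do it, assuming squared Euclidean distance and a list length `L` that is at least as large as the `rks_size_L` or `top_K` you later pass to the package (an assumption, not a documented requirement). The helpers `write_ranked_lists` and `read_ranked_lists` are hypothetical and not part of the package.

```python
import os
import tempfile

import numpy as np

def write_ranked_lists(features, path, L):
    """Write the L nearest items (squared Euclidean distance) of every row, one list per line."""
    sq = (features ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2 * features @ features.T
    order = np.argsort(dists, axis=1, kind="stable")[:, :L]
    with open(path, "w") as f:
        for row in order:
            f.write(" ".join(str(i) for i in row) + "\n")

def read_ranked_lists(path):
    """Parse a ranked-list file back into a list of lists of ints."""
    with open(path) as f:
        return [[int(tok) for tok in line.split()] for line in f if line.strip()]

features = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ranked_lists.txt")
    write_ranked_lists(features, path, L=3)
    lists = read_ranked_lists(path)
print(lists)  # [[0, 1, 2], [1, 0, 2], [2, 3, 1], [3, 2, 1]]
```

Here each list starts with the query itself because its distance to itself is zero. The dense N x N distance matrix is only practical for small collections; for large ones an approximate nearest-neighbor index is the usual choice.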
RaDE Example
```python
from sembeddings import RaDE

# Initialize
rade = RaDE(rks_path="data/ranked_lists.txt", rks_size_L=20)

# Compute internal structure
rade.fit(num_candidates=1000, num_leaders=128, t=2)

# Get embedding vectors
embeddings = rade.transform()

# Or do both in one call
embeddings = rade.fit_transform(num_candidates=1000, num_leaders=128, t=2)
```
GRaCE Example
```python
from sembeddings import GRaCE

grace = GRaCE(
    rks_path="data/ranked_lists.txt",
    top_K=20,
    correlation_measure="jacmax",             # or "reciprocal"
    estimation_measure="reciprocal_density",  # or "accjacmax"
    alpha=0.95,
)

# Compute internal structure
grace.fit(num_leaders=128)

# Get embedding vectors
embeddings = grace.transform()

# Or do both in one call
embeddings = grace.fit_transform(num_leaders=128)
```
🔬 Example Applications
Retrieval
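```python
from sklearn.metrics.pairwise import cosine_similarity

query_idx = 0
# Similarity between the query and every item (only one row is needed, not the full N x N matrix)
sims = cosine_similarity(embeddings[query_idx:query_idx + 1], embeddings)[0]
# Sort by decreasing similarity and skip position 0, which is normally the query itself
top_k = sims.argsort()[::-1][1:11]
print("Top-10 results for query:", top_k)
```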
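When ground-truth labels are available for evaluation (they are not needed to build the embeddings), a common way to score retrieval quality is precision@k: the fraction of each query's top-k neighbors that share its label. The sketch below computes it with NumPy and cosine similarity; `precision_at_k` is a hypothetical helper, and `y` stands in for whatever labels you have.

```python
import numpy as np

def precision_at_k(embeddings, labels, k=10):
    """Mean fraction of each item's top-k cosine neighbors (excluding itself) with the same label."""
    X = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0               # leave all-zero rows as zero vectors
    X = X / norms
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)       # never retrieve the query itself
    top = np.argsort(-sims, axis=1, kind="stable")[:, :k]
    labels = np.asarray(labels)
    return float((labels[top] == labels[:, None]).mean())

E = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y = np.array([0, 0, 1, 1])
print(precision_at_k(E, y, k=1))  # 1.0
print(precision_at_k(E, y, k=2))  # 0.5
```

With two items per class, k=2 forces one cross-class neighbor into every list, which is why precision drops to 0.5 in this toy case.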
Clustering
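```python
from sklearn.cluster import KMeans

# Fixed n_init and random_state make the clustering reproducible
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(embeddings)
print(kmeans.labels_)
```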
Classification
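```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# labels: one ground-truth class per row of `embeddings` (supplied by you, not by this package)
X_train, X_test, y_train, y_test = train_test_split(embeddings, labels, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
```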
Visualization
```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Project to 2D; `labels` is only used to color the points
proj = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
plt.scatter(proj[:, 0], proj[:, 1], c=labels, cmap="tab10")
plt.title("2D Visualization of RaDE Embeddings")
plt.show()
```
📁 Package Structure
semantic-embeddings/
│
├── rade.py # RaDE implementation
├── grace.py # GRaCE implementation
├── utils.py # Ranked list reader
└── measures/
├── qpp.py # Query performance prediction measures (AccJacMax, Reciprocal Density)
└── correlation.py # Rank correlation measures (JacMax, Reciprocal KNN)
📚 Citation
If you use this library in your research, please cite:
GRaCE (Accepted, pending publication)
Almeida, T. C. C., Letício, G. R., Valem, L. P., Freitas, A., Pedronette, D. C. G.
Effective Graph and Rank-based Contextual Embeddings for Textual and Multimedia Data
2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy.
RaDE
De Fernando, F. A., Pedronette, D. C. G., De Sousa, G. J., Valem, L. P., Guilherme, I. R.
RaDE: A Rank-based Graph Embedding Approach
15th International Conference on Computer Vision Theory and Applications (VISAPP), 2020.
RaDE+: A Semantic Rank-based Graph Embedding Algorithm
International Journal of Information Management Data Insights, 2022.
🤝 Contact
- Thiago César Castilho Almeida: tc.almeida@unesp.br
- Lucas Pascotti Valem: lucaspascottivalem@gmail.com
- Daniel Carlos Guimarães Pedronette: pedronette@gmail.com