Package for creating rank-based interpretable and contextual embeddings.
interpretable-embeddings
interpretable-embeddings is the official implementation of the RaDE, RaDE+, and GRaCE algorithms, which generate rank-based interpretable and contextual embeddings from top-K similarity lists. The algorithms use graph-based measures to produce interpretable, effective, and unsupervised embeddings for retrieval, clustering, classification, and visualization.
🔍 Overview
Unlike traditional embedding techniques that require raw features or supervised training, this package builds representations entirely from ranked similarity lists (e.g., from a kNN graph or retrieval system). Each embedding dimension corresponds to a similarity measure with respect to one "leader" (a reference node).
Key benefits:
- Unsupervised: No labels or ground truth needed.
- Interpretable: Embedding dimensions are human-understandable.
- Versatile: Works for text, images, graphs, or any other domain with top-K similarities.
📦 Installation
pip install interpretable-embeddings
Dependencies:
- numpy
- tqdm
Requires Python ≥ 3.7.
🧠 Algorithms
RaDE (Rank-based Diffusion Embedding)
- Selects leaders by propagating rank-based affinities through a diffusion process. Each embedding dimension encodes the affinity to a single representative node.
RaDE+ (Multi-Representative RaDE)
- Extends RaDE by expanding each representative node into an expansion set of similar nodes (Algorithm 1 from the paper). Each embedding dimension aggregates affinities over the expansion set with linearly decreasing weights, producing more robust and diversified representations.
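The aggregation step can be illustrated with a small NumPy sketch. It is not the package's internal code: the function names `linear_weights` and `aggregate_dimension` are hypothetical, the affinity matrix is toy data, and the weight scheme (size, size-1, ..., 1, normalized to sum to 1) is only one plausible reading of "linearly decreasing weights". The exact weights used in the paper may differ.

```python
import numpy as np


def linear_weights(size):
    """Linearly decreasing weights (size, size-1, ..., 1), normalized to sum to 1.

    Assumption: illustrative scheme only, not taken from the RaDE+ paper.
    """
    raw = np.arange(size, 0, -1, dtype=float)
    return raw / raw.sum()


def aggregate_dimension(affinity, expansion_set):
    """Weighted sum of each node's affinities to the members of one expansion set.

    expansion_set lists the leader first, followed by its similar nodes,
    so the leader receives the largest weight.
    """
    weights = linear_weights(len(expansion_set))
    return affinity[:, expansion_set] @ weights


# Toy symmetric affinity matrix for four nodes.
affinity = np.array([
    [1.0, 0.6, 0.2, 0.0],
    [0.6, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

expansion_set = [0, 1, 2]  # leader 0 plus two similar nodes
dimension = aggregate_dimension(affinity, expansion_set)  # one value per node
```

Under this scheme a node that is close to several members of the expansion set scores higher than one that is close to the leader alone, which is the intuition behind the more robust representation.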
GRaCE (Graph and Rank-based Contextual Embeddings)
- Extends RaDE with unsupervised effectiveness estimation (e.g., Reciprocal Density, Accumulated JacMax) and rank correlation measures (e.g., Reciprocal Distance, JacMax).
🛠 Usage
Input Format
Your input must be a .txt file with one ranked list per line (space-separated item IDs):
15 3 8 22 7 9 ...
3 2 11 5 6 ...
...
Each line holds the ranked list for one query, and each number is the ID of a retrieved item.
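If you start from feature vectors rather than from an existing retrieval system, a file in this layout could be produced along the following lines. This is a minimal sketch with assumptions: items are identified by their 0-based row index, similarity is cosine similarity, and the helper names `build_ranked_lists` and `write_ranked_lists` are hypothetical (they are not part of the package). Check how the package expects IDs to be numbered before relying on it.

```python
import tempfile
from pathlib import Path

import numpy as np


def build_ranked_lists(features, top_k):
    """Rank all items for every query by decreasing cosine similarity."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit.T
    # Negating the similarities makes argsort return a descending order.
    order = np.argsort(-sims, axis=1, kind="stable")
    return order[:, :top_k]


def write_ranked_lists(ranked, path):
    """Write one space-separated ranked list per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for row in ranked:
            fh.write(" ".join(str(int(i)) for i in row) + "\n")


rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))  # toy data: 50 items, 8 features each
ranked = build_ranked_lists(features, top_k=20)

out_path = Path(tempfile.gettempdir()) / "ranked_lists.txt"
write_ranked_lists(ranked, out_path)
```

With cosine similarity each list normally starts with the query item itself, since an item is most similar to itself.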
RaDE Example
from interpretable_embeddings import RaDE
# Initialize
rade = RaDE(rks_path="data/ranked_lists.txt", rks_size_L=20)
# Compute internal structure
rade.fit(num_candidates=1000, num_leaders=128, t=2)
# Get embedding vectors
embeddings = rade.transform()
# Or do both in one call
embeddings = rade.fit_transform(num_candidates=1000, num_leaders=128, t=2)
RaDE+ Example
from interpretable_embeddings import RaDEPlus
rade_plus = RaDEPlus(rks_path="data/ranked_lists.txt", rks_size_L=20)
# Compute internal structure
rade_plus.fit(num_candidates=1000, num_leaders=128, t=2, m=3)
# Get embedding vectors
embeddings = rade_plus.transform()
# Or do both in one call
embeddings = rade_plus.fit_transform(num_candidates=1000, num_leaders=128, t=2, m=3)
Parameters:
- num_candidates: size of the candidate pool (top-k nodes by reciprocal affinity).
- num_leaders: embedding dimensionality (number of representative nodes).
- t: diffusion steps for the transition matrix A = W^t.
- m: number of nodes added to each leader's expansion set. Constraint: m * num_leaders ≤ num_candidates.
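To make `t` and the constraint on `m` concrete, here is a minimal NumPy sketch. It does not reproduce the package internals: the matrix `W` is a toy row-normalized affinity matrix (row normalization is an assumption), and `diffuse` and `check_expansion_budget` are hypothetical helper names.

```python
import numpy as np


def diffuse(W, t):
    """Return A = W^t, i.e. W multiplied by itself t times (matrix power)."""
    return np.linalg.matrix_power(W, t)


def check_expansion_budget(num_candidates, num_leaders, m):
    """True when the constraint m * num_leaders <= num_candidates holds."""
    return m * num_leaders <= num_candidates


# Toy affinity matrix for four nodes; every row sums to 1.
W = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [0.0, 0.0, 1.0, 0.0],
])

A = diffuse(W, t=2)  # two-step affinities: A[i, j] sums over all paths i -> k -> j

ok = check_expansion_budget(num_candidates=1000, num_leaders=128, m=3)        # 384 <= 1000
too_large = check_expansion_budget(num_candidates=1000, num_leaders=128, m=8)  # 1024 > 1000
```

In this toy example node 0 has no direct affinity to node 3 (`W[0, 3]` is 0), but `A[0, 3]` is positive because both are linked through node 2. Larger values of `t` spread affinity further along the graph.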
GRaCE Example
from interpretable_embeddings import GRaCE
grace = GRaCE(
    rks_path="data/ranked_lists.txt",
    top_K=20,
    correlation_measure="jacmax",  # or "reciprocal"
    estimation_measure="reciprocal_density",  # or "accjacmax"
    alpha=0.95,
)
# Compute internal structure
grace.fit(num_leaders=128)
# Get embedding vectors
embeddings = grace.transform()
# Or do both in one call
embeddings = grace.fit_transform(num_leaders=128)
🔬 Example Applications
Retrieval
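from sklearn.metrics.pairwise import cosine_similarity
query_idx = 0
# Pairwise cosine similarity between all embedding vectors
sims = cosine_similarity(embeddings)
# Sort by decreasing similarity; [1:11] skips position 0, which is assumed to be the query itself
top_k = sims[query_idx].argsort()[::-1][1:11]
print("Top-10 results for query:", top_k)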
Clustering
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5).fit(embeddings)
print(kmeans.labels_)
Classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# labels: one class label per embedding row, supplied by you (not produced by this package)
X_train, X_test, y_train, y_test = train_test_split(embeddings, labels)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
Visualization
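from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
# Project the embeddings to 2D for plotting
proj = TSNE(n_components=2).fit_transform(embeddings)
# labels: one class label per embedding row, supplied by you; used only to color the points
plt.scatter(proj[:, 0], proj[:, 1], c=labels, cmap="tab10")
plt.title("2D Visualization of RaDE Embeddings")
plt.show()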
📁 Package Structure
interpretable-embeddings/
│
├── rade.py           # RaDE implementation
├── rade_plus.py      # RaDE+ implementation (multi-representative)
├── grace.py          # GRaCE implementation
├── utils.py          # Ranked list reader
└── measures/
    ├── qpp.py          # Query performance prediction measures (AccJacMax, Reciprocal Density)
    └── correlation.py  # Rank correlation measures (JacMax, Reciprocal KNN)
📚 Citation
If you use this package in your research, please cite:
GRaCE
Almeida, T. C. C., Letício, G. R., Valem, L. P., Freitas, A., Pedronette, D. C. G. Effective Graph and Rank-based Contextual Embeddings for Textual and Multimedia Data. 2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy.
RaDE / RaDE+
De Fernando, F. A., Pedronette, D. C. G., De Sousa, G. J., Valem, L. P., Guilherme, I. R. RaDE+: A semantic rank-based graph embedding algorithm. International Journal of Information Management Data Insights, 2(2), 100078, 2022.
De Fernando, F. A., Pedronette, D. C. G., De Sousa, G. J., Valem, L. P., Guilherme, I. R. RaDE: A Rank-based Graph Embedding Approach. 15th International Conference on Computer Vision Theory and Applications (VISAPP), 2020.
🤝 Contact
- Thiago César Castilho Almeida: tc.almeida@unesp.br
- Lucas Pascotti Valem: lucaspascottivalem@gmail.com
- Daniel Carlos Guimarães Pedronette: pedronette@gmail.com