Package for creating rank-based interpretable and contextual embeddings.

interpretable-embeddings

interpretable-embeddings is the official implementation of the RaDE, RaDE+, and GRaCE algorithms, which generate rank-based interpretable and contextual embeddings from top-K similarity lists. The algorithms use graph-based measures to produce interpretable, effective, and unsupervised embeddings for retrieval, clustering, classification, and visualization.


🔍 Overview

Unlike traditional embedding techniques that require raw features or supervised training, this package builds representations entirely from ranked similarity lists (e.g., from a kNN graph or retrieval system). Each embedding dimension corresponds to a similarity measure between an item and one of the "leaders" (reference nodes).

Key benefits:

  • Unsupervised: No labels or ground truth needed.
  • Interpretable: Embedding dimensions are human-understandable.
  • Versatile: Works for text, images, graphs, or any other domain with top-K similarities.
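
To make the idea of leader-based dimensions concrete, the sketch below builds a toy embedding directly from ranked lists. The helper name `toy_leader_embedding` and the weighting scheme (list length minus rank position) are assumptions chosen for illustration; they are not the affinity measures the package computes.

```python
import numpy as np

def toy_leader_embedding(ranked_lists, leaders):
    """Hypothetical helper: one embedding dimension per leader.

    The value is (list length - rank position) when the leader appears
    in the item's ranked list, and 0 otherwise.
    """
    list_len = len(ranked_lists[0])
    emb = np.zeros((len(ranked_lists), len(leaders)))
    for i, ranked in enumerate(ranked_lists):
        for j, leader in enumerate(leaders):
            if leader in ranked:
                emb[i, j] = list_len - ranked.index(leader)
    return emb

# Four items, each with a top-3 ranked list of item IDs (toy data).
ranked_lists = [
    [0, 1, 2],
    [1, 0, 2],
    [2, 3, 1],
    [3, 2, 0],
]
leaders = [0, 3]  # assumed reference nodes

embedding = toy_leader_embedding(ranked_lists, leaders)
print(embedding)
```

Each column can be read directly: the first column says how highly item 0 is ranked by every other item, the second does the same for item 3.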

📦 Installation

pip install interpretable-embeddings

Dependencies:

  • numpy
  • tqdm

Requires Python ≥ 3.7.


🧠 Algorithms

RaDE (Rank-based Diffusion Embedding)

  • Selects leaders by propagating rank-based affinities through a diffusion process. Each embedding dimension encodes the affinity to a single representative node.

RaDE+ (Multi-Representative RaDE)

  • Extends RaDE by expanding each representative node into an expansion set of similar nodes (Algorithm 1 from the paper). Each embedding dimension aggregates affinities over the expansion set with linearly decreasing weights, producing more robust and diversified representations.
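
The aggregation with linearly decreasing weights can be sketched as follows. This is a minimal illustration, assuming weights proportional to s, s-1, ..., 1 for an expansion set of size s with the leader listed first; the exact weights are defined in the paper, and the names `linear_weights` and `aggregate_dimension` are hypothetical.

```python
import numpy as np

def linear_weights(size):
    """Linearly decreasing weights (size, size-1, ..., 1), normalized to sum to 1."""
    w = np.arange(size, 0, -1, dtype=float)
    return w / w.sum()

def aggregate_dimension(affinity_row, expansion_set):
    """Weighted sum of one node's affinities to the members of an expansion set."""
    w = linear_weights(len(expansion_set))
    return float(np.dot(w, affinity_row[expansion_set]))

# Affinities of a single node to five other nodes (toy values).
affinity_row = np.array([0.9, 0.1, 0.6, 0.3, 0.0])
# Assumed expansion set: the leader (node 0) followed by two similar nodes.
expansion_set = [0, 2, 3]

value = aggregate_dimension(affinity_row, expansion_set)
print(value)
```

Because the leader carries the largest weight, it still dominates the dimension, while the other members of the set smooth out the effect of a single noisy affinity.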

GRaCE (Graph and Rank-based Contextual Embeddings)

  • Extends RaDE with unsupervised effectiveness estimation (e.g., Reciprocal Density, Accumulated JacMax) and rank correlation measures (e.g., Reciprocal Distance, JacMax).
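
As a rough intuition for what a rank correlation measure compares, the sketch below computes the plain Jaccard overlap between two top-K lists and checks whether two items are reciprocal neighbors. These are simplified stand-ins, not the JacMax or Reciprocal definitions implemented by the package, and both function names are hypothetical.

```python
def jaccard_top_k(list_a, list_b, k):
    """Jaccard overlap between the top-k items of two ranked lists."""
    a, b = set(list_a[:k]), set(list_b[:k])
    return len(a & b) / len(a | b)

def are_reciprocal_neighbors(ranked_lists, i, j, k):
    """True when i and j each appear in the other's top-k list."""
    return j in ranked_lists[i][:k] and i in ranked_lists[j][:k]

# Toy ranked lists: line index = query ID (an assumption of this sketch).
ranked_lists = [
    [0, 1, 2, 3],
    [1, 0, 2, 4],
    [2, 1, 0, 3],
    [3, 4, 0, 2],
    [4, 3, 1, 0],
]

sim_01 = jaccard_top_k(ranked_lists[0], ranked_lists[1], k=3)
sim_03 = jaccard_top_k(ranked_lists[0], ranked_lists[3], k=3)
print(sim_01, sim_03)
print(are_reciprocal_neighbors(ranked_lists, 0, 1, k=2))
```

Items 0 and 1 share the same top-3 set and rank each other near the top, so both signals agree that they are contextually similar; items 0 and 3 share only one item.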

🛠 Usage

Input Format

Your input must be a .txt file with one ranked list per line (space-separated item IDs):

15 3 8 22 7 9 ...
3 2 11 5 6 ...
...

Each line holds the ranked list for one query, and each number is the ID of a retrieved item, ordered from most to least similar.
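
If you start from feature vectors rather than an existing retrieval system, a file in this format can be produced with NumPy alone. The sketch below is one possible way to do it, assuming 0-based item IDs that match the line index and Euclidean distance as the ranking criterion; the helper `write_ranked_lists` is hypothetical and not part of the package.

```python
import os
import tempfile

import numpy as np

def write_ranked_lists(features, path, top_k):
    """Write one space-separated top-k ranked list per line (hypothetical helper)."""
    # Pairwise Euclidean distances; memory is O(n^2 * d), so this only suits small datasets.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Sort each row by increasing distance and keep the top_k closest items.
    order = np.argsort(dists, axis=1, kind="stable")[:, :top_k]
    with open(path, "w") as f:
        for row in order:
            f.write(" ".join(str(int(i)) for i in row) + "\n")
    return order

# Four 2-D points forming two well-separated pairs (toy data).
features = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
path = os.path.join(tempfile.mkdtemp(), "ranked_lists.txt")

order = write_ranked_lists(features, path, top_k=3)
print(open(path).read())
```

With this construction each item appears first in its own list, since its distance to itself is zero. Whether the package expects the query to be included in its own list is not stated above, so check that against your data before relying on it.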


RaDE Example

from interpretable_embeddings import RaDE

# Initialize
rade = RaDE(rks_path="data/ranked_lists.txt", rks_size_L=20)

# Compute internal structure
rade.fit(num_candidates=1000, num_leaders=128, t=2)

# Get embedding vectors
embeddings = rade.transform()

# Or do both in one call
embeddings = rade.fit_transform(num_candidates=1000, num_leaders=128, t=2)

RaDE+ Example

from interpretable_embeddings import RaDEPlus

rade_plus = RaDEPlus(rks_path="data/ranked_lists.txt", rks_size_L=20)

# Compute internal structure
rade_plus.fit(num_candidates=1000, num_leaders=128, t=2, m=3)

# Get embedding vectors
embeddings = rade_plus.transform()

# Or do both in one call
embeddings = rade_plus.fit_transform(num_candidates=1000, num_leaders=128, t=2, m=3)

Parameters:

  • num_candidates: size of the candidate pool (top-k nodes by reciprocal affinity).
  • num_leaders: embedding dimensionality (number of representative nodes).
  • t: diffusion steps for the transition matrix A = W^t.
  • m: number of nodes added to each leader's expansion set. Constraint: m * num_leaders ≤ num_candidates.
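
Two of these parameters are easy to illustrate in isolation. The sketch below checks the `m * num_leaders ≤ num_candidates` constraint before fitting and shows what raising a matrix to the power `t` does to a small affinity matrix. The matrix `W` here is a made-up row-normalized example, and `check_rade_plus_params` is a hypothetical helper; how the package builds `W` internally is not shown in this README.

```python
import numpy as np

def check_rade_plus_params(num_candidates, num_leaders, m):
    """Raise early if the expansion sets cannot fit in the candidate pool."""
    if m * num_leaders > num_candidates:
        raise ValueError(
            f"m * num_leaders = {m * num_leaders} exceeds "
            f"num_candidates = {num_candidates}"
        )

# The values used in the RaDE+ example satisfy the constraint: 3 * 128 = 384 <= 1000.
check_rade_plus_params(num_candidates=1000, num_leaders=128, m=3)

# Toy affinity matrix over three nodes (assumed row-normalized).
W = np.array([
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
])

# t = 2 diffusion steps: A = W^t.
A = np.linalg.matrix_power(W, 2)
print(A)
```

Nodes 0 and 2 have no direct affinity in `W`, but after two steps `A[0, 2]` is 0.25 because they are connected through node 1. Larger `t` spreads affinity further along the graph.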

GRaCE Example

from interpretable_embeddings import GRaCE

grace = GRaCE(
    rks_path="data/ranked_lists.txt",
    top_K=20,
    correlation_measure="jacmax",  # or "reciprocal"
    estimation_measure="reciprocal_density",  # or "accjacmax"
    alpha=0.95
)

# Compute internal structure
grace.fit(num_leaders=128)

# Get embedding vectors
embeddings = grace.transform()

# Or do both in one call
embeddings = grace.fit_transform(num_leaders=128)

🔬 Example Applications

Retrieval

from sklearn.metrics.pairwise import cosine_similarity

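query_idx = 0
# Pairwise cosine similarity between all embedding vectors
sims = cosine_similarity(embeddings)
# Sort by decreasing similarity; position 0 is normally the query itself, so skip it
top_k = sims[query_idx].argsort()[::-1][1:11]
print("Top-10 results for query:", top_k)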

Clustering

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5).fit(embeddings)
print(kmeans.labels_)

Classification

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# `labels` must be supplied by you: one class label per row of `embeddings`
X_train, X_test, y_train, y_test = train_test_split(embeddings, labels)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))

Visualization

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Project the embeddings to 2D; `labels` (one class label per row) is only used for coloring
proj = TSNE(n_components=2).fit_transform(embeddings)
plt.scatter(proj[:, 0], proj[:, 1], c=labels, cmap="tab10")
plt.title("2D Visualization of RaDE Embeddings")
plt.show()

📁 Package Structure

interpretable-embeddings/
│
├── rade.py                 # RaDE implementation
├── rade_plus.py            # RaDE+ implementation (multi-representative)
├── grace.py                # GRaCE implementation
├── utils.py                # Ranked list reader
└── measures/
    ├── qpp.py              # Query performance prediction measures (AccJacMax, Reciprocal Density)
    └── correlation.py      # Rank correlation measures (JacMax, Reciprocal KNN)


📚 Citation

If you use this package in your research, please cite:

GRaCE

Almeida, T. C. C., Letício, G. R., Valem, L. P., Freitas, A., Pedronette, D. C. G. Effective Graph and Rank-based Contextual Embeddings for Textual and Multimedia Data. 2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy.


RaDE / RaDE+

De Fernando, F. A., Pedronette, D. C. G., De Sousa, G. J., Valem, L. P., Guilherme, I. R. RaDE+: A semantic rank-based graph embedding algorithm. International Journal of Information Management Data Insights, 2(2), 100078, 2022.

De Fernando, F. A., Pedronette, D. C. G., De Sousa, G. J., Valem, L. P., Guilherme, I. R. RaDE: A Rank-based Graph Embedding Approach. 15th International Conference on Computer Vision Theory and Applications (VISAPP), 2020.


🤝 Contact

  • Thiago César Castilho Almeida: tc.almeida@unesp.br
  • Lucas Pascotti Valem: lucaspascottivalem@gmail.com
  • Daniel Carlos Guimarães Pedronette: pedronette@gmail.com
