Package for creating rank-based interpretable and contextual embeddings.

interpretable-embeddings

interpretable-embeddings is the official implementation of the GRaCE algorithm for generating rank-based, interpretable, and contextual embeddings from top-K similarity lists.

This package implements the RaDE and GRaCE algorithms, which use graph-based measures to create interpretable, effective, and unsupervised embeddings for retrieval, clustering, classification, and visualization.


🔍 Overview

Unlike traditional embedding techniques that require raw features or supervised training, this package builds representations entirely from ranked similarity lists (e.g., from a kNN graph or retrieval system). Each embedding dimension corresponds to the similarity between an item and one of the "leaders" (reference nodes).

Key benefits:

  • Unsupervised: No labels or ground truth needed.
  • Interpretable: Embedding dimensions are human-understandable.
  • Versatile: Works for text, images, graphs—any domain with top-K similarities.
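
To make the idea of "one dimension per leader" concrete, here is a minimal toy sketch. It is not the package's algorithm: the plain Jaccard index, the hand-picked leaders, and the helper names `jaccard` and `toy_embedding` are all assumptions chosen for illustration.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard index between two top-K lists, treated as sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def toy_embedding(ranked_lists, leaders):
    """Row i, column j holds the similarity of item i to leader j.

    Hypothetical helper: the similarity here is a plain Jaccard index,
    used as a stand-in for the measures implemented in the package.
    """
    emb = np.zeros((len(ranked_lists), len(leaders)))
    for i, rk in enumerate(ranked_lists):
        for j, leader in enumerate(leaders):
            emb[i, j] = jaccard(rk, ranked_lists[leader])
    return emb

# Toy top-3 lists: line i is the ranked list of item i (0-based IDs assumed).
ranked_lists = [
    [0, 1, 2],
    [1, 0, 2],
    [2, 1, 3],
    [3, 4, 5],
    [4, 3, 5],
    [5, 4, 3],
]
leaders = [0, 3]  # hand-picked reference items, for illustration only

emb = toy_embedding(ranked_lists, leaders)
print(emb)
```

Each column can be read directly as "how much this item's neighborhood overlaps with the neighborhood of leader j", which is the sense in which such dimensions are interpretable.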

📦 Installation

pip install interpretable-embeddings

Dependencies:

  • numpy
  • tqdm

Requires Python ≥ 3.7.


🧠 Algorithms

RaDE (Rank-based Diffusion Embedding)

  • Selects leaders by propagating rank-based affinities through a diffusion process.

GRaCE (Graph and Rank-based Contextual Embeddings)

  • Extends RaDE with unsupervised effectiveness estimation (e.g., Reciprocal Density, Accumulated JacMax) and rank correlation measures (e.g., Reciprocal Distance, JacMax).
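
As a rough intuition for unsupervised effectiveness estimation, the sketch below scores a ranked list by how many of its top-k neighbors rank the query back in their own top-k. This is a simplified stand-in and should not be read as the package's Reciprocal Density or Accumulated JacMax; the function name `reciprocal_fraction` is hypothetical.

```python
def reciprocal_fraction(ranked_lists, query, k):
    """Fraction of the query's top-k neighbors whose own top-k contains the query.

    Hypothetical helper: a simplified reciprocal-neighborhood score,
    not the measure implemented in the package.
    """
    top = ranked_lists[query][:k]
    hits = sum(1 for item in top if query in ranked_lists[item][:k])
    return hits / len(top)

# Toy top-3 lists: line i is the ranked list of item i (0-based IDs assumed).
ranked_lists = [
    [0, 1, 2],
    [1, 0, 2],
    [2, 1, 3],
    [3, 4, 5],
    [4, 3, 5],
    [5, 4, 3],
]

scores = [reciprocal_fraction(ranked_lists, q, k=3) for q in range(len(ranked_lists))]
for q, s in enumerate(scores):
    print(f"item {q}: {s:.2f}")
```

No labels are involved: the score depends only on the ranked lists, which is what makes this kind of estimate unsupervised.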

🛠 Usage

Input Format

Your input must be a .txt file with one ranked list per line (space-separated item IDs):

15 3 8 22 7 9 ...
3 2 11 5 6 ...
...

Each line is the ranked list returned for one query, and each number is the ID of a retrieved item, ordered from most to least similar.
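
If you start from feature vectors rather than an existing retrieval system, a file in this format can be produced with numpy alone. The sketch below is one possible way to do it, under several assumptions that you should check against your data: item IDs are 0-based row indices, line i holds the list for item i, cosine similarity is the ranking criterion, and each item appears in its own list. The helper name `write_ranked_lists` is hypothetical and not part of the package.

```python
import os
import tempfile

import numpy as np

def write_ranked_lists(features, path, top_k):
    """Write one space-separated top-k list per line, ranked by cosine similarity."""
    # L2-normalize rows so that the dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T
    # Sort each row by decreasing similarity and keep the first top_k IDs.
    order = np.argsort(-sims, axis=1, kind="stable")[:, :top_k]
    with open(path, "w") as f:
        for row in order:
            f.write(" ".join(str(i) for i in row) + "\n")
    return order

# Random features stand in for real descriptors (illustration only).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 16))

path = os.path.join(tempfile.gettempdir(), "ranked_lists.txt")
order = write_ranked_lists(features, path, top_k=20)

with open(path) as f:
    lines = f.read().splitlines()
print(len(lines), "ranked lists written; first list:", lines[0])
```

The full similarity matrix takes memory quadratic in the number of items, so for large collections an approximate nearest-neighbor index is a more practical source of top-K lists.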


RaDE Example

from interpretable_embeddings import RaDE

# Initialize
rade = RaDE(rks_path="data/ranked_lists.txt", rks_size_L=20)

# Compute internal structure
rade.fit(num_candidates=1000, num_leaders=128, t=2)

# Get embedding vectors
embeddings = rade.transform()

# Or do both in one call
embeddings = rade.fit_transform(num_candidates=1000, num_leaders=128, t=2)

GRaCE Example

from interpretable_embeddings import GRaCE

grace = GRaCE(
    rks_path="data/ranked_lists.txt",
    top_K=20,
    correlation_measure="jacmax",  # or "reciprocal"
    estimation_measure="reciprocal_density",  # or "accjacmax"
    alpha=0.95
)

# Compute internal structure
grace.fit(num_leaders=128)

# Get embedding vectors
embeddings = grace.transform()

# Or do both in one call
embeddings = grace.fit_transform(num_leaders=128)

🔬 Example Applications

Retrieval

from sklearn.metrics.pairwise import cosine_similarity

query_idx = 0
sims = cosine_similarity(embeddings)
top_k = sims[query_idx].argsort()[::-1][1:11]
print("Top-10 results for query:", top_k)

Clustering

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5).fit(embeddings)
print(kmeans.labels_)

Classification

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# `labels` must hold one class label per embedding row; it is not produced by this package
X_train, X_test, y_train, y_test = train_test_split(embeddings, labels)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))

Visualization

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

proj = TSNE(n_components=2).fit_transform(embeddings)
plt.scatter(proj[:, 0], proj[:, 1], c=labels, cmap="tab10")
plt.title("2D Visualization of RaDE Embeddings")
plt.show()

📁 Package Structure

interpretable-embeddings/
│
├── rade.py                 # RaDE implementation
├── grace.py                # GRaCE implementation
├── utils.py                # Ranked list reader
└── measures/
    ├── qpp.py              # Query performance prediction measures (AccJacMax, Reciprocal Density)
    └── correlation.py      # Rank correlation measures (JacMax, Reciprocal KNN)


📚 Citation

If you use this package in your research, please cite:

GRaCE

Almeida, T. C. C., Letício, G. R., Valem, L. P., Freitas, A., Pedronette, D. C. G. Effective Graph and Rank-based Contextual Embeddings for Textual and Multimedia Data. 2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy.


RaDE

De Fernando, F. A., Pedronette, D. C. G., De Sousa, G. J., Valem, L. P., Guilherme, I. R. RaDE: A Rank-based Graph Embedding Approach. 15th International Conference on Computer Vision Theory and Applications (VISAPP), 2020.


🤝 Contact

  • Thiago César Castilho Almeida: tc.almeida@unesp.br
  • Lucas Pascotti Valem: lucaspascottivalem@gmail.com
  • Daniel Carlos Guimarães Pedronette: pedronette@gmail.com
