
Package for creating rank-based interpretable and contextual embeddings.


interpretable-embeddings


interpretable-embeddings is the official implementation of the GRaCE algorithm, which generates rank-based, interpretable, and contextual embeddings from top-K similarity lists.

This package implements the RaDE and GRaCE algorithms, which use graph-based measures to create interpretable, effective, and unsupervised embeddings for retrieval, clustering, classification, and visualization.


🔍 Overview

Unlike traditional embedding techniques that require raw features or supervised training, this package builds representations entirely from ranked similarity lists (e.g., from a kNN graph or retrieval system). Each embedding dimension corresponds to the similarity between an item and one of the "leaders" (reference nodes).
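To make the idea of leader-based dimensions concrete, here is a toy sketch. It is not the package's algorithm: it uses a plain Jaccard overlap between top-K lists as the similarity, and the leaders are hand-picked. The names `jaccard` and `toy_leader_embedding` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard overlap between two top-K lists, treated as sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def toy_leader_embedding(ranked_lists, leaders):
    """Build one row per item and one column per leader.

    Entry (i, j) is the overlap between item i's ranked list and the
    ranked list of leader j. Assumption: a plain Jaccard overlap stands
    in for the measures the package actually uses.
    """
    emb = np.zeros((len(ranked_lists), len(leaders)))
    for i, rk in enumerate(ranked_lists):
        for j, leader in enumerate(leaders):
            emb[i, j] = jaccard(rk, ranked_lists[leader])
    return emb

# Two obvious groups of items: {0, 1, 2} and {3, 4, 5}.
ranked_lists = [
    [0, 1, 2],
    [1, 0, 2],
    [2, 1, 0],
    [3, 4, 5],
    [4, 3, 5],
    [5, 4, 3],
]
leaders = [0, 3]  # hand-picked for the example, one per group
emb = toy_leader_embedding(ranked_lists, leaders)
print(emb)
```

Each column can be read directly as "how much this item's neighborhood overlaps with that leader's neighborhood", which is the sense in which the dimensions are interpretable.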

Key benefits:

  • Unsupervised: No labels or ground truth needed.
  • Interpretable: Embedding dimensions are human-understandable.
  • Versatile: Works for text, images, graphs—any domain with top-K similarities.

📦 Installation

pip install interpretable-embeddings

Dependencies:

  • numpy
  • tqdm

Requires Python ≥ 3.7.


🧠 Algorithms

RaDE (Rank-based Diffusion Embedding)

  • Selects leaders by propagating rank-based affinities through a diffusion process.
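As a rough picture of what "propagating rank-based affinities" can look like, the sketch below runs a generic random-walk diffusion over a rank-derived affinity matrix and picks the nodes that receive the most mass. This is an assumption-laden illustration, not the RaDE formulation: the linear rank weighting, the row normalization, and the column-sum scoring are all choices made for this example, and `rank_affinity`, `diffuse`, and `pick_leaders` are hypothetical names.

```python
import numpy as np

def rank_affinity(ranked_lists, n_items):
    """Affinity matrix where earlier rank positions get larger weights.

    Assumption: weight = 1 - position / K; the package may weight ranks differently.
    """
    K = ranked_lists.shape[1]
    W = np.zeros((n_items, n_items))
    for i, rk in enumerate(ranked_lists):
        for pos, j in enumerate(rk):
            W[i, j] = 1.0 - pos / K
    return W

def diffuse(W, t):
    """Row-normalize W into a transition matrix and take t diffusion steps."""
    P = W / W.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(P, t)

def pick_leaders(P_t, num_leaders):
    """Score each node by the total mass flowing into it and keep the top ones."""
    scores = P_t.sum(axis=0)
    return np.argsort(-scores, kind="stable")[:num_leaders]

# Item 0 appears in every list, so it should collect the most mass.
ranked = np.array([
    [0, 1],
    [1, 0],
    [2, 0],
    [3, 0],
    [4, 0],
])
P_t = diffuse(rank_affinity(ranked, n_items=5), t=2)
leaders = pick_leaders(P_t, num_leaders=2)
print(leaders)
```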

GRaCE (Graph and Rank-based Contextual Embeddings)

  • Extends RaDE with unsupervised effectiveness estimation (e.g., Reciprocal Density, Accumulated JacMax) and rank correlation measures (e.g., Reciprocal Distance, JacMax).

🛠 Usage

Input Format

Your input must be a .txt file with one ranked list per line (space-separated item IDs):

15 3 8 22 7 9 ...
3 2 11 5 6 ...
...

Each line is the ranked list for one query, and each number is the ID of a retrieved item.
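If you start from feature vectors rather than an existing retrieval run, a file in this format can be produced with numpy alone. The sketch below is one possible way to do it and rests on assumptions the README does not confirm: line i holds the list for item i, item IDs are 0-based row indices, and each item appears first in its own list. `build_ranked_lists` and `write_ranked_lists` are hypothetical helpers, not part of the package.

```python
import os
import tempfile

import numpy as np

def build_ranked_lists(features, k):
    """Return the top-k most cosine-similar item indices for every row."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit.T
    # Sort each row by decreasing similarity; a stable sort breaks ties by index.
    order = np.argsort(-sims, axis=1, kind="stable")
    return order[:, :k]

def write_ranked_lists(ranked, path):
    """Write one space-separated ranked list per line."""
    np.savetxt(path, ranked, fmt="%d", delimiter=" ")

features = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])
ranked = build_ranked_lists(features, k=3)

# Write to a temporary directory and read the first line back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ranked_lists.txt")
    write_ranked_lists(ranked, path)
    with open(path) as fh:
        first_line = fh.readline().strip()

print(first_line)
```

For large collections the dense similarity matrix will not fit in memory, so an approximate nearest-neighbor index would be the usual replacement for the `unit @ unit.T` step.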


RaDE Example

from interpretable_embeddings import RaDE

# Initialize
rade = RaDE(rks_path="data/ranked_lists.txt", rks_size_L=20)

# Compute internal structure
rade.fit(num_candidates=1000, num_leaders=128, t=2)

# Get embedding vectors
embeddings = rade.transform()

# Or do both in one call
embeddings = rade.fit_transform(num_candidates=1000, num_leaders=128, t=2)

GRaCE Example

from interpretable_embeddings import GRaCE

grace = GRaCE(
    rks_path="data/ranked_lists.txt",
    top_K=20,
    correlation_measure="jacmax",  # or "reciprocal"
    estimation_measure="reciprocal_density",  # or "accjacmax"
    alpha=0.95
)

# Compute internal structure
grace.fit(num_leaders=128)

# Get embedding vectors
embeddings = grace.transform()

# Or do both in one call
embeddings = grace.fit_transform(num_leaders=128)
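Because each dimension is tied to a leader, one simple way to inspect an embedding is to list its strongest dimensions. The sketch below assumes that transform() returns a 2D array-like with one row per item and one column per leader; the README does not state the return type, and it does not say how to map a column index back to a leader's item ID, so only column indices are reported. `strongest_dimensions` is a hypothetical helper, and `demo` is synthetic data standing in for real output.

```python
import numpy as np

def strongest_dimensions(embeddings, item, n=3):
    """Return (column_index, value) pairs for the n largest entries of one row."""
    row = np.asarray(embeddings)[item]
    top = np.argsort(-row, kind="stable")[:n]
    return [(int(j), float(row[j])) for j in top]

# Synthetic stand-in for the output of transform(): 3 items x 4 leader dimensions.
demo = np.array([
    [0.9, 0.1, 0.0, 0.4],
    [0.2, 0.8, 0.3, 0.0],
    [0.0, 0.1, 0.7, 0.6],
])
print(strongest_dimensions(demo, item=0, n=2))
```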

🔬 Example Applications

Retrieval

from sklearn.metrics.pairwise import cosine_similarity

query_idx = 0
sims = cosine_similarity(embeddings)
top_k = sims[query_idx].argsort()[::-1][1:11]
print("Top-10 results for query:", top_k)

Clustering

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5).fit(embeddings)
print(kmeans.labels_)

Classification

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# `labels` is not produced by this package: supply one class label per embedding row.
X_train, X_test, y_train, y_test = train_test_split(embeddings, labels)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))

Visualization

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

proj = TSNE(n_components=2).fit_transform(embeddings)
# Color each point by its class; `labels` is your own array with one label per item.
plt.scatter(proj[:, 0], proj[:, 1], c=labels, cmap="tab10")
plt.title("2D Visualization of RaDE Embeddings")
plt.show()

📁 Package Structure

interpretable-embeddings/
│
├── rade.py                 # RaDE implementation
├── grace.py                # GRaCE implementation
├── utils.py                # Ranked list reader
└── measures/
    ├── qpp.py              # Query performance prediction measures (AccJacMax, Reciprocal Density)
    └── correlation.py      # Rank correlation measures (JacMax, Reciprocal KNN)


📚 Citation

If you use this package in your research, please cite:

GRaCE

Almeida, T. C. C., Letício, G. R., Valem, L. P., Freitas, A., Pedronette, D. C. G. Effective Graph and Rank-based Contextual Embeddings for Textual and Multimedia Data. 2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy.


RaDE

De Fernando, F. A., Pedronette, D. C. G., De Sousa, G. J., Valem, L. P., Guilherme, I. R. RaDE: A Rank-based Graph Embedding Approach. 15th International Conference on Computer Vision Theory and Applications (VISAPP), 2020.


🤝 Contact

  • Thiago César Castilho Almeida: tc.almeida@unesp.br
  • Lucas Pascotti Valem: lucaspascottivalem@gmail.com
  • Daniel Carlos Guimarães Pedronette: pedronette@gmail.com
