Package for creating rank-based semantic and contextual embeddings.

semantic-embeddings

semantic-embeddings is the official implementation of the GRaCE algorithm, which generates rank-based semantic and contextual embeddings from top-K similarity lists.

This library implements the RaDE and GRaCE algorithms, which use graph-based measures to create interpretable, effective, and unsupervised embeddings for retrieval, clustering, classification, and visualization.


🔍 Overview

Unlike traditional embedding techniques that require raw features or supervised training, this package builds representations entirely from ranked similarity lists (e.g., from a kNN graph or retrieval system). Each embedding dimension corresponds to a "leader" (reference node).

Key benefits:

  • Unsupervised: No labels or ground truth needed.
  • Explainable: Embedding dimensions are semantically grounded.
  • Versatile: Works for text, images, graphs—any domain with top-K similarities.
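
To make the idea of leader-indexed dimensions concrete, here is a toy sketch. It is not the RaDE or GRaCE algorithm: the function name `leader_rank_embedding`, the hand-picked leaders, and the linear position score are all assumptions made for illustration. It only shows how a vector can be built from ranked lists alone, with one dimension per leader.

```python
from typing import Sequence

import numpy as np


def leader_rank_embedding(ranked: np.ndarray, leaders: Sequence[int]) -> np.ndarray:
    """Toy embedding (illustrative only): entry (i, j) is high when
    leader j appears early in the ranked list of item i."""
    n, k = ranked.shape
    emb = np.zeros((n, len(leaders)))
    for j, leader in enumerate(leaders):
        for i in range(n):
            hits = np.flatnonzero(ranked[i] == leader)
            if hits.size:
                # Position 0 scores 1.0, position k-1 scores 1/k, absent scores 0.
                emb[i, j] = (k - hits[0]) / k
    return emb


# Four items, each with a top-3 ranked list; items 0 and 2 act as leaders.
ranked = np.array([[0, 1, 2], [1, 0, 2], [2, 3, 1], [3, 2, 1]])
emb = leader_rank_embedding(ranked, leaders=[0, 2])
print(emb)
```

Each column can be read as "how close is this item to that leader", which is what makes the dimensions interpretable.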

📦 Installation

pip install semantic-embeddings

Dependencies:

  • numpy
  • tqdm

Requires Python ≥ 3.7.


🧠 Algorithms

RaDE (Rank-based Diffusion Embedding)

  • Selects leaders by propagating rank-based affinities through a diffusion process.

GRaCE (Graph and Rank-based Contextual Embeddings)

  • Extends RaDE with unsupervised effectiveness estimation (e.g., Reciprocal Density, Accumulated JacMax) and rank correlation measures (e.g., Reciprocal Distance, JacMax).
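
As a rough illustration of what a rank correlation measure compares, the sketch below computes a plain Jaccard index over the top-k sets of two ranked lists. This is a stand-in, not JacMax or Reciprocal Distance: those measures are defined in the cited papers and are not reproduced here. The function name `jaccard_at_k` is hypothetical.

```python
def jaccard_at_k(list_a, list_b, k):
    """Plain Jaccard index between the top-k sets of two ranked lists.

    Illustrative stand-in only; it is not the JacMax measure used by GRaCE.
    """
    top_a, top_b = set(list_a[:k]), set(list_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


# Two lists that share items 0 and 1 in their top 3.
overlap = jaccard_at_k([0, 1, 2, 3], [1, 0, 4, 2], k=3)
# Two lists with nothing in common.
disjoint = jaccard_at_k([0, 1, 2], [7, 8, 9], k=3)
print(overlap, disjoint)
```

Items whose ranked lists overlap heavily are treated as contextually similar, even if their original features were never compared directly.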

🛠 Usage

Input Format

Your input must be a .txt file with one ranked list per line (space-separated item IDs):

15 3 8 22 7 9 ...
3 2 11 5 6 ...
...

Each line holds the ranked list for one query; each number is the ID of a retrieved item, in rank order.
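
If you start from feature vectors rather than an existing retrieval system, a file in this format could be produced along the following lines. This is a minimal sketch under assumptions: the helper names `build_ranked_lists` and `write_ranked_lists` are hypothetical, Euclidean distance is an arbitrary choice, and whether each list should include the query item itself is not stated above (here it does, in first position).

```python
import os
import tempfile

import numpy as np


def build_ranked_lists(features: np.ndarray, k: int) -> np.ndarray:
    """Return an (n, k) array whose row i holds the k items closest to item i."""
    # Pairwise squared Euclidean distances via broadcasting.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    # Stable sort so ties resolve by item ID; an item normally ranks itself first.
    return np.argsort(dists, axis=1, kind="stable")[:, :k]


def write_ranked_lists(path: str, ranked: np.ndarray) -> None:
    """Write one space-separated ranked list per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for row in ranked:
            fh.write(" ".join(str(int(item)) for item in row) + "\n")


# Two tight pairs of points: (0, 1) and (2, 3).
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0]])
ranked = build_ranked_lists(features, k=3)
path = os.path.join(tempfile.gettempdir(), "ranked_lists.txt")
write_ranked_lists(path, ranked)
```

The dense distance matrix keeps the sketch short but grows quadratically with the number of items; for large collections an approximate nearest-neighbor index would be the usual substitute.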


RaDE Example

from sembeddings import RaDE

# Initialize
rade = RaDE(rks_path="data/ranked_lists.txt", rks_size_L=20)

# Compute internal structure
rade.fit(num_candidates=1000, num_leaders=128, t=2)

# Get embedding vectors
embeddings = rade.transform()

# Or do both in one call
embeddings = rade.fit_transform(num_candidates=1000, num_leaders=128, t=2)

GRaCE Example

from sembeddings import GRaCE

grace = GRaCE(
    rks_path="data/ranked_lists.txt",
    top_K=20,
    correlation_measure="jacmax",  # or "reciprocal"
    estimation_measure="reciprocal_density",  # or "accjacmax"
    alpha=0.95
)

# Compute internal structure
grace.fit(num_leaders=128)

# Get embedding vectors
embeddings = grace.transform()

# Or do both in one call
embeddings = grace.fit_transform(num_leaders=128)

🔬 Example Applications

Retrieval

from sklearn.metrics.pairwise import cosine_similarity

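query_idx = 0
sims = cosine_similarity(embeddings)  # (n_items, n_items) similarity matrix
# Sort by descending similarity; skip position 0, which is normally the query itself.
top_k = sims[query_idx].argsort()[::-1][1:11]
print("Top-10 results for query:", top_k)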

Clustering

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5).fit(embeddings)
print(kmeans.labels_)

Classification

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# `labels` holds one class label per item; it is not produced by this package.
X_train, X_test, y_train, y_test = train_test_split(embeddings, labels)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))

Visualization

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

proj = TSNE(n_components=2).fit_transform(embeddings)
# `labels` (one class label per item) is only used to color the points.
plt.scatter(proj[:, 0], proj[:, 1], c=labels, cmap="tab10")
plt.title("2D Visualization of RaDE Embeddings")
plt.show()

📁 Package Structure

semantic-embeddings/
│
├── rade.py                 # RaDE implementation
├── grace.py                # GRaCE implementation
├── utils.py                # Ranked list reader
└── measures/
    ├── qpp.py              # Query performance prediction measures (AccJacMax, Reciprocal Density)
    └── correlation.py      # Rank correlation measures (JacMax, Reciprocal KNN)


📚 Citation

If you use this library in your research, please cite:

GRaCE (Accepted, pending publication)

Almeida, T. C. C., Letício, G. R., Valem, L. P., Freitas, A., Pedronette, D. C. G.
Effective Graph and Rank-based Contextual Embeddings for Textual and Multimedia Data
2025 International Joint Conference on Neural Networks (IJCNN), Rome – Italy.


RaDE

De Fernando, F. A., Pedronette, D. C. G., De Sousa, G. J., Valem, L. P., Guilherme, I. R.
RaDE: A Rank-based Graph Embedding Approach
15th International Conference on Computer Vision Theory and Applications (VISAPP), 2020.

RaDE+: A Semantic Rank-based Graph Embedding Algorithm
International Journal of Information Management Data Insights, 2022.


🤝 Contact

  • Thiago César Castilho Almeida: tc.almeida@unesp.br
  • Lucas Pascotti Valem: lucaspascottivalem@gmail.com
  • Daniel Carlos Guimarães Pedronette: pedronette@gmail.com
