PEARL: Prototype-Enhanced Aligned Representation Learning

PEARL (pearl-H)

PEARL (Prototype-Enhanced Aligned Representation Learning) is a lightweight, label-efficient post-processing method for refining fixed embeddings, such as sentence or document embeddings. It aims to improve local neighborhood geometry for similarity-driven systems such as kNN retrieval, case-based routing, and embedding-based classifiers.

This package implements a practical PEARL workflow:

  • Signal extraction: learns a small refinement network to separate class-discriminative signal from residual variation while preserving the original embedding dimensionality.
  • Prototype-augmented features (PAF): fits per-class prototypes (KMeans) and augments embeddings with prototype/centroid similarity features (useful for downstream lightweight models).
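To make the prototype-augmented feature idea concrete, here is a minimal NumPy sketch. It is not the package's implementation: it uses a single class-mean centroid per class in place of KMeans prototypes, and the helper names (l2_normalize, fit_class_centroids, augment_with_centroid_similarity) are hypothetical.

```python
import numpy as np

def l2_normalize(X, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def fit_class_centroids(X, y, n_classes):
    """Return an [n_classes, D] array holding the mean embedding of each class.

    Assumption: one centroid per class stands in for the KMeans prototypes
    mentioned above; classes with no samples keep an all-zero row.
    """
    centroids = np.zeros((n_classes, X.shape[1]), dtype=X.dtype)
    for c in range(n_classes):
        members = X[y == c]
        if len(members) > 0:
            centroids[c] = members.mean(axis=0)
    return centroids

def augment_with_centroid_similarity(X, centroids):
    """Append one cosine-similarity feature per centroid: [N, D] -> [N, D + C]."""
    sims = l2_normalize(X) @ l2_normalize(centroids).T
    return np.concatenate([X, sims], axis=1)

# Synthetic stand-in data: 60 embeddings of dimension 8, three classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 8)).astype(np.float32)
y_demo = np.repeat(np.arange(3), 20)

centroids = fit_class_centroids(X_demo, y_demo, n_classes=3)
X_aug = augment_with_centroid_similarity(X_demo, centroids)
print(X_aug.shape)  # (60, 11)
```

The original D columns are left untouched, so a lightweight downstream model can use either the raw embedding, the similarity features, or both.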

Installation

pip install pearl-H

Quickstart (recommended)

PEARL assumes you already have embeddings X from a fixed encoder. You provide a small labeled subset (X_train, y_train) to fit the refinement, then transform any embeddings for retrieval/classification.

import numpy as np
from pearl import PEARLPipeline

# X_train: [N, D] numpy array of embeddings
# y_train: [N] integer labels in [0, n_classes)
# X_val, y_val: a held-out labeled split with the same conventions
# X_test: [M, D] embeddings to transform (no labels needed)
pipeline = PEARLPipeline(n_classes=10, device="auto")

pipeline.fit(X_train, y_train, X_val=X_val, y_val=y_val, epochs=100, patience=20)

# Choose the output you want:
X_enhanced = pipeline.transform(X_test, mode="enhanced")  # same dim as input
X_paf = pipeline.transform(X_test, mode="paf")            # augmented with prototype features
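One way to check whether a transform tightened local neighborhoods is to measure how often a point's nearest neighbors share its label. The sketch below is an illustrative metric, not part of the package; the function name knn_label_agreement is hypothetical, and the synthetic two-cluster data stands in for real embeddings.

```python
import numpy as np

def knn_label_agreement(X, y, k=5):
    """Mean fraction of each point's k nearest neighbors (by cosine) sharing its label."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)            # exclude each point from its own neighbor list
    nn_idx = np.argsort(-sims, axis=1)[:, :k]  # indices of the k most similar points
    return float((y[nn_idx] == y[:, None]).mean())

# Synthetic stand-in: two tight, well-separated clusters.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 25)
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
X_demo = centers[y_demo] + 0.05 * rng.normal(size=(50, 2))

score = knn_label_agreement(X_demo, y_demo, k=5)
print(score)  # 1.0
```

With labels for the test split (a y_test array, assumed here), calling this on both X_test and X_enhanced gives a quick before/after comparison; a higher value after the transform suggests cleaner neighborhoods for kNN retrieval.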

Core API

  • PEARLPipeline: end-to-end training + transformation (fit, transform, fit_transform).
  • SignalExtractorTrainer: trains the refinement model; produces same-dimensional enhanced embeddings.
  • PAFAugmentor: appends prototype/centroid similarity features to embeddings.
  • RAGClassifierWrapper: retrieval-augmented classifier over embeddings (kNN retrieval + cross-attention).
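For intuition about retrieval-augmented classification over embeddings, the sketch below retrieves the k nearest labeled items and aggregates their labels with softmax weights over similarity. It is a deliberately simplified stand-in with no learned parameters, not the cross-attention model inside RAGClassifierWrapper; the function name retrieval_vote and the temperature parameter are assumptions made for illustration.

```python
import numpy as np

def retrieval_vote(X_query, X_bank, y_bank, n_classes, k=5, temperature=0.1):
    """Class probabilities from a softmax-weighted vote over the k nearest bank items."""
    Qn = X_query / np.maximum(np.linalg.norm(X_query, axis=1, keepdims=True), 1e-12)
    Bn = X_bank / np.maximum(np.linalg.norm(X_bank, axis=1, keepdims=True), 1e-12)
    sims = Qn @ Bn.T                                     # [Q, N] cosine similarities
    nn_idx = np.argsort(-sims, axis=1)[:, :k]            # [Q, k] retrieved neighbors
    nn_sims = np.take_along_axis(sims, nn_idx, axis=1)   # [Q, k] their similarities
    logits = nn_sims / temperature
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)        # attention-like weights per query
    one_hot = np.eye(n_classes)[y_bank[nn_idx]]          # [Q, k, C] neighbor labels
    return (weights[:, :, None] * one_hot).sum(axis=1)   # [Q, C] class probabilities

# Synthetic stand-in: a labeled bank of two clusters, queried at the cluster centers.
rng = np.random.default_rng(0)
y_bank = np.repeat([0, 1], 25)
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
X_bank = centers[y_bank] + 0.05 * rng.normal(size=(50, 2))

probs = retrieval_vote(centers, X_bank, y_bank, n_classes=2, k=5)
print(probs.argmax(axis=1))  # [0 1]
```

A lower temperature concentrates the vote on the closest neighbors, while a higher one approaches a plain majority vote among the k retrieved items.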

Input conventions

  • Embeddings: numpy.ndarray of shape [N, D] (float32/float64).
  • Labels: numpy.ndarray of shape [N] with integer class ids 0..n_classes-1.
  • Device: "auto", "cuda", "mps", "cpu" (or a torch.device).
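The embedding and label conventions above can be checked before fitting. The helper below is a hypothetical sketch (check_pearl_inputs is not a function from the package) that raises early on shape, dtype, or label-range problems.

```python
import numpy as np

def check_pearl_inputs(X, y, n_classes):
    """Raise ValueError if X or y does not follow the listed input conventions."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError(f"embeddings must have shape [N, D], got {X.shape}")
    if X.dtype not in (np.float32, np.float64):
        raise ValueError(f"embeddings must be float32 or float64, got {X.dtype}")
    if y.shape != (X.shape[0],):
        raise ValueError(f"labels must have shape [{X.shape[0]}], got {y.shape}")
    if not np.issubdtype(y.dtype, np.integer):
        raise ValueError(f"labels must be integers, got {y.dtype}")
    if y.size and (y.min() < 0 or y.max() >= n_classes):
        raise ValueError(f"labels must lie in 0..{n_classes - 1}")
    return X, y

# Example: four 3-dimensional embeddings with labels from three classes.
X_ok = np.zeros((4, 3), dtype=np.float32)
y_ok = np.array([0, 1, 2, 1])
X_checked, y_checked = check_pearl_inputs(X_ok, y_ok, n_classes=3)
print(X_checked.shape, y_checked.shape)  # (4, 3) (4,)
```

Running a check like this on the training and validation splits catches mismatched lengths or out-of-range class ids before any training time is spent.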

Paper & citation

If you use PEARL in academic work, please cite the paper:

@misc{zhang2026pearlprototypeenhancedalignmentlabelefficient,
      title={PEARL: Prototype-Enhanced Alignment for Label-Efficient Representation Learning with Deployment-Driven Insights from Digital Governance Communication Systems},
      author={Ruiyu Zhang and Lin Nie and Wai-Fung Lam and Qihao Wang and Xin Zhao},
      year={2026},
      eprint={2601.17495},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.17495},
}

License

MIT License. See LICENSE.
