
PEARL: Prototype-guided Embedding Refinement via Adaptive Representation Learning


PEARL (pearl-H)

PEARL (Prototype-Enhanced Alignment for Label-efficient Representation Learning) is a lightweight, label-efficient post-processing method for refining fixed embeddings (e.g., sentence/document embeddings) to improve local neighborhood geometry for similarity-driven systems such as kNN retrieval, case-based routing, and embedding-based classifiers.
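One way to check whether a refinement actually improved local neighborhood geometry is to measure how often an embedding's nearest neighbors share its label. The sketch below computes leave-one-out kNN label agreement under cosine similarity. `knn_label_agreement` is a hypothetical evaluation helper written for illustration; it is not part of the pearl-H API.

```python
import numpy as np


def knn_label_agreement(X: np.ndarray, y: np.ndarray, k: int = 5) -> float:
    """Average fraction of each point's k nearest neighbors (cosine, self
    excluded) that share its label.

    Hypothetical evaluation helper; not part of the pearl-H package.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    if not 1 <= k < len(X):
        raise ValueError("k must be in [1, N-1]")
    # L2-normalize rows so the dot product equals cosine similarity.
    Xn = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    sims = Xn @ Xn.T
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(sims, -np.inf)
    # Indices of the k most similar points per row (order within the top-k is irrelevant).
    nbrs = np.argpartition(-sims, kth=k - 1, axis=1)[:, :k]
    return float((y[nbrs] == y[:, None]).mean())
```

With labels for a held-out set, comparing this score on the raw embeddings and on the output of `pipeline.transform(..., mode="enhanced")` gives a quick, model-free view of whether neighborhoods became more label-consistent.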

This package implements a practical PEARL workflow:

  • Signal extraction: learns a small refinement network to separate class-discriminative signal from residual variation while preserving the original embedding dimensionality.
  • Prototype-augmented features (PAF): fits per-class prototypes (KMeans) and augments embeddings with prototype/centroid similarity features (useful for downstream lightweight models).
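The prototype-augmented features idea can be sketched in plain NumPy: run a small k-means inside each class, then append the cosine similarity of every embedding to each prototype and to each class centroid. The helper names (`fit_class_prototypes`, `paf_features`), the number of prototypes per class, and the feature layout are assumptions for illustration; `PAFAugmentor` may compute and order its features differently. The sketch also assumes every class has at least one labeled example.

```python
import numpy as np


def _normalize(A):
    return A / np.clip(np.linalg.norm(A, axis=1, keepdims=True), 1e-12, None)


def _kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, D) array of cluster centers."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X))
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def fit_class_prototypes(X, y, n_classes, k_per_class=3):
    """Hypothetical: per-class k-means prototypes plus per-class centroids."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    protos, centroids = [], []
    for c in range(n_classes):
        Xc = X[y == c]
        protos.append(_kmeans(Xc, k_per_class))
        centroids.append(Xc.mean(axis=0))
    return np.vstack(protos), np.vstack(centroids)


def paf_features(X, prototypes, centroids):
    """Append cosine similarities to all prototypes and centroids to X."""
    X = np.asarray(X, dtype=np.float64)
    Xn = _normalize(X)
    proto_sim = Xn @ _normalize(prototypes).T
    cent_sim = Xn @ _normalize(centroids).T
    return np.hstack([X, proto_sim, cent_sim])
```

Under this layout, with `n_classes=10`, `k_per_class=3`, and at least three examples per class, a 384-dimensional input gains 30 + 10 = 40 columns (424 in total), which is why such features suit lightweight downstream models rather than replacing the original embedding.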

Installation

```bash
pip install pearl-H
```

Quickstart (recommended)

PEARL assumes you already have embeddings X from a fixed encoder. You provide a small labeled subset (X_train, y_train) to fit the refinement, then transform any embeddings for retrieval/classification.

```python
import numpy as np
from pearl import PEARLPipeline

# X_train: [N, D] numpy array of embeddings
# y_train: [N] integer labels in [0, n_classes)
# X_val, y_val: a held-out labeled split in the same format
#   (presumably monitored for early stopping via `patience`)
# X_test: [M, D] embeddings from the same fixed encoder
pipeline = PEARLPipeline(n_classes=10, device="auto")

pipeline.fit(X_train, y_train, X_val=X_val, y_val=y_val, epochs=100, patience=20)

# Choose the output you want:
X_enhanced = pipeline.transform(X_test, mode="enhanced")  # same dim as input
X_paf = pipeline.transform(X_test, mode="paf")            # augmented with prototype features
```
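The quickstart expects a separate labeled validation split (`X_val`, `y_val`). If you only have one labeled subset, a per-class (stratified) split keeps classes represented on both sides. This is a minimal NumPy sketch; `stratified_split` and the 20% validation fraction are illustrative choices, not package defaults.

```python
import numpy as np


def stratified_split(X, y, val_frac=0.2, seed=0):
    """Split (X, y) so each class contributes about val_frac of its rows to validation.

    Hypothetical helper. Every class keeps at least one training row;
    very small classes may end up with no validation rows.
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    val_idx = []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        n_val = int(round(val_frac * len(idx)))
        # Always leave at least one example of each class for training.
        n_val = min(n_val, len(idx) - 1)
        val_idx.extend(idx[:n_val].tolist())
    val_mask = np.zeros(len(y), dtype=bool)
    val_mask[val_idx] = True
    return X[~val_mask], y[~val_mask], X[val_mask], y[val_mask]
```

The four returned arrays map directly onto the arguments of `pipeline.fit(X_train, y_train, X_val=X_val, y_val=y_val, ...)`.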

Core API

  • PEARLPipeline: end-to-end training + transformation (fit, transform, fit_transform).
  • SignalExtractorTrainer: trains the refinement model; produces same-dimensional enhanced embeddings.
  • PAFAugmentor: appends prototype/centroid similarity features to embeddings.
  • RAGClassifierWrapper: retrieval-augmented classifier over embeddings (kNN retrieval + cross-attention).
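To make the retrieval-augmented idea concrete, here is a parameter-free simplification: retrieve the top-k labeled neighbors of each query by cosine similarity, then weight their labels with a softmax over those similarities, an attention-style weighting in which the query attends over the retrieved items. `RAGClassifierWrapper` presumably learns its cross-attention; this sketch has no trainable weights, and `rag_predict` and its `temperature` parameter are hypothetical.

```python
import numpy as np


def rag_predict(X_query, X_index, y_index, n_classes, k=10, temperature=0.1):
    """Hypothetical retrieval + attention-weighted classifier (no learned parameters).

    Returns an (M, n_classes) array of class probabilities.
    """
    def normalize(A):
        A = np.asarray(A, dtype=np.float64)
        return A / np.clip(np.linalg.norm(A, axis=1, keepdims=True), 1e-12, None)

    Q, K = normalize(X_query), normalize(X_index)
    y_index = np.asarray(y_index)
    k = min(k, len(K))
    sims = Q @ K.T                                           # (M, N) cosine similarities
    top = np.argpartition(-sims, kth=k - 1, axis=1)[:, :k]   # (M, k) neighbor ids
    top_sims = np.take_along_axis(sims, top, axis=1)         # (M, k)
    # Attention weights: temperature-scaled softmax over the retrieved similarities.
    logits = top_sims / temperature
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    # Aggregate the neighbors' one-hot labels with the attention weights.
    onehot = np.eye(n_classes)[y_index[top]]                 # (M, k, n_classes)
    return (w[:, :, None] * onehot).sum(axis=1)
```

A lower `temperature` concentrates the weight on the closest neighbors; a higher one approaches an unweighted kNN vote.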

Input conventions

  • Embeddings: numpy.ndarray of shape [N, D] (float32/float64).
  • Labels: numpy.ndarray of shape [N] with integer class ids 0..n_classes-1.
  • Device: "auto", "cuda", "mps", "cpu" (or a torch.device).
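Before calling `fit`, it can help to coerce raw inputs to these conventions. The sketch below is a hypothetical helper, not shipped with pearl-H: it checks shapes, casts embeddings to float32, and remaps arbitrary labels (for example strings) to contiguous ids 0..n_classes-1 with `np.unique(..., return_inverse=True)`.

```python
import numpy as np


def prepare_inputs(X, labels):
    """Return (X float32 [N, D], y int64 [N], class_names) in PEARL-style conventions.

    Hypothetical helper: class_names[i] is the original label mapped to id i.
    """
    X = np.asarray(X, dtype=np.float32)
    if X.ndim != 2:
        raise ValueError(f"expected embeddings of shape [N, D], got {X.shape}")
    labels = np.asarray(labels)
    if labels.shape != (X.shape[0],):
        raise ValueError(f"expected labels of shape [{X.shape[0]}], got {labels.shape}")
    if not np.isfinite(X).all():
        raise ValueError("embeddings contain NaN or inf")
    # np.unique sorts the distinct labels; return_inverse gives each row's index
    # into that sorted array, i.e. a contiguous id in 0..n_classes-1.
    class_names, y = np.unique(labels, return_inverse=True)
    return X, y.astype(np.int64), class_names
```

`len(class_names)` is then the value to pass as `n_classes`, and `class_names[pred]` maps integer predictions back to the original labels.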

Paper & citation

If you use PEARL in academic work, please cite the paper:

@misc{zhang2026pearlprototypeenhancedalignmentlabelefficient,
      title={PEARL: Prototype-Enhanced Alignment for Label-Efficient Representation Learning with Deployment-Driven Insights from Digital Governance Communication Systems},
      author={Ruiyu Zhang and Lin Nie and Wai-Fung Lam and Qihao Wang and Xin Zhao},
      year={2026},
      eprint={2601.17495},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.17495},
}

License

MIT License. See LICENSE.


Download files


Source Distribution

pearl_h-0.1.3.tar.gz (18.7 kB)

Uploaded Source

Built Distribution


pearl_h-0.1.3-py3-none-any.whl (20.5 kB)

Uploaded Python 3

File details

Details for the file pearl_h-0.1.3.tar.gz.

File metadata

  • Download URL: pearl_h-0.1.3.tar.gz
  • Upload date:
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.7

File hashes

Hashes for pearl_h-0.1.3.tar.gz
Algorithm Hash digest
SHA256 bdca118320965f27decbd415d7ed248ae759559d2ffe9f7d4ead5309ec8ace06
MD5 2e23112b1de4eea44d55a5c0a0c74b7c
BLAKE2b-256 e4624f474d5c3ec9a551604b7edaf73f422de98e96dfda75327a82d6ab06e8c4


File details

Details for the file pearl_h-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: pearl_h-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 20.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.7

File hashes

Hashes for pearl_h-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 10525d33e28a5475354cb1ba10656598e5223965642cec4c935f1220a0ca0d87
MD5 25d690209695513b1e281b3244852b78
BLAKE2b-256 63a479fb9fa8184965260ba4a644ce52624cf6b4b5bee060d0b6ec4fcb085d2e
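To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. This is a minimal standard-library sketch; `sha256_of` and `verify` are hypothetical helpers, and the path is wherever you saved the download.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Published SHA256 digests for pearl-H 0.1.3, keyed by file name.
EXPECTED = {
    "pearl_h-0.1.3.tar.gz": "bdca118320965f27decbd415d7ed248ae759559d2ffe9f7d4ead5309ec8ace06",
    "pearl_h-0.1.3-py3-none-any.whl": "10525d33e28a5475354cb1ba10656598e5223965642cec4c935f1220a0ca0d87",
}


def verify(path):
    """True if the file's digest matches the published SHA256 for its name."""
    return sha256_of(path) == EXPECTED[Path(path).name]
```

Reading in chunks keeps memory flat for large files; on Python 3.11+, `hashlib.file_digest(f, "sha256")` does the same job for an open binary file.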

