Local acceptance-manifold scoring for implicit RLHF preference pair generation
manifold-scorer
Local acceptance-manifold scoring for implicit RLHF/DPO preference pair generation.
Implementation of the local density scorer from Gerard & Volkova (2026), "Density-Guided Response Optimization."
The idea: responses that a community accepts cluster in coherent, high-density regions of embedding space. The paper reports that local density, conditioned on nearby conversation histories, recovers human preference ordering and can substitute for explicit preference annotations when building DPO training pairs.
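The README does not spell out the estimator, so the following is only a minimal sketch of what "local density conditioned on nearby histories" can mean, under our own assumptions: the neighborhood is the `k` most cosine-similar stored histories, `resp_embs[i]` is the accepted reply to `hist_embs[i]`, and density is a Gaussian kernel estimate over the neighborhood's replies. `local_log_density`, `k` and `bandwidth` are hypothetical names, not the package's API, and this is not necessarily the estimator used in the paper.

```python
import numpy as np


def local_log_density(query_hist, candidate, hist_embs, resp_embs, k=2, bandwidth=0.5):
    """Toy conditional log-density score (hypothetical; not the package's code).

    Assumes L2-normalized embeddings, with resp_embs[i] answering hist_embs[i].
    1. Take the k stored histories most similar to `query_hist`.
    2. Score `candidate` with a Gaussian kernel density over their responses.
    """
    sims = hist_embs @ query_hist              # cosine similarity for unit rows
    neighbors = np.argsort(-sims)[:k]          # indices of the k nearest histories
    local_resps = resp_embs[neighbors]         # (k, d) replies in that neighborhood

    sq_dists = np.sum((local_resps - candidate) ** 2, axis=1)
    log_kernels = -sq_dists / (2 * bandwidth ** 2)
    # log(mean(exp(x))) computed stably; the Gaussian normalizing constant is
    # dropped, so scores are comparable only at a fixed bandwidth and dimension.
    m = log_kernels.max()
    return m + np.log(np.mean(np.exp(log_kernels - m)))


# Toy 2-D data: two histories near [1, 0] whose accepted replies are also [1, 0].
hist_embs = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]])
resp_embs = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
query = np.array([1.0, 0.0])

on_manifold = local_log_density(query, np.array([1.0, 0.0]), hist_embs, resp_embs)
off_manifold = local_log_density(query, np.array([0.0, 1.0]), hist_embs, resp_embs)
# on_manifold is 0.0 and off_manifold is -4.0 here
```

A candidate that resembles what the community accepted after similar histories scores higher than one that does not, which is the ordering the scorer relies on.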
Install
```bash
pip install manifold-scorer
# optionally, with the built-in Embedder helper:
pip install "manifold-scorer[embed]"
```
Usage
1. Fit on accepted community responses
```python
from manifold import ManifoldScorer

# hist_embs: embeddings of conversation histories, shape (N, d)
# resp_embs: embeddings of the accepted responses, shape (N, d);
#            these come from your unlabeled community posts/replies
scorer = ManifoldScorer(k=150)
scorer.fit(hist_embs, resp_embs)
```
2. Generate DPO pairs from candidates
```python
import numpy as np

# For each query you have several candidate responses:
# query_hist_embs: shape (N_queries, d)
# cand_embs:       shape (N_queries, n_candidates, d)
pairs = scorer.make_pairs(query_hist_embs, cand_embs)
# pairs["chosen_emb"]:   shape (N_queries, d), highest-density candidate
# pairs["rejected_emb"]: shape (N_queries, d), lowest-density candidate
# pairs["margin"]:       score gap; filter on this for data quality

# Keep only pairs with a clear score gap before passing them to your DPO pipeline.
# The threshold is data-dependent; the median margin is just a starting point.
threshold = np.median(pairs["margin"])
high_quality = pairs["margin"] > threshold
dpo_chosen = pairs["chosen_emb"][high_quality]
dpo_rejected = pairs["rejected_emb"][high_quality]
```
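`make_pairs` presumably scores every candidate and keeps the two extremes. The sketch below shows that selection step, assuming you already have a `(N_queries, n_candidates)` score matrix and that `margin` is the chosen score minus the rejected score (both are assumptions; `pairs_from_scores` is a hypothetical helper, not part of the package).

```python
import numpy as np


def pairs_from_scores(scores, cand_embs):
    """Pick chosen/rejected candidates from a score matrix (hypothetical helper).

    scores:    (N_queries, n_candidates) log-density of each candidate
    cand_embs: (N_queries, n_candidates, d) candidate embeddings
    """
    rows = np.arange(scores.shape[0])
    best = scores.argmax(axis=1)   # highest-density candidate per query
    worst = scores.argmin(axis=1)  # lowest-density candidate per query
    return {
        "chosen_emb": cand_embs[rows, best],                  # (N_queries, d)
        "rejected_emb": cand_embs[rows, worst],               # (N_queries, d)
        "margin": scores[rows, best] - scores[rows, worst],   # (N_queries,)
    }


toy_scores = np.array([[-1.0, -3.0, -2.0],
                       [-0.5, -0.6, -0.4]])
toy_cands = np.arange(12, dtype=float).reshape(2, 3, 2)  # 2 queries, 3 candidates, d=2
toy_pairs = pairs_from_scores(toy_scores, toy_cands)
keep = toy_pairs["margin"] > 1.0  # only the first query has a clear gap
```

Filtering on the margin drops queries where the scorer barely separates its best and worst candidates, which are the pairs most likely to carry a noisy preference label.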
3. Score a single candidate
```python
score = scorer.score(history_emb, candidate_emb)
# log-density; higher = more aligned with community norms
```
4. Rank candidates for a query
```python
ranked_indices = scorer.rank_candidates(history_emb, candidates_emb)
best = candidates_emb[ranked_indices[0]]  # first index = highest-ranked candidate
```
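If `rank_candidates` orders candidates by the same log-density that `score` returns (an assumption; the README does not say so, nor how ties are broken), ranking is equivalent to a descending sort of per-candidate scores. A sketch with a hypothetical `rank_by_score` helper:

```python
import numpy as np


def rank_by_score(scores):
    """Hypothetical stand-in for rank_candidates: indices sorted best-first.

    A stable sort on the negated scores keeps the original order among ties.
    """
    return np.argsort(-np.asarray(scores), kind="stable")


toy_scores = np.array([-2.5, -0.7, -4.1, -0.7])
ranked = rank_by_score(toy_scores)
best_idx, worst_idx = ranked[0], ranked[-1]
```

Under that assumption, `ranked[0]` and `ranked[-1]` would be the chosen and rejected candidates for a single query.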
5. Save / load
```python
scorer.save("my_community_scorer.npz")
scorer2 = ManifoldScorer.load("my_community_scorer.npz")
```
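An `.npz` file is NumPy's zipped archive of named arrays, so a fitted neighbor-based scorer presumably persists its reference embeddings and hyperparameters. The sketch below shows such a round trip with `np.savez` and `np.load`; the key names (`hist_embs`, `resp_embs`, `k`) are hypothetical and not the package's actual file layout.

```python
import io

import numpy as np

# Hypothetical checkpoint layout: the README does not document which arrays
# ManifoldScorer.save writes, so these key names are illustrative only.
hist_embs = np.arange(20, dtype=np.float32).reshape(5, 4)
resp_embs = np.arange(20, 40, dtype=np.float32).reshape(5, 4)

buf = io.BytesIO()  # stands in for a path such as "my_community_scorer.npz"
np.savez(buf, hist_embs=hist_embs, resp_embs=resp_embs, k=np.array(150))
buf.seek(0)

with np.load(buf) as ckpt:
    restored_hist = ckpt["hist_embs"]
    restored_resp = ckpt["resp_embs"]
    restored_k = int(ckpt["k"])
```

Because every entry is a plain array, loading needs no pickling (`allow_pickle` can stay at its default of `False`).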
Embedding your text
The scorer is embedding-agnostic: pass any float32 arrays. For a quick start with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

# histories, responses: lists of strings (conversation histories and their accepted replies)
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
hist_embs = model.encode(histories, normalize_embeddings=True)
resp_embs = model.encode(responses, normalize_embeddings=True)
```
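If your embeddings come from another model, you can match the unit-norm float32 arrays that `normalize_embeddings=True` yields by normalizing them yourself. Whether the scorer requires unit-norm inputs is not stated here; normalizing simply makes dot product and cosine similarity coincide. `to_unit_float32` is a hypothetical helper, not part of the package.

```python
import numpy as np


def to_unit_float32(embs, eps=1e-12):
    """L2-normalize each row and cast to float32 (hypothetical helper).

    `eps` guards against division by zero for all-zero rows.
    """
    embs = np.asarray(embs, dtype=np.float32)
    norms = np.linalg.norm(embs, axis=-1, keepdims=True)
    return embs / np.maximum(norms, eps)


raw = [[3.0, 4.0], [0.0, 2.0]]
unit = to_unit_float32(raw)
cosine = unit @ unit.T  # for unit vectors, dot product equals cosine similarity
```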
Citation
```bibtex
@article{gerard2026dgro,
  title   = {Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals},
  author  = {Gerard, Patrick and Volkova, Svitlana},
  journal = {ACM FAccT},
  year    = {2026},
}
```
Download files
File details
Details for the file manifold_scorer-0.1.0.tar.gz.
File metadata
- Download URL: manifold_scorer-0.1.0.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 039e6cf5721165d240a5cc251cef8116104b9301354e95cb1aeba5b7d764958f |
| MD5 | d3e18eb36a7cd76c1ab04c79d80e1a3a |
| BLAKE2b-256 | 795c5969afcaf515217e38b6ea23a6a9198374046d1505ea83a815aa1cdff7b8 |
File details
Details for the file manifold_scorer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: manifold_scorer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e48a95952d0655abf2eadfafc7460c5e883e054c5a53365d3238af32688de15 |
| MD5 | 01594b19c3abce59ca45ea579d4ece59 |
| BLAKE2b-256 | 4565e4814d775e41aa5efd828a67c2a99b68dc04e52c810c5dbc21c41ef7d493 |