Local acceptance-manifold scoring for implicit RLHF preference pair generation
manifold-scorer
Local acceptance-manifold scoring for implicit RLHF/DPO preference pair generation.
Implementation of the local density scorer from Gerard & Volkova (2026), "Density-Guided Response Optimization."
The idea: posts a community has accepted — upvoted, engaged with, allowed to persist — cluster in coherent, high-density regions of embedding space. That structure encodes community preference without any labels. You fit the scorer on those posts, then use it to rank candidate responses and generate DPO pairs.
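The "high-density region" intuition can be made concrete with the textbook k-nearest-neighbour density estimate. The sketch below is not taken from the package or the paper. The function name `knn_log_density` is hypothetical, and it works on plain tuples so it runs without dependencies.

```python
import math


def knn_log_density(x, posts, k):
    """Hypothetical k-NN log-density proxy, up to an additive constant.

    The textbook estimate is p(x) ~ k / (N * V_d * r_k**d), where r_k is the
    distance from x to its k-th nearest accepted post and V_d is the volume of
    the unit ball in d dimensions. Only r_k depends on x, so for ranking
    purposes log p(x) reduces to -d * log(r_k).
    """
    d = len(x)
    dists = sorted(math.dist(x, p) for p in posts)
    r_k = max(dists[k - 1], 1e-12)  # guard against log(0) on exact duplicates
    return -d * math.log(r_k)


# Three accepted posts cluster near (1, 0); one sits alone at (0, 1).
posts = [(1.0, 0.0), (0.99, 0.14), (0.98, -0.2), (0.0, 1.0)]
dense = knn_log_density((1.0, 0.05), posts, k=2)   # inside the cluster
sparse = knn_log_density((-1.0, 0.0), posts, k=2)  # far from everything
print(dense > sparse)  # True
```

Because only `r_k` varies, ranking by this score is equivalent to ranking by distance to the k-th nearest accepted post; the `-d` factor matters only if you need values on a log-density scale.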
Install
pip install manifold-scorer
Usage
1. Embed your community posts
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
# community_posts: list of strings the community accepted (upvoted comments, replies, whatever they wrote).
# Their existence is the acceptance signal, so no labels are needed.
post_embs = model.encode(community_posts, normalize_embeddings=True) # (N, d)
2. Find a good k (optional but recommended)
from manifold import tune_k
result = tune_k(post_embs)
# prints:
# k consistency (rho)
# --------------------------
# 25 0.61
# 50 0.68
# 100 0.71 ←
# 150 0.70
# 200 0.68
best_k = result["best_k"]
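This README does not define how `tune_k` measures "consistency (rho)". One plausible reading, offered purely as an assumption, is a split-half stability check: score the same held-out probe posts against two disjoint halves of the data and compute Spearman's rho between the two score lists. All names below (`split_half_consistency`, `pick_k`, `average_ranks`, `spearman`) are hypothetical.

```python
import math
import random


def knn_log_density(x, posts, k):
    # k-NN log-density proxy, up to a constant: -d * log(distance to k-th neighbour)
    r_k = max(sorted(math.dist(x, p) for p in posts)[k - 1], 1e-12)
    return -len(x) * math.log(r_k)


def average_ranks(values):
    # 1-based ranks; tied values share the mean of their positions
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    # Spearman's rho = Pearson correlation of the average ranks
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb) if sa and sb else 0.0


def split_half_consistency(posts, k, n_probe=50, seed=0):
    # Score the same held-out probes against two disjoint halves of the posts;
    # a stable k should order the probes similarly under both halves.
    shuffled = list(posts)
    random.Random(seed).shuffle(shuffled)
    probes, rest = shuffled[:n_probe], shuffled[n_probe:]
    half_a, half_b = rest[0::2], rest[1::2]
    scores_a = [knn_log_density(p, half_a, k) for p in probes]
    scores_b = [knn_log_density(p, half_b, k) for p in probes]
    return spearman(scores_a, scores_b)


def pick_k(posts, candidate_ks):
    rhos = {k: split_half_consistency(posts, k) for k in candidate_ks}
    return max(rhos, key=rhos.get), rhos


rng = random.Random(1)
posts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
best_k, rhos = pick_k(posts, [5, 10, 20])
for k, rho in rhos.items():
    print(f"{k:>4}  {rho:.2f}")
print("best_k =", best_k)
```

Each candidate `k` must not exceed the size of either half, since the k-th neighbour has to exist in both.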
3. Fit the scorer
from manifold import ManifoldScorer
scorer = ManifoldScorer(k=best_k)
scorer.fit(post_embs)
4. Generate DPO pairs
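# You have N prompts and n_cands candidate responses per prompt:
# prompts is a list of N strings; candidates is a list of N lists of n_cands strings each.
N = len(prompts)
n_cands = len(candidates[0])
prompt_embs = model.encode(prompts, normalize_embeddings=True) # (N, d)
# encode() expects a flat list of strings, so flatten the candidates, then reshape.
flat_candidates = [c for cands in candidates for c in cands]
candidate_embs = model.encode(flat_candidates, normalize_embeddings=True).reshape(N, n_cands, -1) # (N, n_cands, d)
pairs = scorer.make_pairs(prompt_embs, candidate_embs, margin_threshold=0.5)
# pairs["mask"][i] is True when prompt i's candidates were separated strongly enough (see margin_threshold)
chosen_texts = [candidates[i][pairs["chosen_idx"][i]]
                for i in range(N) if pairs["mask"][i]]
rejected_texts = [candidates[i][pairs["rejected_idx"][i]]
                  for i in range(N) if pairs["mask"][i]]
# feed chosen_texts / rejected_texts into your DPO trainer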
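`make_pairs` is documented here only by its call. One plausible reading of `margin_threshold` and `mask`, which is an assumption rather than the package's confirmed behaviour, is best-versus-worst pairing with a minimum score gap. The helper `make_pairs_from_scores` and its extra `"margin"` key are hypothetical, and it takes precomputed scores instead of embeddings so it runs on its own.

```python
def make_pairs_from_scores(scores, margin_threshold=0.5):
    """Hypothetical pairing rule over a per-prompt list of candidate scores.

    For each prompt: chosen = highest-scoring candidate, rejected = lowest,
    and the pair is kept only if the score gap reaches margin_threshold.
    """
    out = {"chosen_idx": [], "rejected_idx": [], "margin": [], "mask": []}
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)
        worst = min(range(len(row)), key=row.__getitem__)
        gap = row[best] - row[worst]
        out["chosen_idx"].append(best)
        out["rejected_idx"].append(worst)
        out["margin"].append(gap)
        out["mask"].append(gap >= margin_threshold)
    return out


# Two prompts, three candidates each (toy log-density scores for illustration).
pairs = make_pairs_from_scores([[1.0, 2.5, 0.2], [0.9, 1.0, 1.1]])
print(pairs["chosen_idx"], pairs["rejected_idx"], pairs["mask"])
# [1, 2] [2, 0] [True, False]
```

The second prompt's candidates are nearly tied, so its pair is masked out rather than teaching the DPO trainer a preference the scorer barely holds.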
5. Score a single candidate
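# Example: the first prompt and its first candidate from step 4
prompt_emb = prompt_embs[0]            # (d,)
candidate_emb = candidate_embs[0][0]   # (d,)
score = scorer.score(prompt_emb, candidate_emb)
# log-density: higher means more aligned with community norms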
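The scorer is described as *local*, and `score` takes both a prompt and a candidate, which suggests the density is estimated around the prompt rather than over all posts. The sketch below illustrates that idea under stated assumptions: it keeps the `k` posts nearest the prompt and scores the candidate with a Gaussian kernel over them. The function `local_log_density`, the kernel choice, and `bandwidth` are all hypothetical, not the paper's formulation.

```python
import math


def local_log_density(prompt, candidate, posts, k, bandwidth=0.5):
    """Hypothetical prompt-local density score for one candidate.

    Step 1 keeps the k accepted posts nearest to the prompt. Step 2 scores the
    candidate with a Gaussian-kernel density over only those posts. The kernel's
    normalising constant is dropped, so values are comparable across candidates
    for the same prompt, k and bandwidth, not as absolute densities.
    """
    neighbours = sorted(posts, key=lambda p: math.dist(prompt, p))[:k]
    log_kernels = [-math.dist(candidate, p) ** 2 / (2 * bandwidth ** 2)
                   for p in neighbours]
    # log(mean(exp(v))) via log-sum-exp for numerical stability
    m = max(log_kernels)
    return m + math.log(sum(math.exp(v - m) for v in log_kernels) / len(log_kernels))


# Two groups of posts: one near (1, 0), one near (0, 1).
posts = [(1.0, 0.0), (0.95, 0.05), (0.9, -0.05),
         (0.0, 1.0), (0.05, 0.95), (-0.05, 0.9)]
prompt = (0.9, 0.1)
on_topic = local_log_density(prompt, (1.0, 0.0), posts, k=3)
off_topic = local_log_density(prompt, (0.0, 1.0), posts, k=3)
print(on_topic > off_topic)  # True
```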
6. Rank candidates for a prompt
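# Rank all candidates for the first prompt; index 0 of the result is the best
ranked_indices = scorer.rank_candidates(prompt_embs[0], candidate_embs[0]) # (d,), (n_cands, d)
best = candidates[0][ranked_indices[0]]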
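Ranking amounts to scoring every candidate and sorting best-first. A minimal sketch, assuming `rank_candidates` returns indices in descending score order (as `ranked_indices[0]` being the best implies); `rank_candidates_sketch` and the toy `closeness` score are hypothetical stand-ins for `scorer.score`.

```python
def rank_candidates_sketch(score_fn, prompt, candidates):
    """Hypothetical ranking helper: candidate indices sorted best-first."""
    scores = [score_fn(prompt, c) for c in candidates]
    # sorted() is stable, so tied candidates keep their original order
    return sorted(range(len(candidates)), key=scores.__getitem__, reverse=True)


def closeness(p, c):
    # Toy score: a candidate closer to the prompt value scores higher.
    return -abs(p - c)


print(rank_candidates_sketch(closeness, 5.0, [1.0, 4.5, 10.0, 6.0]))  # [1, 3, 0, 2]
```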
7. Save / load
scorer.save("my_community.npz")
scorer = ManifoldScorer.load("my_community.npz")
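The `.npz` layout the package writes is not documented here. As an assumption, a fitted k-NN style scorer mainly needs its reference post embeddings and `k`; the hypothetical helpers `save_state` and `load_state` show how that state could round-trip through NumPy's `.npz` format.

```python
import os
import tempfile

import numpy as np


def save_state(path, post_embs, k):
    """Hypothetical: persist the fitted reference set and k to one .npz file."""
    np.savez(path, post_embs=np.asarray(post_embs, dtype=np.float32), k=np.array(k))


def load_state(path):
    """Hypothetical inverse of save_state."""
    with np.load(path) as data:  # NpzFile supports the context-manager protocol
        return data["post_embs"], int(data["k"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_community.npz")
    save_state(path, [[0.6, 0.8], [1.0, 0.0]], k=100)
    embs, k = load_state(path)
    print(embs.shape, k)  # (2, 2) 100
```

Storing embeddings as `float32` halves the file size relative to `float64`, which matters when the reference set holds many thousands of 768-dimensional posts.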
Citation
@article{gerard2026dgro,
title = {Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals},
author = {Gerard, Patrick and Volkova, Svitlana},
journal = {ACM FAccT},
year = {2026},
}
File details
Details for the file manifold_scorer-0.1.2.tar.gz.
File metadata
- Download URL: manifold_scorer-0.1.2.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 16f98fd20ba4ede876e37edf9926d99a61e23d9e4823875ad3ea61938d42325e |
| MD5 | ac0bfb42310e0dc10f65abb06cc12de6 |
| BLAKE2b-256 | 5fa3d47d5ac30c686e8f39e35d55f7d9615cce82ab9be3248d7cd5526c545762 |
File details
Details for the file manifold_scorer-0.1.2-py3-none-any.whl.
File metadata
- Download URL: manifold_scorer-0.1.2-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cd315bc0f6e6ceed4e5e58ec51861c9c70fd26a6f89dfd7061455c55703ec73b |
| MD5 | bbfedf33a5836da7ea41577f606af6ef |
| BLAKE2b-256 | 08943589a5750055beeec6d35ad6dee46a3443fb25a3b2639862276bcf5eabde |