A corruption robustness benchmark for multi-LLM committees with hierarchical aggregation
Equitas
Corruption-Robust Aggregation for Multi-LLM Governance Committees
A benchmark for evaluating aggregation strategies in hierarchical multi-LLM committees under adversarial corruption.
Quick Start
pip install equitas-benchmark # from PyPI
# or for local development:
pip install -e .
python -m equitas --config configs/governance_sweep_fh.yaml
Aggregation Methods (8 baselines + oracle)
| Method | Key Idea |
|---|---|
| Oracle | Hindsight-optimal action (upper bound) |
| Multiplicative Weights | w *= exp(-eta * loss), adapts to corruption |
| Supervisor Rerank | Follow-the-leader: re-rank by best recent agent |
| Confidence-Weighted | Weight by self-reported confidence |
| EMA Trust | Exponential moving average of past performance |
| Trimmed Vote | Drop outlier agents, then majority |
| Majority Vote | Equal-weight plurality |
| Oracle Upper Bound | Best single agent in hindsight |
| Random Dictator | Uniformly random agent each round |
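The voting rules in the table can be sketched in a few lines of Python. This is a minimal illustration, not the equitas API. The function and class names, the first-seen tie-breaking rule, the renormalization step and the default `eta` are all assumptions. Only the update `w *= exp(-eta * loss)` comes from the table.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def majority_vote(actions: Sequence[Hashable]) -> Hashable:
    """Equal-weight plurality; ties go to the action seen first."""
    return Counter(actions).most_common(1)[0][0]


def weighted_vote(actions: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Plurality where each agent's ballot counts with its weight.

    Passing self-reported confidences as weights gives a confidence-weighted
    vote; passing MW weights gives the multiplicative-weights aggregate.
    """
    scores: dict = defaultdict(float)
    for action, w in zip(actions, weights):
        scores[action] += w
    return max(scores, key=scores.get)  # ties: first action inserted wins


class MultiplicativeWeights:
    """Hypothetical MW aggregator: w_i *= exp(-eta * loss_i) after each round."""

    def __init__(self, n_agents: int, eta: float = 0.5) -> None:
        self.eta = eta  # learning rate (illustrative default, an assumption)
        self.weights = [1.0 / n_agents] * n_agents

    def aggregate(self, actions: Sequence[Hashable]) -> Hashable:
        return weighted_vote(actions, self.weights)

    def update(self, losses: Sequence[float]) -> None:
        # Multiplicative update, then renormalize so the weights sum to 1
        # (renormalizing does not change which action wins the vote).
        raw = [w * math.exp(-self.eta * loss) for w, loss in zip(self.weights, losses)]
        total = sum(raw)
        self.weights = [w / total for w in raw]


actions = ["A", "B", "B"]
print(majority_vote(actions))                   # B
print(weighted_vote(actions, [0.9, 0.3, 0.4]))  # A (confidence-weighted)

mw = MultiplicativeWeights(n_agents=3, eta=1.0)
for _ in range(3):  # agents 1 and 2 are wrong for three rounds
    mw.update([0.0, 1.0, 1.0])
print(mw.aggregate(actions))  # A: agent 0 now outweighs agents 1 and 2 combined
```

The point of the last call is the contrast with plain majority vote: on the same ballots, MW overrules the two agents that have been accumulating loss.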
Experiments
- Corruption sweep: rate x adversary type x aggregator
- Pareto sweep: welfare-fairness tradeoff via (alpha, beta)
- Recovery: mid-run corruption onset, track MW weight recovery
- Scaling: committee size N in {3, 5, 7, 10}
- Hierarchical vs flat: architecture comparison
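One way to picture the recovery experiment is to track how much multiplicative weight the corrupted agents hold before and after a mid-run onset. The sketch below is a toy, not the benchmark's recovery runner. The loss model (0.2 per round for honest behaviour, 1.0 once corrupted), the 5% threshold and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def mw_recovery_trace(
    n_agents: int = 5,
    corrupted: Sequence[int] = (0,),
    onset: int = 10,
    rounds: int = 30,
    eta: float = 0.5,
) -> List[float]:
    """Total MW weight held by the corrupted agents after each round.

    Toy loss model (an assumption): every agent incurs loss 0.2 per round,
    except corrupted agents, which incur loss 1.0 from round `onset` onward.
    """
    weights = [1.0 / n_agents] * n_agents
    trace = []
    for t in range(rounds):
        losses = [1.0 if (i in corrupted and t >= onset) else 0.2 for i in range(n_agents)]
        raw = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        total = sum(raw)
        weights = [w / total for w in raw]
        trace.append(sum(weights[i] for i in corrupted))
    return trace


def rounds_to_recover(trace: Sequence[float], onset: int, threshold: float = 0.05) -> int:
    """Post-onset rounds until the corrupted agents' weight share drops below threshold."""
    for t in range(onset, len(trace)):
        if trace[t] < threshold:
            return t - onset + 1
    return -1  # never recovered within the run


trace = mw_recovery_trace()
print(f"corrupted share just before onset: {trace[9]:.3f}")          # 0.200
print(f"rounds to fall below 5%: {rounds_to_recover(trace, onset=10)}")  # 4
```

Before the onset all agents look identical, so the weights stay uniform; afterwards the corrupted agent's share decays geometrically at a rate set by `eta` and the loss gap.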
Reproducibility
Raw experiment outputs in outputs/ include historical runs of every method
tested during development, including self_consistency. The reported benchmark
results exclude self_consistency at the analysis layer: the table-generation
scripts (scripts/generate_benchmark_tables.py, scripts/generate_go_vs_fh_tables.py)
and the figure-generation script (regenerate_figures.py) filter it out on read.
The self_consistency aggregator is also hard-disabled in the codebase
(equitas/config.py raises ValueError if it is selected) because it implements a
committee-level subsampled majority vote rather than canonical within-agent
self-consistency sampling. See the future-work discussion in the paper.
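The two safeguards described here (a config-time error and an analysis-time filter) follow a common pattern, sketched below. The names `DISABLED_AGGREGATORS`, `validate_aggregators` and `drop_excluded`, the `method` column and the method labels other than self_consistency are hypothetical; this is not the code in equitas/config.py or the scripts.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical names throughout; this is not the equitas API.
DISABLED_AGGREGATORS = {"self_consistency"}


def validate_aggregators(names: Iterable[str]) -> List[str]:
    """Config-time guard: refuse to run a disabled aggregator."""
    names = list(names)
    disabled = sorted(set(names) & DISABLED_AGGREGATORS)
    if disabled:
        raise ValueError(f"disabled aggregator(s) requested: {', '.join(disabled)}")
    return names


def drop_excluded(rows: Iterable[Dict[str, str]], key: str = "method") -> List[Dict[str, str]]:
    """Analysis-time filter: drop rows for disabled methods as they are read."""
    return [row for row in rows if row[key] not in DISABLED_AGGREGATORS]


# A historical results file may still contain self_consistency rows.
raw = io.StringIO("method,seed\nmajority_vote,0\nself_consistency,0\nmultiplicative_weights,0\n")
rows = drop_excluded(csv.DictReader(raw))
print([r["method"] for r in rows])  # ['majority_vote', 'multiplicative_weights']

try:
    validate_aggregators(["majority_vote", "self_consistency"])
except ValueError as err:
    print(err)  # disabled aggregator(s) requested: self_consistency
```

Filtering on read leaves the raw outputs untouched, so historical runs stay auditable while the published tables and figures omit the disabled method.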
To regenerate all artifacts from raw data:
python scripts/generate_benchmark_tables.py # tables/benchmark/
python scripts/generate_go_vs_fh_tables.py # tables/
python regenerate_figures.py # paper/figures/
python -m pytest tests/ -q # 88 tests
Project Structure
equitas/          # pip-installable package
  agents/         # LLM client, member/leader/judge/governor agents
  aggregators/    # 8 aggregation strategies (registry pattern)
  adversaries/    # 4 adversary types (selfish, coordinated, scheduled, deceptive)
  metrics/        # fairness, welfare, Pareto, robust statistics
  simulation/     # hierarchical + flat engine
  experiments/    # sweep, recovery, scaling, pareto, hier-vs-flat
  plotting/       # paper-quality matplotlib figures
configs/          # YAML experiment configs
scripts/          # table generation, analysis
paper/            # LaTeX source + figures
tests/            # 88 unit + integration tests
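The aggregators package is described as using a registry pattern. A minimal sketch of that pattern follows; the decorator, the `AGGREGATORS` dict, the function signatures and the lookup helper are illustrative assumptions, not the package's actual interface.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, Optional, Sequence

# Hypothetical registry; names and signatures are illustrative only.
Aggregator = Callable[..., Hashable]
AGGREGATORS: Dict[str, Aggregator] = {}


def register(name: str) -> Callable[[Aggregator], Aggregator]:
    """Decorator that records an aggregation function under a string key."""
    def decorator(fn: Aggregator) -> Aggregator:
        if name in AGGREGATORS:
            raise KeyError(f"duplicate aggregator name: {name}")
        AGGREGATORS[name] = fn
        return fn
    return decorator


@register("majority_vote")
def majority_vote(actions: Sequence[Hashable]) -> Hashable:
    return Counter(actions).most_common(1)[0][0]


@register("random_dictator")
def random_dictator(actions: Sequence[Hashable], rng: Optional[random.Random] = None) -> Hashable:
    rng = rng if rng is not None else random.Random()
    return rng.choice(list(actions))  # one uniformly random agent decides


def get_aggregator(name: str) -> Aggregator:
    """Look up an aggregator by name, e.g. a name read from a config file."""
    try:
        return AGGREGATORS[name]
    except KeyError:
        raise ValueError(f"unknown aggregator {name!r}; available: {sorted(AGGREGATORS)}") from None


print(sorted(AGGREGATORS))                                # ['majority_vote', 'random_dictator']
print(get_aggregator("majority_vote")(["A", "B", "B"]))  # B
```

With this shape, adding a strategy means defining and decorating one function; experiment runners can iterate over the registry instead of a hard-coded list.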
License
MIT
Download files
File details
Details for the file equitas_benchmark-0.1.1.tar.gz.
File metadata
- Download URL: equitas_benchmark-0.1.1.tar.gz
- Upload date:
- Size: 80.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 291db42d997930691f2a9e768cb9bef5bdf91d8d2a91dcf486627d3554642c4b |
| MD5 | e84af6431c49f27d457718546ecc1f3d |
| BLAKE2b-256 | 879958063e7502ecc03707c2900b3060d669d68c04a079ff7a2d58eb48e0f11d |
File details
Details for the file equitas_benchmark-0.1.1-py3-none-any.whl.
File metadata
- Download URL: equitas_benchmark-0.1.1-py3-none-any.whl
- Upload date:
- Size: 109.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 28fefecaff5aceb2f643b5d56359a6193bf49e2ee7945ca66c8605a068464570 |
| MD5 | a310b8c9f916257bae7da43c504f2404 |
| BLAKE2b-256 | 037e12f393a919df40682d18bc1ac13b4829bed43259133ecab39aa08aaa83f9 |