Paper-aligned LLM-only graph construction, benchmark runners, and public-facing evaluation tools for LLM unlearning experiments.
LLM2Graph
llm2graph is a paper-aligned toolkit for reproducing the graph-based evaluation pipeline from "The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning", while remaining usable for general public workflows.
It supports three layers of use:
- Entity-centric knowledge graph construction using LLM API calls throughout elicitation, triple extraction, relevance filtering, and alias resolution.
- Query generation from graph paths with controllable difficulty through hop depth, aliases, paraphrases, distractors, and retention probes.
- Benchmark and public workflows through simple CLI commands, dataset loaders, and reusable Python APIs.
Why 0.3.4
Version 0.3.4 extends the 0.3.3 paper-aligned pipeline with:
- dataset-specific runners for RWKU and TOFU
- public-friendly dataset loading from .txt, .json, .jsonl, and .csv files
- seed extraction utilities for previewing local benchmark files
- reusable Python APIs for both benchmark replication and general entity-level experiments
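The per-format loading rules are not spelled out here. As a rough illustration only, a reader for the four listed formats might look like this; `read_seeds` and the candidate field names (`entity`, `name`, `seed`, `author`) are hypothetical and are not necessarily what llm2graph's `load_seeds` does.

```python
import csv
import json
from pathlib import Path

# Hypothetical field names, checked in order when a JSON/JSONL/CSV record is a dict.
CANDIDATE_KEYS = ("entity", "name", "seed", "author")


def _record_to_seed(record):
    """Return a seed string for one parsed record, or None if nothing usable is found."""
    if isinstance(record, str):
        return record.strip() or None
    if isinstance(record, dict):
        for key in CANDIDATE_KEYS:
            value = record.get(key)
            if isinstance(value, str) and value.strip():
                return value.strip()
    return None


def read_seeds(path):
    """Hypothetical reader for .txt, .json, .jsonl, and .csv seed files."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        records = path.read_text(encoding="utf-8").splitlines()  # one seed per line
    elif suffix == ".jsonl":
        lines = path.read_text(encoding="utf-8").splitlines()
        records = [json.loads(line) for line in lines if line.strip()]
    elif suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        records = data if isinstance(data, list) else [data]
    elif suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            records = list(csv.DictReader(handle))  # header row gives the field names
    else:
        raise ValueError(f"unsupported dataset format: {suffix}")

    seeds = []
    for record in records:
        seed = _record_to_seed(record)
        if seed and seed not in seeds:  # skip blanks and duplicates, keep first-seen order
            seeds.append(seed)
    return seeds
```

Deduplicating while preserving order keeps the seed list stable between runs, which matters when `--limit` truncates it.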
Install
pip install llm2graph
Optional extras:
pip install "llm2graph[gemini]"
pip install "llm2graph[hf-local]"
Public Quickstart
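If you just want to build a graph and evaluate one entity (the first line sets the API key in PowerShell; in bash or zsh, use export OPENAI_API_KEY="..." instead):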
$env:OPENAI_API_KEY="..."
llm2graph entity --seed "Marie Curie" --out graph.json
llm2graph gen-queries --graph graph.json --target "Marie Curie" --hops 2 --out queries.json
llm2graph eval --queries queries.json --pre-model gpt-4o-mini-2024-07-18 --post-model gpt-4o-mini-2024-07-18 --out eval_report.json
If you have a simple text file of seed entities, one per line:
llm2graph extract-seeds --dataset seeds.txt
llm2graph run-benchmark --benchmark rwku --dataset seeds.txt --out-dir demo_run --limit 5
The generic run-benchmark command is the easiest public entry point. The benchmark-oriented aliases run-rwku and run-tofu are also available, and all three commands work with plain seed lists so general users do not need a benchmark-specific file format.
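To script the quickstart from Python, one option is to shell out to the CLI. This sketch assumes `llm2graph` is on `PATH` and uses only the `entity`, `gen-queries`, and `eval` flags shown above; `quickstart_commands` and `run_quickstart` are hypothetical helpers, and the default is a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def quickstart_commands(seed, pre_model, post_model, hops=2, workdir="."):
    """Build the entity -> gen-queries -> eval quickstart as argv lists (hypothetical helper)."""
    workdir = Path(workdir)
    graph = str(workdir / "graph.json")
    queries = str(workdir / "queries.json")
    report = str(workdir / "eval_report.json")
    return [
        ["llm2graph", "entity", "--seed", seed, "--out", graph],
        ["llm2graph", "gen-queries", "--graph", graph, "--target", seed,
         "--hops", str(hops), "--out", queries],
        ["llm2graph", "eval", "--queries", queries, "--pre-model", pre_model,
         "--post-model", post_model, "--out", report],
    ]


def run_quickstart(seed, pre_model, post_model, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs llm2graph on PATH)."""
    for argv in quickstart_commands(seed, pre_model, post_model):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
```

For example, `run_quickstart("Marie Curie", "gpt-4o-mini-2024-07-18", "gpt-4o-mini-2024-07-18", dry_run=False)` mirrors the quickstart, where both model flags use the same model; in an actual unlearning comparison the post model would presumably be the unlearned one. Passing argv lists rather than a shell string avoids quoting problems with entity names that contain spaces.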
Benchmark Workflows
Build a graph with API-driven elicitation:
llm2graph entity \
--seed "Stephen King" \
--max-depth 2 \
--elicitation-question-count 10 \
--provider openai \
--model gpt-4o-mini-2024-07-18 \
--use-relevance \
--relevance-threshold 3.0 \
--out graph.json
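The `--max-depth` and `--relevance-threshold` flags suggest a breadth-first expansion with a relevance filter. The sketch below shows that general technique on an in-memory toy, not the package's implementation: `elicit` and `relevance` are hypothetical stand-ins for the LLM calls, relevance scores are assumed to use the same scale as `--relevance-threshold`, and `max_depth` is assumed to count expansion levels from the seed. The decay mentioned under Paper Alignment is not modelled.

```python
from collections import deque


def build_graph(seed, elicit, relevance, max_depth=2, threshold=3.0):
    """Breadth-first expansion from `seed` with a relevance filter (illustrative sketch).

    elicit(entity)          -> list of (subject, relation, object) triples; stands in for LLM calls
    relevance(seed, triple) -> float score; triples scoring below `threshold` are dropped
    max_depth               -> number of expansion levels (assumption: the seed is level 0)
    """
    triples = []
    seen = set()
    depth_of = {seed: 0}  # first depth at which each entity was reached
    queue = deque([seed])
    while queue:
        entity = queue.popleft()
        depth = depth_of[entity]
        for triple in elicit(entity):
            if triple in seen or relevance(seed, triple) < threshold:
                continue
            seen.add(triple)
            triples.append(triple)
            obj = triple[2]
            if obj not in depth_of:
                depth_of[obj] = depth + 1
                if depth + 1 < max_depth:  # expand only entities inside the depth budget
                    queue.append(obj)
    return triples, depth_of
```

Scoring each triple against the original seed, rather than against the entity being expanded, keeps deeper levels from drifting away from the target entity.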
Generate forget probes for every hop depth from 1 to N (set by --hops), plus retention probes:
llm2graph gen-queries \
--graph graph.json \
--target "Stephen King" \
--hops 3 \
--num-paths 100 \
--per-kind-limit 100 \
--aliases 3 \
--paraphrases 2 \
--distractors 2 \
--random-seed 13 \
--provider openai \
--model gpt-4o-mini-2024-07-18 \
--out queries.json
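Multi-hop probes are derived from graph paths, and `--random-seed` makes the sampling repeatable. As an illustrative sketch of that idea, `sample_paths` draws random walks of exactly `hops` edges from the target using a seeded `random.Random`, so the same seed yields the same paths. Both functions are hypothetical; the real generator also handles aliases, paraphrases, distractors, and retention probes, and takes a provider/model (presumably for question phrasing), none of which is shown here.

```python
import random
from collections import defaultdict


def sample_paths(triples, target, hops, num_paths, random_seed=13):
    """Sample up to `num_paths` distinct walks of exactly `hops` edges starting at `target`."""
    adjacency = defaultdict(list)
    for subject, relation, obj in triples:
        adjacency[subject].append((relation, obj))

    rng = random.Random(random_seed)  # private RNG: the same seed gives the same paths
    paths = set()
    for _attempt in range(num_paths * 20):  # bounded number of attempts
        if len(paths) >= num_paths:
            break
        node, path = target, []
        visited = {target}
        for _step in range(hops):
            options = [(r, o) for r, o in adjacency.get(node, []) if o not in visited]
            if not options:
                break  # dead end: discard this walk
            relation, nxt = rng.choice(options)
            path.append((node, relation, nxt))
            visited.add(nxt)
            node = nxt
        if len(path) == hops:
            paths.add(tuple(path))
    return sorted(paths)


def path_to_probe(path):
    """Template question for a path; the answer is the entity at the end of the path."""
    relations = ", then ".join(relation.replace("_", " ") for _, relation, _ in path)
    return {
        "hops": len(path),
        "question": f"Starting from {path[0][0]}, follow: {relations}. Which entity do you reach?",
        "answer": path[-1][2],
    }
```

Using a private `random.Random` instead of the module-level functions means other code touching the global RNG cannot change which paths are sampled.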
Run a full RWKU-style sweep from a local dataset file:
llm2graph run-benchmark \
--benchmark rwku \
--dataset rwku_entities.jsonl \
--out-dir rwku_run \
--hops 3 \
--num-paths 100 \
--per-kind-limit 100
Equivalent benchmark-specific alias:
llm2graph run-rwku \
--dataset rwku_entities.jsonl \
--out-dir rwku_run \
--hops 3 \
--num-paths 100 \
--per-kind-limit 100
Run a full TOFU-style sweep from a local dataset file:
llm2graph run-benchmark \
--benchmark tofu \
--dataset tofu_authors.json \
--out-dir tofu_run \
--hops 3 \
--num-paths 100 \
--per-kind-limit 100
Equivalent benchmark-specific alias:
llm2graph run-tofu \
--dataset tofu_authors.json \
--out-dir tofu_run \
--hops 3 \
--num-paths 100 \
--per-kind-limit 100
Each benchmark run writes per-entity artifacts plus a top-level benchmark_summary.json.
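A finished run directory can be inspected with the standard library alone. This sketch relies only on the `benchmark_summary.json` file name stated above; the per-entity layout and the summary's fields are not documented here, so it simply lists every other JSON file under the output directory. `load_benchmark_run` is a hypothetical helper.

```python
import json
from pathlib import Path

SUMMARY_NAME = "benchmark_summary.json"


def load_benchmark_run(out_dir):
    """Return (summary dict, sorted relative paths of the other JSON artifacts)."""
    root = Path(out_dir)
    summary = json.loads((root / SUMMARY_NAME).read_text(encoding="utf-8"))
    artifacts = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.json")
        if p != root / SUMMARY_NAME  # everything except the top-level summary
    )
    return summary, artifacts
```

For example, `load_benchmark_run("rwku_run")` would read the summary written by the RWKU sweep above.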
Public-Facing API
from llm2graph import BenchmarkRunner, BenchmarkRunnerConfig, GraphBuilder, QueryGenerator, Evaluator, load_seeds

# Load the seed entities from a local dataset file (e.g. to inspect them before a run)
seeds = load_seeds("seeds.txt")

# Run the RWKU-style pipeline over every seed in the dataset
runner = BenchmarkRunner(
    BenchmarkRunnerConfig(
        benchmark="rwku",
        dataset_path="seeds.txt",
        output_dir="demo_run",
        provider="openai",
        model="gpt-4o-mini-2024-07-18",
    )
)
summary = runner.run()
Reproducibility Notes
llm2graph stores experiment metadata inside the JSON files it produces, so runs are easier to audit.
- graph artifacts record construction settings and provider/model provenance
- query artifacts record path-sampling settings, source graph metadata, and retention probe structure
- evaluation artifacts record pre/post/judge model metadata, skipped pre-check counts, residual flags, and paper-style summary metrics
- benchmark runs record per-entity outputs and a summary manifest
To improve reproducibility across runs:
- keep prompt settings fixed
- set random_seed during query generation
- persist the exact graph JSON used to create question sets
- keep provider and model versions in your experiment logs
- prefer a judge model for semantic equivalence when exact string match is too brittle
- reuse the graph per (model, seed entity) pair when evaluating multiple unlearning methods
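Several of these practices can be automated. The following sketch, with hypothetical helper names and manifest fields, fingerprints the exact graph JSON with SHA-256, records provider, model, and random seed, and derives one graph cache path per (model, seed entity) pair. It is independent of llm2graph's own metadata format.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path):
    """SHA-256 of a file, read in chunks, to pin the exact graph JSON a run used."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _safe(name):
    """Replace characters that are awkward in file names."""
    return "".join(c if c.isalnum() or c in "-._" else "_" for c in name)


def graph_cache_path(cache_dir, model, seed_entity):
    """One graph file per (model, seed entity) pair, reused across unlearning methods."""
    return Path(cache_dir) / _safe(model) / f"{_safe(seed_entity)}.json"


def write_run_manifest(out_path, graph_path, provider, model, random_seed, **extra):
    """Write a small JSON manifest next to the run outputs (hypothetical field names)."""
    manifest = {
        "graph_file": str(graph_path),
        "graph_sha256": file_sha256(graph_path),
        "provider": provider,
        "model": model,
        "random_seed": random_seed,
        **extra,
    }
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    return manifest
```

Comparing `graph_sha256` across manifests is a quick way to confirm that two unlearning methods were evaluated against the same graph.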
Paper Alignment
This package is intended to support the accompanying paper's experiments and reproducibility workflow. The key package abstractions map onto the paper's methodology as follows:
- GraphBuilder: API-driven entity-to-graph elicitation and controlled BFS expansion with decay
- QueryGenerator: single-hop, multi-hop, alias-based, 1-hop retention, 2-hop retention, and relationship-retention probe synthesis
- Evaluator: pre/post comparison, residual knowledge measurement, and paper-style aggregate metrics
- BenchmarkRunner: dataset-level orchestration for RWKU, TOFU, or public seed lists
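The Evaluator's pre/post comparison can be sketched as follows, under clearly stated assumptions: models are plain callables, a probe counts only if the pre-unlearning model answers it correctly (the rest are counted as skipped pre-checks), and residual knowledge is the share of counted probes the post-unlearning model still answers. `evaluate` and its metric names are hypothetical and not necessarily the paper's definitions. The default judge is a normalized exact match; an LLM-backed `judge` is where semantic equivalence would come in.

```python
def normalize(text):
    """Lowercase, trim a trailing period, and collapse whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())


def exact_match(prediction, answer):
    return normalize(prediction) == normalize(answer)


def evaluate(probes, pre_model, post_model, judge=exact_match):
    """Pre/post comparison over probes (illustrative sketch).

    probes: list of dicts with "question" and "answer"
    pre_model / post_model: callables question -> answer string (stand-ins for API calls)
    judge: callable(prediction, answer) -> bool
    """
    kept, skipped, residual = 0, 0, 0
    flags = []
    for probe in probes:
        if not judge(pre_model(probe["question"]), probe["answer"]):
            skipped += 1  # never known before unlearning, so forgetting cannot be measured
            continue
        kept += 1
        still_known = judge(post_model(probe["question"]), probe["answer"])
        residual += int(still_known)
        flags.append({"question": probe["question"], "residual": still_known})
    residual_rate = residual / kept if kept else 0.0
    return {
        "evaluated": kept,
        "skipped_precheck": skipped,
        "residual_rate": residual_rate,
        "forget_rate": 1 - residual_rate if kept else 0.0,
        "flags": flags,
    }
```

Filtering on the pre-check first keeps probes the base model never knew from inflating the apparent forgetting rate.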
Examples
See examples/reproducibility.md and examples/reproduce_pipeline.py for a minimal experiment template.
File details: llm2graph-0.3.4.tar.gz
- Size: 26.3 kB
- Tags: Source
- Uploaded via: twine/6.2.0 CPython/3.11.2
- Uploaded using Trusted Publishing? No

| Algorithm | Hash digest |
|---|---|
| SHA256 | e464354d9ce162cb5c0ae411b7202ae0ce393a52804a904688e5ce11a9f0dbc9 |
| MD5 | 0007353607b67be91a5faa7c1aa71bb7 |
| BLAKE2b-256 | 3af3c1be68963aa2d488bc2cc1dc88963b442bfad5665b56c09f82f1fe319dab |
File details: llm2graph-0.3.4-py3-none-any.whl
- Size: 25.0 kB
- Tags: Python 3
- Uploaded via: twine/6.2.0 CPython/3.11.2
- Uploaded using Trusted Publishing? No

| Algorithm | Hash digest |
|---|---|
| SHA256 | d4f957923e5d6d0e964c1fd1678739ccfe7d3474e5bd5b31039e5013e79c0afa |
| MD5 | 3cd4256b949a149ee08f30a3b5634b6a |
| BLAKE2b-256 | b290da3accc6d2818e6ca4be62b45d7fec087b12061a770588003be8e7cfcead |