Paper-aligned LLM-only graph construction, benchmark runners, and public-facing evaluation tools for LLM unlearning experiments.


Project description

LLM2Graph

llm2graph is a paper-aligned toolkit for reproducing the graph-based evaluation pipeline from "The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning" while remaining usable for general-purpose workflows.

It supports three layers of use:

Entity-centric knowledge graph construction that uses LLM API calls for elicitation, triple extraction, relevance filtering, and alias resolution.
Query generation from graph paths with controllable difficulty through hop depth, aliases, paraphrases, distractors, and retention probes.
Benchmark and public workflows through simple CLI commands, dataset loaders, and reusable Python APIs.

Why 0.3.4

Version 0.3.4 extends the 0.3.3 paper-aligned pipeline with:

dataset-specific runners for RWKU and TOFU
public-friendly dataset loading from .txt, .json, .jsonl, and .csv
seed extraction utilities for previewing local benchmark files
reusable Python APIs for both benchmark replication and general entity-level experiments
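Multi-format seed loading of this kind can be approximated with the standard library alone. The sketch below is not llm2graph's load_seeds: the function name read_seed_list and the assumption that structured records store the entity under an "entity" or "name" key are hypothetical, chosen only for illustration.

```python
import csv
import json
from pathlib import Path

# Hypothetical field names checked, in order, when a record is a dict.
ENTITY_KEYS = ("entity", "name")


def _entity_from_record(record):
    """Return the entity string from a plain-string or dict record (assumed schema)."""
    if isinstance(record, str):
        return record.strip()
    if isinstance(record, dict):
        for key in ENTITY_KEYS:
            value = record.get(key)
            if isinstance(value, str) and value.strip():
                return value.strip()
    return ""


def read_seed_list(path):
    """Read seed entities from .txt, .json, .jsonl, or .csv (a sketch, not load_seeds)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        records = path.read_text(encoding="utf-8").splitlines()  # one entity per line
    elif suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        records = data if isinstance(data, list) else [data]
    elif suffix == ".jsonl":
        lines = path.read_text(encoding="utf-8").splitlines()
        records = [json.loads(line) for line in lines if line.strip()]
    elif suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            records = list(csv.DictReader(handle))  # header row required
    else:
        raise ValueError(f"unsupported seed file type: {suffix}")

    seeds, seen = [], set()
    for record in records:
        entity = _entity_from_record(record)
        if entity and entity not in seen:  # drop blanks and duplicates, keep order
            seen.add(entity)
            seeds.append(entity)
    return seeds
```

Deduplicating while preserving order keeps the per-entity output order stable across runs, which matters when comparing artifacts from different sweeps.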

Install

pip install llm2graph

Optional extras:

pip install "llm2graph[gemini]"
pip install "llm2graph[hf-local]"

Public Quickstart

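If you just want to build a graph and evaluate a single entity (the first line sets the API key using PowerShell syntax; in bash or zsh, use export OPENAI_API_KEY="..." instead):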

$env:OPENAI_API_KEY="..."
llm2graph entity --seed "Marie Curie" --out graph.json
llm2graph gen-queries --graph graph.json --target "Marie Curie" --hops 2 --out queries.json
llm2graph eval --queries queries.json --pre-model gpt-4o-mini-2024-07-18 --post-model gpt-4o-mini-2024-07-18 --out eval_report.json

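The --pre-model and --post-model values are identical here only to exercise the pipeline end to end; in an unlearning experiment, --post-model should refer to the model after unlearning.

If you have a simple text file of seed entities, one per line: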

llm2graph extract-seeds --dataset seeds.txt
llm2graph run-benchmark --benchmark rwku --dataset seeds.txt --out-dir demo_run --limit 5

The generic run-benchmark command is the easiest public entry point. The benchmark-oriented aliases run-rwku and run-tofu are also available, and all three commands work with plain seed lists so general users do not need a benchmark-specific file format.

Benchmark Workflows

Build a graph with API-driven elicitation:

llm2graph entity \
  --seed "Stephen King" \
  --max-depth 2 \
  --elicitation-question-count 10 \
  --provider openai \
  --model gpt-4o-mini-2024-07-18 \
  --use-relevance \
  --relevance-threshold 3.0 \
  --out graph.json
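The --max-depth, --elicitation-question-count, and --relevance-threshold options describe a controlled breadth-first expansion: each entity is elicited, candidate triples are scored for relevance to the seed, and only relevant neighbors are expanded further. The sketch below illustrates that shape under stated assumptions. The LLM calls are replaced by hypothetical callables elicit and score_relevance, and the "decay" is modeled as halving the per-node question budget at each depth. None of these names are llm2graph APIs.

```python
from collections import deque


def bfs_expand(seed, elicit, score_relevance, max_depth=2, question_count=10,
               relevance_threshold=3.0, decay=0.5):
    """Controlled BFS over (subject, relation, object) triples.

    Hypothetical stand-ins for LLM calls:
      elicit(entity, n_questions) -> list of triples about `entity`
      score_relevance(seed, triple) -> float relevance score
    """
    triples = []
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached: the node stays in the graph but is not expanded
        # Assumed decay: fewer elicitation questions the farther a node is from the seed.
        budget = max(1, int(question_count * decay ** depth))
        for triple in elicit(entity, budget):
            if score_relevance(seed, triple) < relevance_threshold:
                continue  # relevance filter drops off-topic facts
            triples.append(triple)
            obj = triple[2]
            if obj not in visited:
                visited.add(obj)
                queue.append((obj, depth + 1))
    return triples
```

Injecting the model calls as parameters keeps the traversal logic testable offline with fixed stub answers.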

Generate forget probes for every hop count from 1 to N (set by --hops), plus retention probes:

llm2graph gen-queries \
  --graph graph.json \
  --target "Stephen King" \
  --hops 3 \
  --num-paths 100 \
  --per-kind-limit 100 \
  --aliases 3 \
  --paraphrases 2 \
  --distractors 2 \
  --random-seed 13 \
  --provider openai \
  --model gpt-4o-mini-2024-07-18 \
  --out queries.json
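Multi-hop forget probes are built from graph paths that start at the target entity. The sketch below shows one way to sample such paths reproducibly and turn one into a nested question. The names sample_paths and path_to_question, the restriction to simple paths, and the naive question template are assumptions for illustration, not QueryGenerator's implementation. Using a local seeded RNG mirrors the role of --random-seed.

```python
import random
from collections import defaultdict


def sample_paths(triples, target, hops, num_paths, seed=13):
    """Sample up to num_paths distinct simple paths of exactly `hops` edges from target."""
    adjacency = defaultdict(list)
    for subj, rel, obj in triples:
        adjacency[subj].append((rel, obj))

    paths = []

    def walk(node, path, seen):
        # Enumerate every simple path of the requested length (fine for small graphs).
        if len(path) == hops:
            paths.append(tuple(path))
            return
        for rel, obj in adjacency[node]:
            if obj not in seen:
                walk(obj, path + [(node, rel, obj)], seen | {obj})

    walk(target, [], {target})
    rng = random.Random(seed)  # local RNG: same seed, same sample
    return rng.sample(paths, min(num_paths, len(paths)))


def path_to_question(path):
    """Naive template that nests relations, returning (question, expected answer)."""
    phrase = path[0][0]
    for _, rel, _ in path:
        phrase = f"the {rel} of {phrase}"
    return f"What is {phrase}?", path[-1][2]
```

For example, the path Marie Curie -spouse-> Pierre Curie -birthplace-> Paris yields "What is the birthplace of the spouse of Marie Curie?" with expected answer "Paris". Looping hops over 1..N gives one probe set per hop count.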

Run a full RWKU-style sweep from a local dataset file:

llm2graph run-benchmark \
  --benchmark rwku \
  --dataset rwku_entities.jsonl \
  --out-dir rwku_run \
  --hops 3 \
  --num-paths 100 \
  --per-kind-limit 100

Equivalent benchmark-specific alias:

llm2graph run-rwku \
  --dataset rwku_entities.jsonl \
  --out-dir rwku_run \
  --hops 3 \
  --num-paths 100 \
  --per-kind-limit 100

Run a full TOFU-style sweep from a local dataset file:

llm2graph run-benchmark \
  --benchmark tofu \
  --dataset tofu_authors.json \
  --out-dir tofu_run \
  --hops 3 \
  --num-paths 100 \
  --per-kind-limit 100

Equivalent benchmark-specific alias:

llm2graph run-tofu \
  --dataset tofu_authors.json \
  --out-dir tofu_run \
  --hops 3 \
  --num-paths 100 \
  --per-kind-limit 100

Each benchmark run writes per-entity artifacts plus a top-level benchmark_summary.json.

Public-Facing API

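# GraphBuilder, QueryGenerator, and Evaluator are the lower-level building blocks
from llm2graph import BenchmarkRunner, BenchmarkRunnerConfig, GraphBuilder, QueryGenerator, Evaluator, load_seeds

# Load the seed entities from a local file (.txt, .json, .jsonl, or .csv)
seeds = load_seeds("seeds.txt")

# Configure a dataset-level run; benchmark selects the runner ("rwku" or "tofu")
runner = BenchmarkRunner(
    BenchmarkRunnerConfig(
        benchmark="rwku",
        dataset_path="seeds.txt",
        output_dir="demo_run",
        provider="openai",
        model="gpt-4o-mini-2024-07-18",
    )
)

# Run the full pipeline for every seed; per-entity artifacts are written to output_dir
summary = runner.run()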

Reproducibility Notes

llm2graph stores experiment metadata in the JSON files it produces, so runs are easier to audit:

graph artifacts record construction settings and provider/model provenance
query artifacts record path-sampling settings, source graph metadata, and retention probe structure
evaluation artifacts record pre/post/judge model metadata, skipped pre-check counts, residual flags, and paper-style summary metrics
benchmark runs record per-entity outputs and a summary manifest
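Skipped pre-check counts and residual flags can be illustrated with a small aggregation over per-query outcomes. This sketch assumes that the pre-check discards probes the pre-unlearning model already answers incorrectly, and that a residual flag marks a probe the post-unlearning model still answers correctly. The QueryResult fields, metric names, and normalized exact-match comparator are hypothetical, not the Evaluator's schema.

```python
import re
import string
from dataclasses import dataclass


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, answer):
    return normalize(prediction) == normalize(answer)


@dataclass
class QueryResult:
    answer: str       # expected answer from the graph
    pre_output: str   # model output before unlearning
    post_output: str  # model output after unlearning


def summarize(results, match=exact_match):
    """Pre-check filtering, residual flags, and a residual rate (assumed definitions)."""
    evaluated = [r for r in results if match(r.pre_output, r.answer)]
    residual_flags = [match(r.post_output, r.answer) for r in evaluated]
    residual = sum(residual_flags)
    return {
        "total": len(results),
        "skipped_pre_check": len(results) - len(evaluated),
        "evaluated": len(evaluated),
        "residual": residual,
        "residual_rate": residual / len(evaluated) if evaluated else 0.0,
    }
```

The match argument is where a judge-model comparison could replace exact matching when string equality is too brittle.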

To improve reproducibility across runs:

keep prompt settings fixed
set random_seed during query generation
persist the exact graph JSON used to create question sets
keep provider and model versions in your experiment logs
prefer a judge model for semantic equivalence when exact string match is too brittle
reuse the graph per (model, seed entity) pair when evaluating multiple unlearning methods
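Reusing one graph per (model, seed entity) pair means every unlearning method is probed against identical facts. A minimal file-cache sketch follows; graph_cache_path, load_or_build_graph, and the zero-argument build callable are hypothetical helpers, not part of llm2graph.

```python
import hashlib
import json
from pathlib import Path


def graph_cache_path(cache_dir, provider, model, seed_entity):
    """Deterministic file name for one (provider, model, seed entity) combination."""
    key = json.dumps([provider, model, seed_entity], ensure_ascii=False)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"graph_{digest}.json"


def load_or_build_graph(cache_dir, provider, model, seed_entity, build):
    """Return the persisted graph if present; otherwise call build() once and persist it.

    `build` is a hypothetical zero-argument callable returning a JSON-serializable graph.
    """
    path = graph_cache_path(cache_dir, provider, model, seed_entity)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    graph = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(graph, indent=2, ensure_ascii=False), encoding="utf-8")
    return graph
```

Hashing the JSON-encoded key avoids filesystem problems with model names or entity names that contain slashes, spaces, or non-ASCII characters.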

Paper Alignment

This package is intended to support the experiments and reproducibility workflow of the paper above. Its key abstractions map onto the paper's methodology as follows:

GraphBuilder: API-driven entity-to-graph elicitation and controlled BFS expansion with decay
QueryGenerator: single-hop, multi-hop, alias-based, 1-hop retention, 2-hop retention, and relationship-retention probe synthesis
Evaluator: pre/post comparison, residual knowledge measurement, and paper-style aggregate metrics
BenchmarkRunner: dataset-level orchestration for RWKU, TOFU, or public seed lists
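Alias-based probes check whether forgetting generalizes beyond an entity's canonical name. The sketch below produces alias variants of an existing probe by plain string substitution; alias_variants and its behavior are assumptions for illustration, not QueryGenerator's method, and the alias list is assumed to come from an earlier alias-resolution step.

```python
def alias_variants(question, target, aliases, max_aliases=3):
    """Rewrite a probe once per alias by replacing the canonical target name.

    Plain string substitution only; a real generator might also paraphrase.
    """
    candidates = [alias for alias in aliases if alias and alias != target]
    variants = []
    for alias in candidates[:max_aliases]:
        rewritten = question.replace(target, alias)
        if rewritten != question and rewritten not in variants:
            variants.append(rewritten)
    return variants
```

A probe that does not mention the target name yields no variants, so only probes that actually refer to the forgotten entity are multiplied.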

Examples

See examples/reproducibility.md and examples/reproduce_pipeline.py for a minimal experiment template.

Project details


Release history

0.3.5, Mar 13, 2026
0.3.4, Mar 9, 2026 (this version)
0.3.2, Oct 6, 2025
0.3.0, Oct 6, 2025

Download files


Source Distribution

llm2graph-0.3.4.tar.gz (26.3 kB), uploaded Mar 9, 2026

Built Distribution


llm2graph-0.3.4-py3-none-any.whl (25.0 kB), uploaded Mar 9, 2026, Python 3

File details

Details for the file llm2graph-0.3.4.tar.gz.

File metadata

Download URL: llm2graph-0.3.4.tar.gz
Upload date: Mar 9, 2026
Size: 26.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for llm2graph-0.3.4.tar.gz
Algorithm	Hash digest
SHA256	`e464354d9ce162cb5c0ae411b7202ae0ce393a52804a904688e5ce11a9f0dbc9`
MD5	`0007353607b67be91a5faa7c1aa71bb7`
BLAKE2b-256	`3af3c1be68963aa2d488bc2cc1dc88963b442bfad5665b56c09f82f1fe319dab`


File details

Details for the file llm2graph-0.3.4-py3-none-any.whl.

File metadata

Download URL: llm2graph-0.3.4-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 25.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for llm2graph-0.3.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d4f957923e5d6d0e964c1fd1678739ccfe7d3474e5bd5b31039e5013e79c0afa`
MD5	`3cd4256b949a149ee08f30a3b5634b6a`
BLAKE2b-256	`b290da3accc6d2818e6ca4be62b45d7fec087b12061a770588003be8e7cfcead`
