Paper-aligned LLM-only graph construction, benchmark runners, and public-facing evaluation tools for LLM unlearning experiments.

Project description

LLM2Graph

llm2graph is a paper-aligned toolkit for reproducing the graph-based evaluation pipeline from "The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning". It is also designed to be usable in general workflows outside the paper's benchmarks.

It supports three layers of use:

  1. Entity-centric knowledge graph construction using LLM API calls throughout elicitation, triple extraction, relevance filtering, and alias resolution.
  2. Query generation from graph paths with controllable difficulty through hop depth, aliases, paraphrases, distractors, and retention probes.
  3. Benchmark and public workflows through simple CLI commands, dataset loaders, and reusable Python APIs.

Why 0.3.4

Version 0.3.4 extends the 0.3.3 paper-aligned pipeline with:

  • dataset-specific runners for RWKU and TOFU
  • public-friendly dataset loading from .txt, .json, .jsonl, and .csv
  • seed extraction utilities for previewing local benchmark files
  • reusable Python APIs for both benchmark replication and general entity-level experiments
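
As a rough illustration of what loading seeds from these four formats involves, here is a minimal standard-library sketch. The helper name read_seed_file and the "name" field are assumptions made for this example; they are not the package's loader, and the real load_seeds may accept different schemas.

```python
import csv
import json
from pathlib import Path


def read_seed_file(path, field="name"):
    """Return unique seed strings from a .txt, .json, .jsonl, or .csv file.

    Hypothetical helper for illustration only. `field` is an assumed
    key/column name used when a record is a mapping rather than a string.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    text = path.read_text(encoding="utf-8")

    if suffix == ".txt":
        records = text.splitlines()  # one seed per line
    elif suffix == ".json":
        records = json.loads(text)  # assumed to be a JSON array
    elif suffix == ".jsonl":
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    elif suffix == ".csv":
        records = list(csv.DictReader(text.splitlines()))  # header row required
    else:
        raise ValueError(f"unsupported seed file type: {suffix}")

    seeds = []
    for record in records:
        value = (record.get(field) or "") if isinstance(record, dict) else record
        value = str(value).strip()
        if value and value not in seeds:  # drop blanks and duplicates, keep order
            seeds.append(value)
    return seeds
```

Keeping the first occurrence of each seed preserves the file's order, which makes a --limit style cutoff predictable.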

Install

pip install llm2graph

Optional extras:

pip install "llm2graph[gemini]"
pip install "llm2graph[hf-local]"

Public Quickstart

To build a graph and evaluate a single entity (the first line uses PowerShell syntax; in bash or zsh, use export OPENAI_API_KEY="..." instead):

$env:OPENAI_API_KEY="..."
llm2graph entity --seed "Marie Curie" --out graph.json
llm2graph gen-queries --graph graph.json --target "Marie Curie" --hops 2 --out queries.json
llm2graph eval --queries queries.json --pre-model gpt-4o-mini-2024-07-18 --post-model gpt-4o-mini-2024-07-18 --out eval_report.json
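
The same three steps can be driven from Python, which sidesteps shell-specific syntax for the environment variable. This is a sketch that only assembles and runs the commands shown above; the function names quickstart_commands and run_quickstart are hypothetical, and the script assumes the llm2graph executable is on PATH.

```python
import os
import subprocess


def quickstart_commands(seed, model="gpt-4o-mini-2024-07-18"):
    """Build the three quickstart commands as argument lists (no shell quoting needed)."""
    return [
        ["llm2graph", "entity", "--seed", seed, "--out", "graph.json"],
        ["llm2graph", "gen-queries", "--graph", "graph.json",
         "--target", seed, "--hops", "2", "--out", "queries.json"],
        ["llm2graph", "eval", "--queries", "queries.json",
         "--pre-model", model, "--post-model", model, "--out", "eval_report.json"],
    ]


def run_quickstart(seed):
    """Run the commands in order, stopping at the first failure."""
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("set OPENAI_API_KEY before running")
    for command in quickstart_commands(seed):
        subprocess.run(command, check=True)  # raises CalledProcessError on non-zero exit
```

Calling run_quickstart("Marie Curie") would then produce graph.json, queries.json, and eval_report.json in the working directory, as in the shell version.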

If you have a simple text file of seed entities, one per line:

llm2graph extract-seeds --dataset seeds.txt
llm2graph run-benchmark --benchmark rwku --dataset seeds.txt --out-dir demo_run --limit 5

The generic run-benchmark command is the simplest entry point. The benchmark-specific aliases run-rwku and run-tofu are also available. All three commands accept plain seed lists, so no benchmark-specific file format is required.

Benchmark Workflows

Build a graph with API-driven elicitation:

llm2graph entity \
  --seed "Stephen King" \
  --max-depth 2 \
  --elicitation-question-count 10 \
  --provider openai \
  --model gpt-4o-mini-2024-07-18 \
  --use-relevance \
  --relevance-threshold 3.0 \
  --out graph.json

Generate forget probes at every hop depth from 1 to N, plus retention probes:

llm2graph gen-queries \
  --graph graph.json \
  --target "Stephen King" \
  --hops 3 \
  --num-paths 100 \
  --per-kind-limit 100 \
  --aliases 3 \
  --paraphrases 2 \
  --distractors 2 \
  --random-seed 13 \
  --provider openai \
  --model gpt-4o-mini-2024-07-18 \
  --out queries.json
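
To make the idea of sampling paths at every hop depth concrete, here is a small self-contained sketch on a toy graph. It is not the package's QueryGenerator: the graph layout (subject mapped to a list of (relation, object) pairs) and both function names are assumptions chosen for illustration. It shows how a fixed random seed keeps the sampled paths reproducible.

```python
import random


def enumerate_paths(graph, start, hops):
    """Return every cycle-free path of exactly `hops` edges that begins at `start`.

    Each path is a list of (subject, relation, object) triples.
    """
    frontier = [([], start, {start})]
    for _ in range(hops):
        next_frontier = []
        for triples, node, visited in frontier:
            for relation, obj in graph.get(node, []):
                if obj in visited:  # skip edges that would revisit an entity
                    continue
                next_frontier.append(
                    (triples + [(node, relation, obj)], obj, visited | {obj})
                )
        frontier = next_frontier
    return [triples for triples, _, _ in frontier]


def sample_paths_by_hop(graph, target, max_hops, num_paths, random_seed):
    """Sample up to `num_paths` paths for each hop depth from 1 to `max_hops`."""
    rng = random.Random(random_seed)  # local RNG so results do not depend on global state
    sampled = {}
    for hops in range(1, max_hops + 1):
        candidates = enumerate_paths(graph, target, hops)
        sampled[hops] = rng.sample(candidates, min(num_paths, len(candidates)))
    return sampled


toy_graph = {
    "Marie Curie": [("spouse", "Pierre Curie"), ("born_in", "Warsaw")],
    "Pierre Curie": [("born_in", "Paris")],
    "Warsaw": [("capital_of", "Poland")],
}
paths = sample_paths_by_hop(toy_graph, "Marie Curie", max_hops=2, num_paths=1, random_seed=13)
```

Each sampled path could then be turned into a question whose answer is the final object, with deeper paths giving harder probes.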

Run a full RWKU-style sweep from a local dataset file:

llm2graph run-benchmark \
  --benchmark rwku \
  --dataset rwku_entities.jsonl \
  --out-dir rwku_run \
  --hops 3 \
  --num-paths 100 \
  --per-kind-limit 100

Equivalent benchmark-specific alias:

llm2graph run-rwku \
  --dataset rwku_entities.jsonl \
  --out-dir rwku_run \
  --hops 3 \
  --num-paths 100 \
  --per-kind-limit 100

Run a full TOFU-style sweep from a local dataset file:

llm2graph run-benchmark \
  --benchmark tofu \
  --dataset tofu_authors.json \
  --out-dir tofu_run \
  --hops 3 \
  --num-paths 100 \
  --per-kind-limit 100

Equivalent benchmark-specific alias:

llm2graph run-tofu \
  --dataset tofu_authors.json \
  --out-dir tofu_run \
  --hops 3 \
  --num-paths 100 \
  --per-kind-limit 100

Each benchmark run writes per-entity artifacts plus a top-level benchmark_summary.json.

Public-Facing API

from llm2graph import BenchmarkRunner, BenchmarkRunnerConfig, GraphBuilder, QueryGenerator, Evaluator, load_seeds

seeds = load_seeds("seeds.txt")

runner = BenchmarkRunner(
    BenchmarkRunnerConfig(
        benchmark="rwku",
        dataset_path="seeds.txt",
        output_dir="demo_run",
        provider="openai",
        model="gpt-4o-mini-2024-07-18",
    )
)
summary = runner.run()

Reproducibility Notes

llm2graph stores experiment metadata inside produced JSON files so runs are easier to audit.

  • graph artifacts record construction settings and provider/model provenance
  • query artifacts record path-sampling settings, source graph metadata, and retention probe structure
  • evaluation artifacts record pre/post/judge model metadata, skipped pre-check counts, residual flags, and paper-style summary metrics
  • benchmark runs record per-entity outputs and a summary manifest
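
One plausible reading of "skipped pre-check counts" and "residual flags" is sketched below: a probe is skipped when the pre-unlearning model already fails it, and flagged as residual when the post-unlearning model still answers it correctly. The record keys and the function name are assumptions for this example, and the paper's exact metric definitions may differ.

```python
def summarize_forget_probes(results):
    """Aggregate per-probe outcomes into counts and a residual rate.

    `results` is a list of dicts with boolean keys "pre_correct" and
    "post_correct" (assumed names, not the package's schema).
    """
    # Probes the model never knew cannot measure forgetting, so they are skipped.
    evaluated = [r for r in results if r["pre_correct"]]
    skipped = len(results) - len(evaluated)
    # A residual is knowledge that survives unlearning.
    residual = sum(1 for r in evaluated if r["post_correct"])
    rate = residual / len(evaluated) if evaluated else 0.0
    return {
        "skipped_pre_check": skipped,
        "evaluated": len(evaluated),
        "residual": residual,
        "residual_rate": rate,
    }


example = summarize_forget_probes([
    {"pre_correct": True, "post_correct": True},
    {"pre_correct": True, "post_correct": False},
    {"pre_correct": False, "post_correct": False},
    {"pre_correct": True, "post_correct": False},
])
```

Here one of the four probes is skipped, and one of the three remaining probes is still answered correctly after unlearning.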

To improve reproducibility across runs:

  • keep prompt settings fixed
  • set random_seed during query generation
  • persist the exact graph JSON used to create question sets
  • keep provider and model versions in your experiment logs
  • prefer a judge model for semantic equivalence when exact string match is too brittle
  • reuse the graph per (model, seed entity) pair when evaluating multiple unlearning methods
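
One way to act on the "persist the exact graph JSON" and "keep provider and model versions" items is to log a content hash of the graph file next to the run settings. The sketch below uses only the standard library; the function names and record layout are hypothetical and separate from the metadata llm2graph writes itself.

```python
import hashlib
from pathlib import Path


def fingerprint_file(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_run_record(graph_path, provider, model, random_seed):
    """Collect the settings worth logging for one query-generation run."""
    return {
        "graph_path": str(graph_path),
        "graph_sha256": fingerprint_file(graph_path),
        "provider": provider,
        "model": model,
        "random_seed": random_seed,
    }
```

If two runs report the same graph_sha256, model, and random_seed, differences in their results are unlikely to come from the graph or the path-sampling settings.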

Paper Alignment

This package is intended to support the experiments and reproducibility workflow of the accompanying paper. Its main abstractions map onto the paper's methodology as follows:

  • GraphBuilder: API-driven entity-to-graph elicitation and controlled BFS expansion with decay
  • QueryGenerator: single-hop, multi-hop, alias-based, 1-hop retention, 2-hop retention, and relationship-retention probe synthesis
  • Evaluator: pre/post comparison, residual knowledge measurement, and paper-style aggregate metrics
  • BenchmarkRunner: dataset-level orchestration for RWKU, TOFU, or public seed lists
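
The phrase "controlled BFS expansion with decay" can be illustrated with a toy sketch. The interpretation used here is an assumption: the number of neighbors expanded per entity shrinks geometrically with depth. The neighbors callable stands in for LLM elicitation, and none of the names below come from GraphBuilder.

```python
from collections import deque


def bfs_with_decay(seed, neighbors, max_depth, base_branching, decay):
    """Expand a graph breadth-first, keeping fewer neighbors at deeper levels.

    `neighbors(entity)` returns a list of (relation, object) pairs; in a real
    pipeline this would be an LLM call. At depth d, at most
    int(base_branching * decay ** d) neighbors are kept per entity.
    """
    visited = {seed}
    edges = []
    queue = deque([(seed, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth >= max_depth:  # entities at the depth limit are not expanded
            continue
        budget = int(base_branching * (decay ** depth))
        for relation, obj in neighbors(entity)[:budget]:
            edges.append((entity, relation, obj))
            if obj not in visited:  # enqueue each entity once
                visited.add(obj)
                queue.append((obj, depth + 1))
    return edges


toy_neighbors = {
    "A": [("r", "B"), ("r", "C"), ("r", "D"), ("r", "E")],
    "B": [("r", "F"), ("r", "G"), ("r", "H")],
}
edges = bfs_with_decay("A", lambda e: toy_neighbors.get(e, []),
                       max_depth=2, base_branching=4, decay=0.5)
```

With these settings the seed keeps four neighbors, while each depth-1 entity keeps at most two, so the edge from B to H is dropped.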

Examples

See examples/reproducibility.md and examples/reproduce_pipeline.py for a minimal experiment template.
