LLM2Graph: Dynamic Knowledge Graph Construction via LLM-only elicitation

Project description

LLM2Graph - Dynamic Knowledge Graph Construction & Evaluation

This package implements the graph-based methodology from the COLM 2025 paper:

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

It provides an LLM-only pipeline for:

Graph construction via entity-centric elicitation and triple extraction.
Query generation with multi-hop, alias-perturbed, paraphrased questions, and optional distractors.
Evaluation of pre vs post (unlearned) models, including a residual knowledge analysis.

If any step returns an unexpected format, the package raises LLMError.

Quick Start (End-to-End)

# 0) Install (choose providers you need)
pip install -e .
# Optionals:
pip install -e '.[gemini]'      # Gemini support
pip install -e '.[hf-local]'    # HuggingFace local LLMs

# 1) Build a graph from an entity
export OPENAI_API_KEY=sk-...
llm2graph entity --seed "Stephen King" --max-depth 2 \
  --provider openai --model gpt-5-mini --out graph.json

# 2) Generate multi-hop queries with alias/paraphrase perturbations + distractors
llm2graph gen-queries --graph graph.json --target "Stephen King" \
  --hops 2 --num-paths 50 --aliases 3 --paraphrases 2 --distractors 2 \
  --provider openai --model gpt-5-mini --out queries.json

# 3) Evaluate pre vs post models (optionally use a judge model for equivalence)
llm2graph eval --queries queries.json \
  --pre-provider openai  --pre-model gpt-5-mini \
  --post-provider openai --post-model gpt-5-mini \
  --judge-provider openai --judge-model gpt-5-mini \
  --out eval_report.json

The evaluation report includes accuracies by bucket (single/multi-hop, alias, paraphrase) and a residual_rate capturing when gold phrasing fails but a perturbation still succeeds.

Installation & Providers

Base

pip install -e .

OpenAI (default)

export OPENAI_API_KEY=sk-...      # required for provider=openai

Gemini

pip install -e '.[gemini]'
export GEMINI_API_KEY=...         # required for provider=gemini

Local HuggingFace

pip install -e '.[hf-local]'
# Ensure PyTorch is installed and you have a compatible GPU (recommended).
# Example model:
llm2graph entity --seed "Ada Lovelace" --provider hf-local \
  --model mistralai/Mistral-7B-Instruct-v0.3 --max-depth 1 --out graph.json

All providers share the same strict prompting/validation; non-conforming outputs raise LLMError.

1) Graph Construction (Entity --> Graph)

Command

llm2graph entity \
  --seed "Stephen King" \
  --max-depth 2 \
  --provider openai \
  --model gpt-5-mini \
  --out graph.json

What happens

Elicitation: LLM writes a compact factual paragraph about the node.
Triple extraction: LLM returns strictly formatted triples: (subject ; relation ; object).
Strict checks: subject must equal the current node; malformed lines raise.
Expansion (BFS): Adds objects as next-depth nodes.

Advanced (programmatic kwargs in GraphBuilder)

use_relevance: bool - LLM-scored 0-10; below threshold filtered.
relevance_threshold: float - default 3.0.
decay: float in [0.1, 1.0] - limits breadth as depth grows.
max_nodes_per_depth: Optional[int] - hard cap per depth.
alias_merge: bool - LLM-judged canonicalization of new nodes (YES/NO).

Output format (graph.json)

{
  "seed": "Stephen King",
  "nodes": ["Stephen King", "The Shining", "Maine", "..."],
  "edges": [
    {"subject": "Stephen King", "relation": "wrote", "object": "The Shining"},
    {"subject": "Stephen King", "relation": "lives in", "object": "Maine"}
  ]
}

2) Query Generation (Multi-hop, Aliases, Paraphrases, Distractors)

Command

llm2graph gen-queries \
  --graph graph.json \
  --target "Stephen King" \
  --hops 2 \
  --num-paths 50 \
  --aliases 3 \
  --paraphrases 2 \
  --distractors 2 \
  --provider openai \
  --model gpt-5-mini \
  --out queries.json

What happens

Samples --hops-length paths from the graph.
Synthesizes a single question per path; the final node is the gold answer.
Generates paraphrases and alias-perturbed variants.
Optionally generates distractors.

Output (queries.json)

{
  "meta": {"hops": 2, "num_paths": 50, "aliases": 3, "paraphrases": 2, "distractors": 2},
  "queries": [{
    "path": [{"s": "A", "r": "rel1", "o": "B"}, {"s": "B", "r": "rel2", "o": "C"}],
    "q_gold": "Which work by the 'King of Horror' features ...?",
    "q_variants": ["... paraphrase1", "... paraphrase2"],
    "q_alias_variants": ["... alias-perturbed phrasing ..."],
    "answer": "C",
    "distractors": ["X","Y"]
  }]
}

Difficulty control

Hop length (--hops) raises reasoning depth.
Distractors increase choice difficulty.
Aliases/Paraphrases stress alias-robustness and surface-form robustness.

3) Evaluation (Pre vs Post, with Residual Knowledge)

Command

llm2graph eval \
  --queries queries.json \
  --pre-provider openai  --pre-model gpt-5-mini \
  --post-provider openai --post-model gpt-5-mini \
  --judge-provider openai --judge-model gpt-5-mini \
  --out eval_report.json

What happens

Asks pre and post models the gold question.
Asks the post model every variant (paraphrase/alias).
If judge is provided, equivalence is decided by strict "YES"/"NO" judgments; otherwise exact string equality is used.

Residual Knowledge (paper-aligned)

An item is marked residual if gold is incorrect post, but any alias/paraphrase variant is correct.
Summarized via residual_rate and residual_count.

Output (eval_report.json)

{
  "summary": {
    "all":         {"total": N, "correct": k, "accuracy": 0.xx},
    "single_hop":  {"total": ..., ...},
    "multi_hop":   {"total": ..., ...},
    "alias":       {"total": ..., ...},
    "paraphrase":  {"total": ..., ...},
    "residual_rate": 0.xx,
    "residual_count": M,
    "num_items": N_items
  },
  "items": [
    {
      "path": [...],
      "predictions": [
        {"variant": "gold", "type": "gold", "pre": "…", "post": "…", "correct": true/false},
        {"variant": "paraphrase", "type": "paraphrase", "pre": null, "post": "…", "correct": ...},
        {"variant": "alias", "type": "alias", "pre": null, "post": "…", "correct": ...}
      ],
      "residual_flags": {
        "residual": true/false,
        "gold_correct": false,
        "alias_any": true/false,
        "para_any": true/false
      }
    }
  ]
}

Implementation Notes

Strict parsing: Triple lines must be exactly (subject ; relation ; object); subject must equal the current node.
Alias canonicalization: Node merging uses canonical_same(a,b) --> strict "YES"/"NO" from an LLM.
Relevance scoring: 0-10 numeric, LLM-only; thresholded filtering (optional).
HF local chat templates: If available, we use .apply_chat_template; else a minimal structured prompt is used.
No heuristic fallbacks: Any format drift raises LLMError.

Troubleshooting

LLMError: The model did not follow the strict format. Retry with a different model or lower temperature.
Model access: Ensure OPENAI_API_KEY/GEMINI_API_KEY is set; confirm the --model exists for that provider.
HF OOM: Choose a smaller HF repo; reduce generation tokens; consider 4/8-bit loading (extend loader as needed).

Citation

If you use this package, please cite:

Shah, Raj Sanjay, Jing Huang, Keerthiram Murugesan, Nathalie Baracaldo, and Diyi Yang. The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning. Second Conference on Language Modeling. 2025.

Project details

Release history Release notifications | RSS feed

0.3.5

Mar 13, 2026

0.3.4

Mar 9, 2026

This version

0.3.2

Oct 6, 2025

0.3.0

Oct 6, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm2graph-0.3.2.tar.gz (15.5 kB view details)

Uploaded Oct 6, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llm2graph-0.3.2-py3-none-any.whl (15.2 kB view details)

Uploaded Oct 6, 2025 Python 3

File details

Details for the file llm2graph-0.3.2.tar.gz.

File metadata

Download URL: llm2graph-0.3.2.tar.gz
Upload date: Oct 6, 2025
Size: 15.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.9

File hashes

Hashes for llm2graph-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`18e10dbbaa290e9a0a15c84de2fc7e76c6872eae8fb5803ce0111977a03f0816`
MD5	`a793840a5e073fc49c41e0bbe48ebbb4`
BLAKE2b-256	`ba72ef4ec4734b4543b382fe86be09f38d12f50d9214ce208da4763a5bd63712`

See more details on using hashes here.

File details

Details for the file llm2graph-0.3.2-py3-none-any.whl.

File metadata

Download URL: llm2graph-0.3.2-py3-none-any.whl
Upload date: Oct 6, 2025
Size: 15.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.9

File hashes

Hashes for llm2graph-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d4af0bdb3ca5c131d95d8c66d4e92c5cf97e4d7f44673f3998b60de159bc9af9`
MD5	`5846f160a3813e9ac400e031657ec990`
BLAKE2b-256	`6a9e08118c988e572d0a2e9da2381918e742b3d16da70b389806f9720a370619`

See more details on using hashes here.

llm2graph 0.3.2

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

LLM2Graph - Dynamic Knowledge Graph Construction & Evaluation

Quick Start (End-to-End)

Installation & Providers

Base

OpenAI (default)

Gemini

Local HuggingFace

1) Graph Construction (Entity --> Graph)

2) Query Generation (Multi-hop, Aliases, Paraphrases, Distractors)

3) Evaluation (Pre vs Post, with Residual Knowledge)

Implementation Notes

Troubleshooting

Citation

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes