LLM2Graph: Dynamic Knowledge Graph Construction via LLM-only elicitation
LLM2Graph - Dynamic Knowledge Graph Construction & Evaluation
This package implements the graph-based methodology from the COLM 2025 paper:
The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning
It provides an LLM-only pipeline for:
- Graph construction via entity-centric elicitation and triple extraction.
- Query generation with multi-hop, alias-perturbed, paraphrased questions, and optional distractors.
- Evaluation of pre vs post (unlearned) models, including a residual knowledge analysis.
If any step returns an unexpected format, the package raises LLMError.
Quick Start (End-to-End)
# 0) Install (choose providers you need)
pip install -e .
# Optionals:
pip install -e '.[gemini]' # Gemini support
pip install -e '.[hf-local]' # HuggingFace local LLMs
# 1) Build a graph from an entity
export OPENAI_API_KEY=sk-...
llm2graph entity --seed "Stephen King" --max-depth 2 \
--provider openai --model gpt-5-mini --out graph.json
# 2) Generate multi-hop queries with alias/paraphrase perturbations + distractors
llm2graph gen-queries --graph graph.json --target "Stephen King" \
--hops 2 --num-paths 50 --aliases 3 --paraphrases 2 --distractors 2 \
--provider openai --model gpt-5-mini --out queries.json
# 3) Evaluate pre vs post models (optionally use a judge model for equivalence)
llm2graph eval --queries queries.json \
--pre-provider openai --pre-model gpt-5-mini \
--post-provider openai --post-model gpt-5-mini \
--judge-provider openai --judge-model gpt-5-mini \
--out eval_report.json
The evaluation report includes accuracies by bucket (single/multi-hop, alias, paraphrase) and a residual_rate capturing when gold phrasing fails but a perturbation still succeeds.
Installation & Providers
Base
pip install -e .
OpenAI (default)
export OPENAI_API_KEY=sk-... # required for provider=openai
Gemini
pip install -e '.[gemini]'
export GEMINI_API_KEY=... # required for provider=gemini
Local HuggingFace
pip install -e '.[hf-local]'
# Ensure PyTorch is installed and you have a compatible GPU (recommended).
# Example model:
llm2graph entity --seed "Ada Lovelace" --provider hf-local \
--model mistralai/Mistral-7B-Instruct-v0.3 --max-depth 1 --out graph.json
All providers share the same strict prompting/validation; non-conforming outputs raise LLMError.
1) Graph Construction (Entity --> Graph)
Command
llm2graph entity \
--seed "Stephen King" \
--max-depth 2 \
--provider openai \
--model gpt-5-mini \
--out graph.json
What happens
- Elicitation: LLM writes a compact factual paragraph about the node.
- Triple extraction: LLM returns strictly formatted triples: (subject ; relation ; object).
- Strict checks: subject must equal the current node; malformed lines raise LLMError.
- Expansion (BFS): Adds objects as next-depth nodes.
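The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the package's code: parse_triples, build_graph, the regular expression, and the extract callback (which stands in for the elicitation and extraction LLM calls) are all hypothetical names, and LLMError is redefined locally so the example runs on its own.

```python
import re
from collections import deque


class LLMError(Exception):
    """Stand-in for the package's error type, defined here so the sketch runs alone."""


# Assumed line shape: "(subject ; relation ; object)" with nothing else on the line.
TRIPLE_RE = re.compile(r"^\(\s*(.+?)\s*;\s*(.+?)\s*;\s*(.+?)\s*\)$")


def parse_triples(text, node):
    """Parse raw triple lines strictly; any drift raises instead of being repaired."""
    triples = []
    for line in text.strip().splitlines():
        match = TRIPLE_RE.match(line.strip())
        if match is None:
            raise LLMError(f"malformed triple line: {line!r}")
        subject, relation, obj = match.groups()
        if subject != node:
            raise LLMError(f"subject {subject!r} does not match node {node!r}")
        triples.append((subject, relation, obj))
    return triples


def build_graph(seed, max_depth, extract):
    """Breadth-first expansion; extract(node) returns raw triple text for that node."""
    nodes, edges, seen = [seed], [], {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # nodes at the depth limit are kept but not expanded
        for subject, relation, obj in parse_triples(extract(node), node):
            edges.append({"subject": subject, "relation": relation, "object": obj})
            if obj not in seen:
                seen.add(obj)
                nodes.append(obj)
                queue.append((obj, depth + 1))
    return {"seed": seed, "nodes": nodes, "edges": edges}
```

Passing a plain function as extract makes the traversal testable without any provider; in the real pipeline that role is played by the LLM.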
Advanced (programmatic kwargs in GraphBuilder)
- use_relevance: bool - LLM-scored 0-10; below threshold filtered.
- relevance_threshold: float - default 3.0.
- decay: float in [0.1, 1.0] - limits breadth as depth grows.
- max_nodes_per_depth: Optional[int] - hard cap per depth.
- alias_merge: bool - LLM-judged canonicalization of new nodes (YES/NO).
Output format (graph.json)
{
"seed": "Stephen King",
"nodes": ["Stephen King", "The Shining", "Maine", "..."],
"edges": [
{"subject": "Stephen King", "relation": "wrote", "object": "The Shining"},
{"subject": "Stephen King", "relation": "lives in", "object": "Maine"}
]
}
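Because graph.json is plain JSON, it can be inspected with the standard library alone. The sketch below assumes only the three keys shown above; the adjacency helper is a hypothetical name and not part of the package. The document is inlined as a string here so the example runs without a file on disk.

```python
import json
from collections import defaultdict

# Inlined copy of the example above; in practice this would be read from graph.json.
graph_text = """
{
  "seed": "Stephen King",
  "nodes": ["Stephen King", "The Shining", "Maine"],
  "edges": [
    {"subject": "Stephen King", "relation": "wrote", "object": "The Shining"},
    {"subject": "Stephen King", "relation": "lives in", "object": "Maine"}
  ]
}
"""
graph = json.loads(graph_text)


def adjacency(graph):
    """Group outgoing (relation, object) pairs by subject node."""
    out = defaultdict(list)
    for edge in graph["edges"]:
        out[edge["subject"]].append((edge["relation"], edge["object"]))
    return dict(out)


for relation, obj in adjacency(graph)[graph["seed"]]:
    print(f'{graph["seed"]} --{relation}--> {obj}')
```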
2) Query Generation (Multi-hop, Aliases, Paraphrases, Distractors)
Command
llm2graph gen-queries \
--graph graph.json \
--target "Stephen King" \
--hops 2 \
--num-paths 50 \
--aliases 3 \
--paraphrases 2 \
--distractors 2 \
--provider openai \
--model gpt-5-mini \
--out queries.json
What happens
- Samples --hops-length paths from the graph.
- Synthesizes a single question per path; the final node is the gold answer.
- Generates paraphrases and alias-perturbed variants.
- Optionally generates distractors.
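The path-sampling step can be approximated as follows. This is a sketch under assumptions the text does not confirm: that paths start at the target node, that nodes are not revisited within a path, and that sampling is uniform over all enumerated paths. enumerate_paths and sample_paths are hypothetical names; the question synthesis, paraphrase, and alias steps need an LLM and are omitted.

```python
import random


def enumerate_paths(edges, start, hops):
    """List every simple path of exactly `hops` edges that begins at `start`."""
    by_subject = {}
    for edge in edges:
        by_subject.setdefault(edge["subject"], []).append(edge)

    paths = []

    def walk(node, path, visited):
        if len(path) == hops:
            paths.append(list(path))
            return
        for edge in by_subject.get(node, []):
            if edge["object"] in visited:
                continue  # avoid cycles so each hop reaches a new node
            path.append({"s": edge["subject"], "r": edge["relation"], "o": edge["object"]})
            walk(edge["object"], path, visited | {edge["object"]})
            path.pop()

    walk(start, [], {start})
    return paths


def sample_paths(edges, start, hops, num_paths, seed=0):
    """Return at most `num_paths` paths, sampled without replacement."""
    paths = enumerate_paths(edges, start, hops)
    if len(paths) <= num_paths:
        return paths
    return random.Random(seed).sample(paths, num_paths)
```

The last element of each path carries the gold answer in its "o" field, matching the path layout used in queries.json.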
Output (queries.json)
{
"meta": {"hops": 2, "num_paths": 50, "aliases": 3, "paraphrases": 2, "distractors": 2},
"queries": [{
"path": [{"s": "A", "r": "rel1", "o": "B"}, {"s": "B", "r": "rel2", "o": "C"}],
"q_gold": "Which work by the 'King of Horror' features ...?",
"q_variants": ["... paraphrase1", "... paraphrase2"],
"q_alias_variants": ["... alias-perturbed phrasing ..."],
"answer": "C",
"distractors": ["X","Y"]
}]
}
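To feed these queries to a custom harness, the nested structure can be flattened into one row per question. The sketch below relies only on the keys shown above; flatten_queries is a hypothetical helper, and the sample document is a trimmed stand-in for a real queries.json.

```python
def flatten_queries(doc):
    """Yield (variant_type, question, answer) rows for every phrasing of every query."""
    rows = []
    for query in doc["queries"]:
        answer = query["answer"]
        rows.append(("gold", query["q_gold"], answer))
        for text in query.get("q_variants", []):
            rows.append(("paraphrase", text, answer))
        for text in query.get("q_alias_variants", []):
            rows.append(("alias", text, answer))
    return rows


# Trimmed stand-in for the contents of queries.json.
sample_doc = {
    "meta": {"hops": 2},
    "queries": [{
        "q_gold": "gold phrasing",
        "q_variants": ["paraphrase 1", "paraphrase 2"],
        "q_alias_variants": ["alias phrasing"],
        "answer": "C",
        "distractors": ["X", "Y"],
    }],
}

for variant_type, question, answer in flatten_queries(sample_doc):
    print(variant_type, "|", question, "|", answer)
```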
Difficulty control
- Hop length (--hops) raises reasoning depth.
- Distractors increase choice difficulty.
- Aliases/Paraphrases stress alias-robustness and surface-form robustness.
3) Evaluation (Pre vs Post, with Residual Knowledge)
Command
llm2graph eval \
--queries queries.json \
--pre-provider openai --pre-model gpt-5-mini \
--post-provider openai --post-model gpt-5-mini \
--judge-provider openai --judge-model gpt-5-mini \
--out eval_report.json
What happens
- Asks pre and post models the gold question.
- Asks the post model every variant (paraphrase/alias).
- If judge is provided, equivalence is decided by strict "YES"/"NO" judgments; otherwise exact string equality is used.
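The correctness rule described above might look like the following. This is a minimal sketch: is_equivalent and the judge callable's signature are assumptions for illustration, and LLMError is redefined locally as a stand-in for the package's error type.

```python
class LLMError(Exception):
    """Stand-in for the package's error type, defined here so the sketch runs alone."""


def is_equivalent(prediction, gold, judge=None):
    """Exact string equality by default; a strict YES/NO verdict when a judge is given."""
    if judge is None:
        return prediction == gold
    verdict = judge(prediction, gold)  # hypothetical callable wrapping the judge model
    if verdict not in ("YES", "NO"):
        raise LLMError(f"judge returned {verdict!r}, expected 'YES' or 'NO'")
    return verdict == "YES"
```

Without a judge, answers that differ only in case or punctuation count as incorrect, which is why a judge model is worth supplying for free-form answers.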
Residual Knowledge (paper-aligned)
- An item is marked residual if gold is incorrect post, but any alias/paraphrase variant is correct.
- Summarized via residual_rate and residual_count.
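The residual rule can be recomputed from per-item predictions as a sanity check. The sketch below assumes the prediction layout shown in eval_report.json, that the "correct" field refers to the post model, and that residual_rate is residual_count divided by the number of items; the last point is an assumption, since the text does not give the denominator. residual_flags and residual_summary are hypothetical names.

```python
def residual_flags(predictions):
    """Flag an item as residual when gold fails but some alias/paraphrase succeeds."""
    def any_correct(kind):
        return any(p["correct"] for p in predictions if p["type"] == kind)

    gold_correct = any_correct("gold")
    alias_any = any_correct("alias")
    para_any = any_correct("paraphrase")
    return {
        "residual": (not gold_correct) and (alias_any or para_any),
        "gold_correct": gold_correct,
        "alias_any": alias_any,
        "para_any": para_any,
    }


def residual_summary(items):
    """Aggregate residual flags; the rate's denominator is assumed to be the item count."""
    flags = [residual_flags(item["predictions"]) for item in items]
    count = sum(1 for f in flags if f["residual"])
    rate = count / len(flags) if flags else 0.0
    return {"residual_count": count, "residual_rate": rate, "num_items": len(flags)}
```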
Output (eval_report.json)
{
"summary": {
"all": {"total": N, "correct": k, "accuracy": 0.xx},
"single_hop": {"total": ..., ...},
"multi_hop": {"total": ..., ...},
"alias": {"total": ..., ...},
"paraphrase": {"total": ..., ...},
"residual_rate": 0.xx,
"residual_count": M,
"num_items": N_items
},
"items": [
{
"path": [...],
"predictions": [
{"variant": "gold", "type": "gold", "pre": "…", "post": "…", "correct": true/false},
{"variant": "paraphrase", "type": "paraphrase", "pre": null, "post": "…", "correct": ...},
{"variant": "alias", "type": "alias", "pre": null, "post": "…", "correct": ...}
],
"residual_flags": {
"residual": true/false,
"gold_correct": false,
"alias_any": true/false,
"para_any": true/false
}
}
]
}
Implementation Notes
- Strict parsing: Triple lines must be exactly (subject ; relation ; object); subject must equal the current node.
- Alias canonicalization: Node merging uses canonical_same(a,b) --> strict "YES"/"NO" from an LLM.
- Relevance scoring: 0-10 numeric, LLM-only; thresholded filtering (optional).
- HF local chat templates: If available, we use .apply_chat_template; else a minimal structured prompt is used.
- No heuristic fallbacks: Any format drift raises LLMError.
Troubleshooting
- LLMError: The model did not follow the strict format. Retry with a different model or lower temperature.
- Model access: Ensure OPENAI_API_KEY / GEMINI_API_KEY is set; confirm the --model exists for that provider.
- HF OOM: Choose a smaller HF repo; reduce generation tokens; consider 4/8-bit loading (extend loader as needed).
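Since format drift raises rather than being repaired, callers driving the pipeline from Python may want a small retry wrapper around each step. The sketch below is an assumption-laden illustration: with_retries is a hypothetical helper, the step callable stands in for whatever pipeline call is being retried, and LLMError is redefined locally because the package's import path for it is not given here.

```python
class LLMError(Exception):
    """Stand-in for the package's error type, defined here so the sketch runs alone."""


def with_retries(step, attempts=3):
    """Call step() until it succeeds or `attempts` LLMErrors have been raised."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return step()
        except LLMError as err:
            last_error = err  # remember the failure and try again
    raise last_error
```

Only LLMError is caught, so authentication or out-of-memory failures still surface immediately instead of being retried.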
Citation
If you use this package, please cite:
Shah, Raj Sanjay, Jing Huang, Keerthiram Murugesan, Nathalie Baracaldo, and Diyi Yang. The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning. Second Conference on Language Modeling. 2025.