LLM2Graph: Dynamic Knowledge Graph Construction via LLM-only elicitation

LLM2Graph - Dynamic Knowledge Graph Construction & Evaluation

This package implements the graph-based methodology from the COLM 2025 paper:

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

It provides an LLM-only pipeline for:

  1. Graph construction via entity-centric elicitation and triple extraction.
  2. Query generation: multi-hop, alias-perturbed, and paraphrased questions, plus optional distractors.
  3. Evaluation of pre-unlearning vs. post-unlearning models, including a residual knowledge analysis.

If any step returns an unexpected format, the package raises LLMError.


Quick Start (End-to-End)

# 0) Install (choose providers you need)
pip install -e .
# Optional extras:
pip install -e '.[gemini]'      # Gemini support
pip install -e '.[hf-local]'    # HuggingFace local LLMs

# 1) Build a graph from an entity
export OPENAI_API_KEY=sk-...
llm2graph entity --seed "Stephen King" --max-depth 2 \
  --provider openai --model gpt-5-mini --out graph.json

# 2) Generate multi-hop queries with alias/paraphrase perturbations + distractors
llm2graph gen-queries --graph graph.json --target "Stephen King" \
  --hops 2 --num-paths 50 --aliases 3 --paraphrases 2 --distractors 2 \
  --provider openai --model gpt-5-mini --out queries.json

# 3) Evaluate pre vs post models (optionally use a judge model for equivalence)
llm2graph eval --queries queries.json \
  --pre-provider openai  --pre-model gpt-5-mini \
  --post-provider openai --post-model gpt-5-mini \
  --judge-provider openai --judge-model gpt-5-mini \
  --out eval_report.json

The evaluation report includes accuracies by bucket (single/multi-hop, alias, paraphrase) and a residual_rate capturing when gold phrasing fails but a perturbation still succeeds.


Installation & Providers

Base

pip install -e .

OpenAI (default)

export OPENAI_API_KEY=sk-...      # required for provider=openai

Gemini

pip install -e '.[gemini]'
export GEMINI_API_KEY=...         # required for provider=gemini

Local HuggingFace

pip install -e '.[hf-local]'
# Ensure PyTorch is installed and you have a compatible GPU (recommended).
# Example model:
llm2graph entity --seed "Ada Lovelace" --provider hf-local \
  --model mistralai/Mistral-7B-Instruct-v0.3 --max-depth 1 --out graph.json

All providers share the same strict prompting/validation; non-conforming outputs raise LLMError.


1) Graph Construction (Entity --> Graph)

Command

llm2graph entity \
  --seed "Stephen King" \
  --max-depth 2 \
  --provider openai \
  --model gpt-5-mini \
  --out graph.json

What happens

  • Elicitation: LLM writes a compact factual paragraph about the node.
  • Triple extraction: LLM returns strictly formatted triples: (subject ; relation ; object).
  • Strict checks: subject must equal the current node; malformed lines raise LLMError.
  • Expansion (BFS): Adds objects as next-depth nodes.
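
The expansion loop above can be sketched roughly as follows. This is a minimal illustration of the control flow, not the package's GraphBuilder: extract_triples is a hypothetical stand-in for the elicitation and triple-extraction LLM calls (here it reads a hard-coded table), ValueError stands in for LLMError, and the choice to keep but not expand nodes at the depth limit is an assumption.

```python
from collections import deque

# Hypothetical stand-in for the two LLM calls (elicitation + triple extraction).
# A real run would query a model; this toy table only illustrates the loop.
TOY_FACTS = {
    "Stephen King": [("Stephen King", "wrote", "The Shining"),
                     ("Stephen King", "lives in", "Maine")],
    "The Shining": [("The Shining", "adapted by", "Stanley Kubrick")],
}

def extract_triples(node):
    return TOY_FACTS.get(node, [])

def build_graph(seed, max_depth):
    nodes, edges = [seed], []
    seen = {seed}
    queue = deque([(seed, 0)])  # FIFO queue gives breadth-first order
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # assumption: nodes at the depth limit are kept, not expanded
        for subj, rel, obj in extract_triples(node):
            if subj != node:
                # Strict check: the subject must equal the current node.
                raise ValueError(f"subject {subj!r} != current node {node!r}")
            edges.append({"subject": subj, "relation": rel, "object": obj})
            if obj not in seen:
                seen.add(obj)
                nodes.append(obj)
                queue.append((obj, depth + 1))
    return {"seed": seed, "nodes": nodes, "edges": edges}

graph = build_graph("Stephen King", max_depth=2)
```

The returned dictionary uses the same keys as graph.json (seed, nodes, edges), so it can be written out with json.dump for inspection.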

Advanced (programmatic kwargs in GraphBuilder)

  • use_relevance: bool - when enabled, candidates are LLM-scored on a 0-10 scale and those below relevance_threshold are filtered out.
  • relevance_threshold: float - default 3.0.
  • decay: float in [0.1, 1.0] - limits breadth as depth grows.
  • max_nodes_per_depth: Optional[int] - hard cap per depth.
  • alias_merge: bool - LLM-judged canonicalization of new nodes (YES/NO).
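
How decay and max_nodes_per_depth interact is not spelled out here. One plausible reading, shown purely as an assumption, is a per-depth breadth budget that shrinks geometrically and is then capped. The names breadth_budget, base_breadth, and filter_by_relevance are hypothetical and are not part of the package.

```python
from typing import List, Optional, Tuple

def breadth_budget(depth: int, base_breadth: int, decay: float,
                   max_nodes_per_depth: Optional[int] = None) -> int:
    """Assumed rule: base_breadth * decay**depth, at least 1, then hard-capped."""
    if not 0.1 <= decay <= 1.0:
        raise ValueError("decay must lie in [0.1, 1.0]")
    budget = max(1, int(base_breadth * decay ** depth))
    if max_nodes_per_depth is not None:
        budget = min(budget, max_nodes_per_depth)
    return budget

def filter_by_relevance(scored: List[Tuple[str, float]],
                        threshold: float = 3.0) -> List[str]:
    """Keep candidates whose 0-10 score is at or above the threshold."""
    return [name for name, score in scored if score >= threshold]

kept = filter_by_relevance([("Maine", 7.5), ("Paper", 2.0), ("Bangor", 3.0)])
```

Under this reading, decay=1.0 leaves breadth unchanged at every depth, while smaller values narrow the frontier quickly.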

Output format (graph.json)

{
  "seed": "Stephen King",
  "nodes": ["Stephen King", "The Shining", "Maine", "..."],
  "edges": [
    {"subject": "Stephen King", "relation": "wrote", "object": "The Shining"},
    {"subject": "Stephen King", "relation": "lives in", "object": "Maine"}
  ]
}
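
For downstream work it is often handy to index the edge list by subject. The snippet below is a small sketch that relies only on the seed/nodes/edges layout shown above; index_graph is a hypothetical helper name, and the graph is inlined so the example runs without a file.

```python
import json
from collections import defaultdict

def index_graph(graph: dict) -> dict:
    """Map each subject to its outgoing (relation, object) pairs."""
    adjacency = defaultdict(list)
    for edge in graph["edges"]:
        adjacency[edge["subject"]].append((edge["relation"], edge["object"]))
    return dict(adjacency)

# Inline copy of the example; for a real file, use json.load on graph.json.
graph = json.loads("""
{
  "seed": "Stephen King",
  "nodes": ["Stephen King", "The Shining", "Maine"],
  "edges": [
    {"subject": "Stephen King", "relation": "wrote", "object": "The Shining"},
    {"subject": "Stephen King", "relation": "lives in", "object": "Maine"}
  ]
}
""")
adjacency = index_graph(graph)
```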

2) Query Generation (Multi-hop, Aliases, Paraphrases, Distractors)

Command

llm2graph gen-queries \
  --graph graph.json \
  --target "Stephen King" \
  --hops 2 \
  --num-paths 50 \
  --aliases 3 \
  --paraphrases 2 \
  --distractors 2 \
  --provider openai \
  --model gpt-5-mini \
  --out queries.json

What happens

  • Samples paths of length --hops from the graph.
  • Synthesizes a single question per path; the final node is the gold answer.
  • Generates paraphrases and alias-perturbed variants.
  • Optionally generates distractors.
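
The path-sampling step can be pictured as a random walk over the edge list. The sketch below is an assumption about how such sampling might work, not the package's sampler: sample_paths is a hypothetical name, walks start at the target node, revisiting a node is disallowed, and walks that hit a dead end before reaching the requested length are discarded (so fewer than num_paths paths may come back).

```python
import random

def sample_paths(edges, start, hops, num_paths, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    out_edges = {}
    for e in edges:
        out_edges.setdefault(e["subject"], []).append(e)
    paths = []
    for _ in range(num_paths):
        node, path, visited = start, [], {start}
        for _ in range(hops):
            choices = [e for e in out_edges.get(node, [])
                       if e["object"] not in visited]
            if not choices:
                break  # dead end before reaching the requested length
            edge = rng.choice(choices)
            path.append({"s": edge["subject"], "r": edge["relation"],
                         "o": edge["object"]})
            node = edge["object"]
            visited.add(node)
        if len(path) == hops:
            paths.append(path)  # the final "o" would be the gold answer
    return paths

edges = [
    {"subject": "Stephen King", "relation": "wrote", "object": "The Shining"},
    {"subject": "Stephen King", "relation": "lives in", "object": "Maine"},
    {"subject": "The Shining", "relation": "adapted by", "object": "Stanley Kubrick"},
]
one_hop = sample_paths(edges, "Stephen King", hops=1, num_paths=5)
two_hop = sample_paths(edges, "Stephen King", hops=2, num_paths=20)
```

Each returned path uses the same s/r/o keys as the path field in queries.json.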

Output (queries.json)

{
  "meta": {"hops": 2, "num_paths": 50, "aliases": 3, "paraphrases": 2, "distractors": 2},
  "queries": [{
    "path": [{"s": "A", "r": "rel1", "o": "B"}, {"s": "B", "r": "rel2", "o": "C"}],
    "q_gold": "Which work by the 'King of Horror' features ...?",
    "q_variants": ["... paraphrase1", "... paraphrase2"],
    "q_alias_variants": ["... alias-perturbed phrasing ..."],
    "answer": "C",
    "distractors": ["X","Y"]
  }]
}
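
When consuming queries.json, it can help to flatten each entry into (type, question) pairs. This is a small sketch based only on the fields shown above; iter_questions is a hypothetical helper, and the type labels are chosen to match the "type" values that appear in the evaluation report.

```python
def iter_questions(query: dict):
    """Yield (type, question) for the gold question and every variant."""
    yield "gold", query["q_gold"]
    for question in query.get("q_variants", []):
        yield "paraphrase", question
    for question in query.get("q_alias_variants", []):
        yield "alias", question

# Toy entry in the queries.json layout (question texts are placeholders).
example = {
    "path": [{"s": "A", "r": "rel1", "o": "B"}, {"s": "B", "r": "rel2", "o": "C"}],
    "q_gold": "gold question",
    "q_variants": ["paraphrase 1", "paraphrase 2"],
    "q_alias_variants": ["alias phrasing"],
    "answer": "C",
    "distractors": ["X", "Y"],
}
flat = list(iter_questions(example))
```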

Difficulty control

  • Hop length (--hops) raises reasoning depth.
  • Distractors increase choice difficulty.
  • Aliases/Paraphrases stress alias-robustness and surface-form robustness.

3) Evaluation (Pre vs Post, with Residual Knowledge)

Command

llm2graph eval \
  --queries queries.json \
  --pre-provider openai  --pre-model gpt-5-mini \
  --post-provider openai --post-model gpt-5-mini \
  --judge-provider openai --judge-model gpt-5-mini \
  --out eval_report.json

What happens

  • Asks pre and post models the gold question.
  • Asks the post model every variant (paraphrase/alias).
  • If judge is provided, equivalence is decided by strict "YES"/"NO" judgments; otherwise exact string equality is used.
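
The correctness decision described above can be sketched as a single function. This is an illustration under assumptions: is_equivalent is a hypothetical name, judge is any callable that returns the judge model's raw verdict string, and ValueError stands in for LLMError when the verdict is not exactly "YES" or "NO".

```python
def is_equivalent(prediction, gold, judge=None):
    if judge is None:
        # No judge: exact string equality, with no normalization at all.
        return prediction == gold
    verdict = judge(prediction, gold)
    if verdict not in ("YES", "NO"):
        raise ValueError(f"judge returned non-conforming verdict: {verdict!r}")
    return verdict == "YES"

# Toy judge for demonstration only; a real judge would be an LLM call.
def toy_judge(prediction, gold):
    same = prediction.lower().rstrip(".") == gold.lower()
    return "YES" if same else "NO"

exact = is_equivalent("maine.", "Maine")                    # False
judged = is_equivalent("maine.", "Maine", judge=toy_judge)  # True
```

The example shows why a judge is usually worth configuring: exact equality rejects answers that differ only in case or trailing punctuation.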

Residual Knowledge (paper-aligned)

  • An item is marked residual if the post model answers the gold question incorrectly but answers at least one alias or paraphrase variant correctly.
  • Summarized via residual_rate and residual_count.
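
The residual rule can be written down in a few lines. The sketch below follows the definition above and the field names used in eval_report.json; computing residual_rate as residual_count divided by the number of items is an assumption, and both function names are hypothetical.

```python
def residual_flags(predictions):
    """Derive per-item flags from a list of prediction records."""
    def any_correct(kind):
        return any(p["correct"] for p in predictions if p["type"] == kind)
    gold_correct = any_correct("gold")
    alias_any = any_correct("alias")
    para_any = any_correct("paraphrase")
    return {
        # Residual: gold fails post-unlearning, yet some perturbation succeeds.
        "residual": (not gold_correct) and (alias_any or para_any),
        "gold_correct": gold_correct,
        "alias_any": alias_any,
        "para_any": para_any,
    }

def residual_summary(items):
    flags = [residual_flags(item["predictions"]) for item in items]
    count = sum(f["residual"] for f in flags)
    rate = count / len(flags) if flags else 0.0  # assumed denominator: all items
    return {"residual_count": count, "residual_rate": rate, "num_items": len(flags)}

items = [
    {"predictions": [{"type": "gold", "correct": False},
                     {"type": "alias", "correct": True}]},
    {"predictions": [{"type": "gold", "correct": True},
                     {"type": "paraphrase", "correct": True}]},
]
summary = residual_summary(items)
```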

Output (eval_report.json)

{
  "summary": {
    "all":         {"total": N, "correct": k, "accuracy": 0.xx},
    "single_hop":  {"total": ..., ...},
    "multi_hop":   {"total": ..., ...},
    "alias":       {"total": ..., ...},
    "paraphrase":  {"total": ..., ...},
    "residual_rate": 0.xx,
    "residual_count": M,
    "num_items": N_items
  },
  "items": [
    {
      "path": [...],
      "predictions": [
        {"variant": "gold", "type": "gold", "pre": "…", "post": "…", "correct": true/false},
        {"variant": "paraphrase", "type": "paraphrase", "pre": null, "post": "…", "correct": ...},
        {"variant": "alias", "type": "alias", "pre": null, "post": "…", "correct": ...}
      ],
      "residual_flags": {
        "residual": true/false,
        "gold_correct": true/false,
        "alias_any": true/false,
        "para_any": true/false
      }
    }
  ]
}
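
Per-type accuracies can be recomputed from the items list as a sanity check on the summary. This sketch buckets only by the "type" field of each prediction; how the package assigns the single_hop and multi_hop buckets is not shown here, so those are left out. accuracy_by_type is a hypothetical helper.

```python
from collections import Counter

def accuracy_by_type(items):
    total, correct = Counter(), Counter()
    for item in items:
        for pred in item["predictions"]:
            total[pred["type"]] += 1
            correct[pred["type"]] += bool(pred["correct"])
    return {
        kind: {"total": total[kind], "correct": correct[kind],
               "accuracy": correct[kind] / total[kind]}
        for kind in total
    }

# Toy items in the eval_report.json layout.
items = [
    {"predictions": [{"type": "gold", "correct": False},
                     {"type": "alias", "correct": True}]},
    {"predictions": [{"type": "gold", "correct": True},
                     {"type": "alias", "correct": True}]},
]
buckets = accuracy_by_type(items)
```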

Implementation Notes

  • Strict parsing: Triple lines must be exactly (subject ; relation ; object); subject must equal the current node.
  • Alias canonicalization: Node merging uses canonical_same(a,b) --> strict "YES"/"NO" from an LLM.
  • Relevance scoring: 0-10 numeric, LLM-only; thresholded filtering (optional).
  • HF local chat templates: If available, we use .apply_chat_template; else a minimal structured prompt is used.
  • No heuristic fallbacks: Any format drift raises LLMError.
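
To make the strict-parsing rule concrete, here is a minimal sketch of a parser that accepts only lines of the form (subject ; relation ; object) and raises on anything else. It is not the package's parser: FormatError is a stand-in for LLMError, parse_triples is a hypothetical name, and skipping blank lines and trimming whitespace around fields are assumptions.

```python
class FormatError(ValueError):
    """Stand-in for the package's LLMError in this sketch."""

def parse_triples(text, current_node):
    triples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are tolerated
        if not (line.startswith("(") and line.endswith(")")):
            raise FormatError(f"not a parenthesized triple: {raw!r}")
        parts = [part.strip() for part in line[1:-1].split(";")]
        if len(parts) != 3 or not all(parts):
            raise FormatError(f"expected three non-empty fields: {raw!r}")
        subj, rel, obj = parts
        if subj != current_node:
            raise FormatError(f"subject {subj!r} != current node {current_node!r}")
        triples.append((subj, rel, obj))
    return triples

parsed = parse_triples(
    "(Stephen King ; wrote ; The Shining)\n(Stephen King ; lives in ; Maine)",
    "Stephen King",
)
```

There is deliberately no recovery path: a line that almost matches is rejected rather than repaired, which mirrors the no-fallback policy described above.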

Troubleshooting

  • LLMError: The model did not follow the strict format. Retry with a different model or lower temperature.
  • Model access: Ensure OPENAI_API_KEY/GEMINI_API_KEY is set; confirm the --model exists for that provider.
  • HF OOM: Choose a smaller HF model; reduce the number of generated tokens; consider 4-bit or 8-bit loading (extend the loader as needed).

Citation

If you use this package, please cite:

Shah, Raj Sanjay, Jing Huang, Keerthiram Murugesan, Nathalie Baracaldo, and Diyi Yang. The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning. Second Conference on Language Modeling. 2025.
