General-purpose knowledge graph extraction framework
k-extract
Extract knowledge graphs from any codebase or documentation. Point it at your repos, describe what you're trying to understand, and get a graph out.
Quick Start
1. Define what to extract
uvx k-extract init ./my-repo ./another-repo
This walks you through:
- Describing your problem ("I need to understand my testing inventory and coverage gaps")
- Reviewing a proposed ontology (entity types, relationship types)
- Refining until you're satisfied
Produces extraction.yaml — your complete extraction config.
2. Run the extraction
uvx k-extract run --config extraction.yaml
Outputs graph.jsonl. Ctrl-C anytime — re-run to resume where you left off.
3. Load into kartograph
The output is kartograph-compatible JSONL. Feed it to kartograph's mutation endpoint to query your graph.
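Before loading the output anywhere, it can help to sanity-check it locally. The sketch below relies only on the file being JSONL (one JSON value per line); it assumes nothing about the kartograph record schema, and `summarize_jsonl` is a hypothetical helper, not part of k-extract.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSONL file."""
    count = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)  # raises ValueError on a malformed line
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys
```

Calling `summarize_jsonl("graph.jsonl")` after a run gives a quick view of how many records were written and which top-level fields they carry.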
Requirements
- uv (or Python 3.12+ with pip install k-extract)
- An Anthropic API key (or Vertex AI credentials), set via environment variables
- Model configured via environment (e.g., ANTHROPIC_MODEL=claude-sonnet-4-6)
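A small pre-flight check can catch a missing variable before a long run starts. This is a sketch under assumptions: `ANTHROPIC_MODEL` comes from the requirements above, while `ANTHROPIC_API_KEY` is an assumed name for the key variable (this README does not name it), and Vertex AI setups would need a different list.

```python
import os


def missing_settings(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# ANTHROPIC_MODEL is named in the requirements; ANTHROPIC_API_KEY is an
# assumption about what the key variable is called.
REQUIRED = ["ANTHROPIC_API_KEY", "ANTHROPIC_MODEL"]

if __name__ == "__main__":
    missing = missing_settings(REQUIRED)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Environment looks ready for k-extract.")
```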
Configuration
extraction.yaml is human-readable and fully editable. It contains:
- problem_statement: what you're trying to understand
- data_sources: paths to your repos/data
- ontology: entity and relationship types to extract
- prompts: the exact instructions agents receive (generated, but editable)
- output: where results go (graph.jsonl, extraction.db)
Edit any field, re-run. Changing the config invalidates previous results — use --force to start fresh.
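After hand-editing the config, a quick check that the five top-level sections are still present can save a failed run. The sketch below assumes the file has already been parsed into a mapping with a YAML parser of your choice. Only the key names come from the list above; the helper and the placeholder values are hypothetical, and the real nested structure is not shown here.

```python
# Top-level sections listed in the Configuration section above.
EXPECTED_KEYS = (
    "problem_statement",
    "data_sources",
    "ontology",
    "prompts",
    "output",
)


def missing_config_keys(config):
    """Return the expected top-level keys that are absent from `config`."""
    return [key for key in EXPECTED_KEYS if key not in config]


# Placeholder values only; the real nested layout is generated by `init`.
example = {
    "problem_statement": "I need to understand my testing inventory",
    "data_sources": ["./my-repo"],
    "ontology": {},
    "prompts": {},
}

print(missing_config_keys(example))  # prints ['output']
```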
CLI Reference
uvx k-extract init <path> [<path> ...] # Interactive ontology design
uvx k-extract run --config <yaml> # Run extraction (resumes by default)
uvx k-extract run --config <yaml> --force # Discard previous results, start fresh
uvx k-extract jobs --config <yaml> # Inspect job state
uvx k-extract jobs --config <yaml> --status failed # See failed jobs
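When driving k-extract from a script, it is convenient to build the argument lists in one place. The sketch below uses only the subcommands and flags listed above; `run_command` and `jobs_command` are hypothetical helper names.

```python
import shlex


def run_command(config_path, force=False):
    """Build the argv for `k-extract run`; `force` adds --force."""
    args = ["uvx", "k-extract", "run", "--config", str(config_path)]
    if force:
        args.append("--force")
    return args


def jobs_command(config_path, status=None):
    """Build the argv for `k-extract jobs`, optionally filtered by status."""
    args = ["uvx", "k-extract", "jobs", "--config", str(config_path)]
    if status is not None:
        args += ["--status", status]
    return args


print(shlex.join(run_command("extraction.yaml", force=True)))
# prints: uvx k-extract run --config extraction.yaml --force
```

The returned lists can be passed directly to `subprocess.run(args, check=True)`, which avoids shell quoting problems with unusual config paths.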
How It Works
- init scans your data, proposes an ontology based on your problem statement, and generates agent prompts
- run batches files into jobs sized to the model's context window, then launches parallel agents
- Each agent reads source files, extracts entities/relationships via tool calls, and commits to a shared store
- Results stream to graph.jsonl as jobs complete
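To make the batching step concrete, here is one way files could be grouped into jobs under a token budget. This is an illustrative greedy sketch, not k-extract's actual algorithm: the function name, the per-file token estimates, and the budget value are all assumptions.

```python
def batch_by_budget(file_sizes, budget):
    """Greedily group (path, estimated_tokens) pairs into batches.

    A batch is closed as soon as adding the next file would exceed
    `budget`. A single file larger than the budget gets a batch of its own.
    """
    batches = []
    current = []
    used = 0
    for path, tokens in file_sizes:
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += tokens
    if current:
        batches.append(current)
    return batches


files = [("a.py", 40), ("b.py", 50), ("c.py", 30), ("d.py", 200), ("e.py", 10)]
print(batch_by_budget(files, budget=100))
# prints: [['a.py', 'b.py'], ['c.py'], ['d.py'], ['e.py']]
```

Each resulting batch would correspond to one job; keeping input order means related files in the same directory tend to land in the same job.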
License
Apache-2.0
File details
Details for the file k_extract-0.6.1.tar.gz.
File metadata
- Download URL: k_extract-0.6.1.tar.gz
- Upload date:
- Size: 292.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e07843d4fdce6dba37c687ec16f66dfa15a8f432b161b38ba0269a5034c2b4d |
| MD5 | e895b03371d371b0e6f6f7d0e6005321 |
| BLAKE2b-256 | ee033cb73966e8bca977aea88f38fcfc145df38bdeecd05192d31d5eece2915f |
Provenance
The following attestation bundles were made for k_extract-0.6.1.tar.gz:
Publisher: release.yml on jsell-rh/k-extract
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: k_extract-0.6.1.tar.gz
  - Subject digest: 2e07843d4fdce6dba37c687ec16f66dfa15a8f432b161b38ba0269a5034c2b4d
- Sigstore transparency entry: 1273101315
- Sigstore integration time:
- Permalink: jsell-rh/k-extract@f4ac7f1bf47e6fdf4a9cbeb8b7ab9c92ea4742ef
- Branch / Tag: refs/heads/main
- Owner: https://github.com/jsell-rh
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f4ac7f1bf47e6fdf4a9cbeb8b7ab9c92ea4742ef
- Trigger Event: push
File details
Details for the file k_extract-0.6.1-py3-none-any.whl.
File metadata
- Download URL: k_extract-0.6.1-py3-none-any.whl
- Upload date:
- Size: 71.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c23d30a2d4ca0636a5d0d290acec9fb889dcf0a977e22f107b030dee2d9d5330 |
| MD5 | c383a27bb723ae1612a801370cc64e6f |
| BLAKE2b-256 | a823ed5cc57bbecb4fa7cd638f270409f75f2daaa403c6d9e5440d07d74b14df |
Provenance
The following attestation bundles were made for k_extract-0.6.1-py3-none-any.whl:
Publisher: release.yml on jsell-rh/k-extract
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: k_extract-0.6.1-py3-none-any.whl
  - Subject digest: c23d30a2d4ca0636a5d0d290acec9fb889dcf0a977e22f107b030dee2d9d5330
- Sigstore transparency entry: 1273101393
- Sigstore integration time:
- Permalink: jsell-rh/k-extract@f4ac7f1bf47e6fdf4a9cbeb8b7ab9c92ea4742ef
- Branch / Tag: refs/heads/main
- Owner: https://github.com/jsell-rh
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f4ac7f1bf47e6fdf4a9cbeb8b7ab9c92ea4742ef
- Trigger Event: push