An intelligent, LLM-powered knowledge extraction and evolution framework with semantic search capabilities
Smart Knowledge Extraction CLI
Transform documents into structured knowledge with one command.
"Stop reading. Start understanding."
"Say goodbye to document anxiety and see information at a glance."
Hyper-Extract is an intelligent, LLM-powered knowledge extraction and evolution framework. It turns highly unstructured text into persistent, predictable, strongly typed Knowledge Abstracts, and it supports a wide range of output formats: from simple Collections (Lists/Sets) and Pydantic Models to complex Knowledge Graphs, Hypergraphs, and even Spatio-Temporal Graphs.
✨ Core Features
- 🔷 8 Auto-Types: From basic `AutoModel`/`AutoList` to advanced `AutoGraph`, `AutoHypergraph`, and `AutoSpatioTemporalGraph`.
- 🧠 10+ Extraction Engines: Out-of-the-box support for cutting-edge retrieval paradigms like `GraphRAG`, `LightRAG`, `Hyper-RAG`, and `KG-Gen`.
- 📝 Declarative YAML Templates: Zero-code extraction definition. Includes 80+ presets across 6 domains.
- 🔄 Incremental Evolution: Feed new documents on the fly to continuously map out and expand the extracted knowledge.
⚡ Quick Start
1. Installation
For CLI users (installs the `he` command globally):
```bash
uv tool install hyperextract
```
For Python developers (use it as a library):
```bash
uv pip install hyperextract
```
2. The Command Line Way
Extract, search, and manage knowledge directly from the command line.
By default, the CLI uses `gpt-4o-mini` as the LLM and `text-embedding-3-small` as the embedding model.
```bash
# Configure OpenAI API Key
he config init -k YOUR_OPENAI_API_KEY

# Extract knowledge
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en

# Query the knowledge abstract
he search ./output/ "What are Tesla's major achievements?"

# Visualize the knowledge graph
he show ./output/

# Incrementally supplement knowledge
he feed ./output/ examples/en/tesla_question.md

# Show the updated knowledge graph
he show ./output/
```
🐍 The Python API Way
Installation
```bash
# Clone the repository
git clone https://github.com/yifanfeng97/hyper-extract.git
cd hyper-extract

# Install dependencies
uv sync
```
Configuration
```bash
# Copy the example env file
cp .env.example .env

# Edit .env with your API key and base URL
# OPENAI_API_KEY=your-api-key
# OPENAI_BASE_URL=https://api.openai.com/v1
```
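If the usage code fails with an authentication or connection error, a quick first check is whether the `.env` values actually reached the process environment. A minimal sketch; `missing_openai_settings` is a hypothetical helper, not part of Hyper-Extract, and it treats `OPENAI_BASE_URL` as required, which may not hold if you rely on the default OpenAI endpoint.

```python
import os
from typing import Mapping

# The two variables set in .env above (treating both as required is an assumption)
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL")


def missing_openai_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Call it right after `load_dotenv()`, for example `if missing := missing_openai_settings(): raise SystemExit(f"Missing settings: {missing}")`. Passing a plain dict instead of `os.environ` makes the check easy to test.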
Usage
```python
from dotenv import load_dotenv

# Load environment variables (OPENAI_API_KEY, OPENAI_BASE_URL) from the .env file
load_dotenv()

from hyperextract import Template

# Create a template
ka = Template.create("general/biography_graph")

# Parse a document
with open("examples/en/tesla.md", "r", encoding="utf-8") as f:
    text = f.read()
result = ka.parse(text)

# Visualize the knowledge graph
ka.show(result)

# Incrementally supplement knowledge
with open("examples/en/tesla_question.md", "r", encoding="utf-8") as f:
    new_text = f.read()
ka.feed(result, new_text)

# Show the updated knowledge graph
ka.show(result)
```
🔗 For complete examples, see examples/en
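The `feed` step grows an existing result with a new document. How Hyper-Extract reconciles old and new knowledge internally is not described here; as one way to picture the bookkeeping involved, here is a minimal sketch that assumes every record has a stable identifier (such as the `name` field) and that newer non-empty values should win. `merge_records` is a hypothetical helper, not part of the library.

```python
from typing import Callable

Record = dict[str, str]


def merge_records(
    existing: dict[str, Record],
    new_records: list[Record],
    key: Callable[[Record], str],
) -> dict[str, Record]:
    """Merge newly extracted records into an existing keyed collection.

    Records sharing an identifier are combined field by field: non-empty
    values from the new record overwrite older ones. `existing` is not mutated.
    """
    merged = {k: dict(v) for k, v in existing.items()}
    for record in new_records:
        k = key(record)
        if k in merged:
            # Same identifier: keep old fields, update with non-empty new values
            merged[k].update({f: v for f, v in record.items() if v})
        else:
            # New identifier: add the record as-is
            merged[k] = dict(record)
    return merged
```

Overwriting is simply the easiest policy to show; an LLM-based framework may instead reconcile conflicting descriptions by summarizing them.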
Installation Comparison:
| Use Case | Command | Purpose |
|---|---|---|
| CLI Tool | `uv tool install hyperextract` | Install the `he` command globally |
| Python Library | `uv pip install hyperextract` | Use in Python code |
🧩 Deep Dive: The 8 Auto-Types
The framework handles complex knowledge structures without making you write boilerplate code.
Example: AutoGraph Visualization
Here is the knowledge graph visualization after AutoGraph extraction:
🛠️ Architecture Overview
Hyper-Extract follows a three-layer architecture:
- Auto-Types define the data structures for knowledge extraction. With 8 strongly typed structures (AutoModel, AutoList, AutoSet, AutoGraph, AutoHypergraph, AutoTemporalGraph, AutoSpatialGraph, AutoSpatioTemporalGraph), they serve as the output format for all extractions.
- Methods provide extraction algorithms built on Auto-Types. These include typical methods (KG-Gen, iText2KG, iText2KG*) and RAG-based methods (GraphRAG, LightRAG, Hyper-RAG, HypergraphRAG, Cog-RAG).
- Templates offer domain-specific configurations with ready-to-use prompts and data structures. Covering 6 domains (Finance, Legal, Medical, TCM, Industry, General) with 80+ preset templates, they let users extract knowledge without dealing with Auto-Types or Methods directly.
Use Hyper-Extract via the CLI (`he parse`, `he search`, `he show`, ...) or the Python API (`Template.create()`).
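A key difference between the graph-style Auto-Types is the shape of their edges: an ordinary knowledge-graph relation links exactly two entities, while a hyperedge links any number of entities in a single relation. The dataclasses below are hypothetical stand-ins that illustrate this distinction; they do not reproduce the actual fields of `AutoGraph` or `AutoHypergraph`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Relation:
    # Ordinary graph edge: links exactly two entities
    source: str
    target: str
    type: str


@dataclass(frozen=True)
class HyperEdge:
    # Hyperedge: one relation that links any number of entities at once
    members: frozenset[str]
    type: str


Edge = Union[Relation, HyperEdge]


def endpoints(edge: Edge) -> frozenset[str]:
    """Return the set of entity names an edge touches."""
    if isinstance(edge, HyperEdge):
        return edge.members
    return frozenset({edge.source, edge.target})


def incident_edges(edges: list[Edge], entity: str) -> list[Edge]:
    """Return every edge that touches the given entity."""
    return [e for e in edges if entity in endpoints(e)]
```

A three-author paper, for instance, is one `HyperEdge` with three members; a plain graph would need three pairwise `Relation`s and would lose the fact that all three took part in the same event.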
📚 Related Documentation
- Preset Templates: Browse 80+ ready-to-use templates across 6 domains
- Design Guide: Learn how to create custom templates
📋 Template Structure Example (Graph Type)
Here's a complete YAML template example for Graph type extraction (entity-relationship extraction):
```yaml
language: en
name: Knowledge Graph
type: graph
tags: [general]
description: 'Extract entities and their relationships to construct a knowledge graph.'
output:
  entities:
    fields:
      - name: name
        type: str
        description: 'Entity name'
      - name: type
        type: str
        description: 'Entity type: e.g., person, organization, event'
      - name: description
        type: str
        description: 'Entity description'
  relations:
    fields:
      - name: source
        type: str
        description: 'Source entity'
      - name: target
        type: str
        description: 'Target entity'
      - name: type
        type: str
        description: 'Relation type: e.g., invention, collaboration, competition'
      - name: description
        type: str
        description: 'Relation description'
guideline:
  target: 'Extract entities and their relationships from the text.'
  rules_for_entities:
    - 'Extract meaningful entities'
    - 'Maintain consistent naming'
  rules_for_relations:
    - 'Create relations only when explicitly expressed in the text'
identifiers:
  entity_id: name
  relation_id: '{source}|{type}|{target}'
  relation_members:
    source: source
    target: target
display:
  entity_label: '{name} ({type})'
  relation_label: '{type}'
```
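The `identifiers` and `display` sections are brace-delimited patterns over the extracted fields. As a rough illustration of what they express, here is a minimal sketch that assumes they behave like Python `str.format` patterns; the helper names are hypothetical, and this is not Hyper-Extract's actual template engine.

```python
# Patterns copied from the template's identifiers and display sections
ENTITY_ID_FIELD = "name"
RELATION_ID_PATTERN = "{source}|{type}|{target}"
ENTITY_LABEL_PATTERN = "{name} ({type})"
RELATION_LABEL_PATTERN = "{type}"


def entity_id(entity: dict) -> str:
    # entity_id: name -> the value of the "name" field is the key
    return entity[ENTITY_ID_FIELD]


def relation_id(relation: dict) -> str:
    # str.format fills {source}, {type}, {target}; extra fields are ignored
    return RELATION_ID_PATTERN.format(**relation)


def entity_label(entity: dict) -> str:
    return ENTITY_LABEL_PATTERN.format(**entity)


def relation_label(relation: dict) -> str:
    return RELATION_LABEL_PATTERN.format(**relation)


def index_by_id(records: list[dict], key) -> dict:
    """Keep the first record seen for each identifier; later duplicates are dropped."""
    index: dict = {}
    for record in records:
        index.setdefault(key(record), record)
    return index
```

Because `entity_id` is the `name` field, "Maintain consistent naming" in the guideline matters: two spellings of the same entity would yield two separate keys.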
📈 Comparison with Other Libraries
| Feature | GraphRAG | LightRAG | KG-Gen | ATOM | Hyper-Extract |
|---|---|---|---|---|---|
| Knowledge Graph | ✅ | ✅ | ✅ | ✅ | ✅ |
| Temporal Graph | ✅ | ❌ | ❌ | ✅ | ✅ |
| Spatial Graph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Hypergraph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Domain Templates | ❌ | ❌ | ❌ | ❌ | ✅ |
| CLI Tool | ✅ | ❌ | ❌ | ❌ | ✅ |
| Multi-language | ✅ | ❌ | ❌ | ❌ | ✅ |
📚 Related Documentation
- Full Documentation - Complete documentation site
- Chinese Documentation (中文文档) - Documentation in Chinese
- CLI Guide - Command-line interface
- Template Gallery - Available templates
- Example Code - Working examples
🤝 Contributing & License
Contributions are welcome! Please submit Issues and PRs. Licensed under Apache-2.0.
File details
Details for the file hyperextract-0.1.1.tar.gz.

File metadata
- Download URL: hyperextract-0.1.1.tar.gz
- Size: 149.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b242a4ecf5f9edf068ab129e4c4bcc320cd21d9d879d6ff4b1fbfdd2bfeae56e |
| MD5 | 08fd5414343032ea0e91954a76d7e656 |
| BLAKE2b-256 | 5cb7993695c0a27be2f50f1956ecb0ff8cf2b47b5cfc3ec462b4a56b33d04ae0 |

Provenance
The following attestation bundles were made for hyperextract-0.1.1.tar.gz:
Publisher: publish.yml on yifanfeng97/Hyper-Extract
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hyperextract-0.1.1.tar.gz
- Subject digest: b242a4ecf5f9edf068ab129e4c4bcc320cd21d9d879d6ff4b1fbfdd2bfeae56e
- Sigstore transparency entry: 1234579228
- Permalink: yifanfeng97/Hyper-Extract@3900fb0c06a9535f56f26860ad1c33b34fdc8343
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/yifanfeng97
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3900fb0c06a9535f56f26860ad1c33b34fdc8343
- Trigger Event: release
File details
Details for the file hyperextract-0.1.1-py3-none-any.whl.

File metadata
- Download URL: hyperextract-0.1.1-py3-none-any.whl
- Size: 206.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86c52a6040c6c5e8d562efe9fce31a6ba4a2d190a9eddbaf75e6090a4e0b233f |
| MD5 | 6716a6b6d9b3d0c74b6ab6da1b339c87 |
| BLAKE2b-256 | dde20e5aea0c82b11b8c9c51e2467cf452cc36302033b4bba4d45c4e4f7fd9da |

Provenance
The following attestation bundles were made for hyperextract-0.1.1-py3-none-any.whl:
Publisher: publish.yml on yifanfeng97/Hyper-Extract
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hyperextract-0.1.1-py3-none-any.whl
- Subject digest: 86c52a6040c6c5e8d562efe9fce31a6ba4a2d190a9eddbaf75e6090a4e0b233f
- Sigstore transparency entry: 1234579295
- Permalink: yifanfeng97/Hyper-Extract@3900fb0c06a9535f56f26860ad1c33b34fdc8343
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/yifanfeng97
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3900fb0c06a9535f56f26860ad1c33b34fdc8343
- Trigger Event: release
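The SHA256 digests published for the 0.1.1 release let you check that a downloaded file matches what was uploaded. A minimal sketch using Python's standard `hashlib`; `sha256_of` and `verify` are illustrative helpers, not part of Hyper-Extract.

```python
import hashlib
from pathlib import Path

# SHA256 digests from the "File hashes" tables for release 0.1.1
EXPECTED_SHA256 = {
    "hyperextract-0.1.1.tar.gz": "b242a4ecf5f9edf068ab129e4c4bcc320cd21d9d879d6ff4b1fbfdd2bfeae56e",
    "hyperextract-0.1.1-py3-none-any.whl": "86c52a6040c6c5e8d562efe9fce31a6ba4a2d190a9eddbaf75e6090a4e0b233f",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"No published digest for {path.name}")
    return sha256_of(path) == expected
```

pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file) performs the same comparison automatically at install time.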