A complete workflow for generating, normalizing, and visualizing Knowledge Graphs from unstructured Hebrew text
SimpleKG 🧠
SimpleKG is a Python package that generates Knowledge Graphs from Hebrew text using large language models such as OpenAI's GPT-4o. It extracts entities and relationships, optionally aligns them to an ontology, and renders the result as a visual graph.
🚀 Features
- Hebrew Text Processing: Specialized for Hebrew text analysis and knowledge extraction
- Entity & Relation Extraction: Automatically identifies concepts and their relationships
- Ontology Integration: Supports SKOS and other ontology standards
- Interactive Visualizations: Generates beautiful HTML visualizations of knowledge graphs
- Command Line Interface: Easy-to-use CLI for batch processing
- Python API: Flexible programmatic interface for integration
- Multiple Output Formats: JSON, HTML visualizations, and more
📦 Installation
Install SimpleKG using pip:
pip install simplekg
🔧 Quick Start
Command Line Interface
# Process a Hebrew text file
simplekg -i input.txt -o output/ --model "openai/gpt-4o" --api-key YOUR_API_KEY
# Use environment variable for API key
export OPENAI_API_KEY="your-api-key"
simplekg -i input.txt -o output/
# Get help
simplekg --help
Python API
from simplekg import KnowledgeGraphGenerator
# Initialize the generator
kggen = KnowledgeGraphGenerator(
model="openai/gpt-4o",
api_key="your-api-key"
)
# Process Hebrew text
text = "טקסט עברי לעיבוד..."
kggen.extractConcepts(text=text)
kggen.extractRelations()
kggen.groupConcepts()
# Apply ontology
kggen.relations2ontology(["SKOS"])
# Generate visualization
kggen.visualize("output.html")
# Save as JSON
kggen.dump_graph("graph.json")
🛠️ API Reference
KnowledgeGraphGenerator
The main class for knowledge graph generation.
Parameters:
- model (str): Language model to use (default: "openai/gpt-4o")
- api_key (str): API key for the language model
- temperature (float): Model temperature (default: 0.0)
- api_base (str): Custom API base URL (optional)
- chunk_size (int): Text chunk size for processing (default: 0)
Key Methods:
- extractConcepts(text, chunk_size=0, verbose=False): Extract concepts from text
- extractRelations(verbose=False): Extract relationships between concepts
- groupConcepts(verbose=False): Group similar concepts
- relations2ontology(ontologies): Apply ontology standards
- visualize(output_file): Generate HTML visualization
- dump_graph(output_file): Save graph as JSON
📊 Supported Ontologies
- SKOS (Simple Knowledge Organization System)
- More ontologies coming soon!
🎯 Use Cases
- Academic Research: Process Hebrew academic papers and texts
- Digital Humanities: Analyze Hebrew literature and historical documents
- Knowledge Management: Create knowledge bases from Hebrew content
- Content Analysis: Understand relationships in Hebrew texts
- Educational Tools: Build learning resources from Hebrew materials
🔑 Environment Setup
Set up your API key as an environment variable:
# Add to your shell profile (.bashrc, .zshrc, etc.)
export OPENAI_API_KEY="your-openai-api-key"
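When the key is read from the environment in Python, it can help to fail early with a clear message rather than let a later API call fail. The helper below is a minimal sketch; `require_api_key` is a hypothetical name and is not part of SimpleKG.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of the SimpleKG API.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running SimpleKG.")
    return key
```

The returned value can then be passed as `api_key` when constructing `KnowledgeGraphGenerator`.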
📁 Output Files
SimpleKG generates several types of output:
- JSON Graph: Complete graph data structure
- HTML Visualization: Interactive web-based visualization
- Ontology Files: Structured ontology representations
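The JSON schema is not documented here, so the sketch below makes no assumption about it: it only loads a file and reports the type of each top-level key, which is a quick way to explore an exported graph. `summarize_json` is a hypothetical helper, not part of SimpleKG.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and report its top-level shape without assuming a schema."""
    # Read as UTF-8 explicitly so Hebrew labels decode the same way on every platform.
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    return {"<root>": type(data).__name__}
```

For example, `summarize_json("graph.json")` could be called on the file written by `dump_graph("graph.json")`.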
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
📚 Citation
If you use SimpleKG in your research, please cite:
@software{simplekg,
author = {Your Name},
title = {SimpleKG: Knowledge Graph Generation from Hebrew Text},
url = {https://gitlab.com/millerhadar/simplekg},
version = {0.1.0},
year = {2025}
}
Made with ❤️ for the Hebrew NLP community
🚀 Installation
From PyPI (recommended)
pip install simplekg
From Source
git clone https://gitlab.com/millerhadar/simplekg.git
cd simplekg
pip install -e .
Development Installation
git clone https://gitlab.com/millerhadar/simplekg.git
cd simplekg
pip install -e ".[dev]"
🎯 Features
- Extract entities and relations from raw text using an LLM.
- Normalize entities into SKOS-compatible concepts (canonical + alt labels).
- Normalize relations into predicates with multiple strategies:
- LLM clustering with adjustable granularity (LOW, MEDIUM, HIGH).
- Embedding-based clustering + LLM naming.
- Alignment to known ontologies (SKOS, Dublin Core, CIDOC CRM).
- Support for multiple ontologies in parallel.
- Graph export and HTML visualization.
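To make the embedding-based clustering strategy listed above more concrete, here is a minimal sketch of greedy clustering by cosine similarity. It is not SimpleKG's implementation: the vectors are hand-written toy values standing in for real embeddings, the threshold is arbitrary, and the LLM naming step is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_similarity(vectors, threshold=0.9):
    """Greedy clustering: each label joins the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, [labels])
    for label, vec in vectors.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(label)
                break
        else:
            # No existing cluster was close enough, so this label seeds a new one.
            clusters.append((vec, [label]))
    return [members for _, members in clusters]


# Toy 2-D vectors (hypothetical); real embeddings would come from a model.
toy = {"wrote": [1.0, 0.1], "authored": [0.9, 0.2], "born in": [0.1, 1.0]}
print(cluster_by_similarity(toy))  # [['wrote', 'authored'], ['born in']]
```

In a full pipeline, each resulting cluster would then be handed to the LLM to pick a single predicate name.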
📖 Workflow
1. Initialize
import simplekg as kg
import os
kggen = kg.KnowledgeGraphGenerator(
model="openai/gpt-4o",
api_key=os.getenv("OPENAI_API_KEY")
)
2. Extract Entities
kggen.extractConcepts(text=text, chunk_size=chunk_size, verbose=True)
- chunk_size=0: analyze the text as a whole (may dilute context).
- chunk_size>0: split text into smaller parts, generate subgraphs, then merge.
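The effect of the two chunk_size settings can be sketched as follows. This is an illustration only: it assumes chunk_size counts characters and that splits happen on whitespace, neither of which is confirmed for SimpleKG, and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text, chunk_size=0):
    """Split text into whitespace-delimited chunks of roughly `chunk_size` characters.

    chunk_size <= 0 returns the whole text as a single chunk. A single word longer
    than chunk_size is kept whole, so a chunk can exceed the limit in that case.
    """
    if chunk_size <= 0:
        return [text]
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > chunk_size:
            # Adding this word would overflow the chunk, so close it and start a new one.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("aa bb cc dd", chunk_size=5))  # ['aa bb', 'cc dd']
```

Each chunk would then yield its own subgraph, and the subgraphs are merged afterwards.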
3. Extract Relations
kggen.extractRelations(verbose=True)
4. Normalize Entities
Groups entities into concepts (canonical + alternate names).
kggen.groupConcepts(chunk_size=160, threshold=160, max_iterations=2, verbose=True)
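The idea of a concept with one canonical label and several alternate labels can be sketched with a small data structure. The grouping rule below is a deliberately naive stand-in for whatever SimpleKG does internally: it treats two surface forms as the same concept when they are identical after removing combining marks such as Hebrew niqqud. `Concept`, `strip_marks`, and `group_surface_forms` are hypothetical names.

```python
import unicodedata
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str                                  # canonical form (cf. skos:prefLabel)
    alt_labels: list = field(default_factory=list)   # variants (cf. skos:altLabel)


def strip_marks(text):
    """Remove combining marks (e.g. Hebrew niqqud) and surrounding whitespace."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn").strip()


def group_surface_forms(forms):
    """Group surface forms that share the same unpointed spelling into one Concept."""
    buckets = defaultdict(list)
    for form in forms:
        key = strip_marks(form)
        if form not in buckets[key]:
            buckets[key].append(form)
    # The unpointed spelling becomes the canonical label; other variants become alt labels.
    return [
        Concept(pref_label=key, alt_labels=[f for f in members if f != key])
        for key, members in buckets.items()
    ]
```

With this rule, the pointed spelling "שָׁלוֹם" and the plain spelling "שלום" end up in a single concept whose canonical label is the plain form.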
5. Normalize Relations
- LLM clustering: more detailed, but text-specific.
- Ontology alignment: more general, enables cross-text comparison.
- Multiple ontologies: select the best relation across several vocabularies.
- LLM validation (planned): confirm embedding-based alignment.
kggen.relations2ontology(["SKOS"])
kggen.relations2ontology(["CIDOC_CRM", "DUBLIN_CORE"])
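Ontology alignment can be pictured as choosing, for each free-text relation label, the closest predicate in a fixed vocabulary. The sketch below is not SimpleKG's method: it uses string similarity from the standard library's `difflib` as a stand-in for embedding similarity, the English glosses are hypothetical, and the 0.6 cutoff is arbitrary.

```python
from difflib import SequenceMatcher

# Hypothetical plain-language glosses for three SKOS predicates.
SKOS_GLOSSES = {
    "skos:broader": "has broader concept",
    "skos:narrower": "has narrower concept",
    "skos:related": "is related to",
}


def align_relation(label, glosses=SKOS_GLOSSES, min_score=0.6):
    """Return (predicate, score) for the closest gloss, or (None, score) below min_score."""
    def score(gloss):
        return SequenceMatcher(None, label.lower(), gloss).ratio()

    best = max(glosses, key=lambda predicate: score(glosses[predicate]))
    best_score = score(glosses[best])
    return (best if best_score >= min_score else None, best_score)


print(align_relation("is related to"))  # ('skos:related', 1.0)
```

Returning `None` for weak matches leaves room for a later check, such as the planned LLM validation, instead of forcing every relation into the vocabulary.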
6. Visualization
Convert the KG into a chosen ontology and render HTML.
ontology = "SKOS" # or "MIX" for multiple ontologies
kggen.graph2Ontology(ontology)
viz = kggen.visualize(f"../../vis/{file_name}_c{chunk_size}_ontology_{ontology}.html")
Command Line Interface
SimpleKG also provides a command-line interface:
# Basic usage
simplekg --input-file text.txt --output-dir ./output
# With specific model and ontologies
simplekg --input-file text.txt --model "openai/gpt-4o" --ontologies SKOS CIDOC_CRM --verbose
# Get help
simplekg --help
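For batch processing, the CLI can be driven from a short Python script. The sketch below only builds the command lines, using the `--input-file` and `--output-dir` flags shown above; giving each input file its own output directory is an assumption of this example, and `build_commands` is a hypothetical helper.

```python
from pathlib import Path


def build_commands(input_dir, output_root):
    """Build one simplekg command (as an argv list) per .txt file in input_dir."""
    commands = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        # Assumption: one output directory per input file, named after the file stem.
        out_dir = Path(output_root) / path.stem
        commands.append(
            ["simplekg", "--input-file", str(path), "--output-dir", str(out_dir)]
        )
    return commands
```

Each list can then be executed with `subprocess.run(cmd, check=True)`, which raises if the CLI exits with an error.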
📚 Supported Ontologies
SKOS
- skos:broader / skos:narrower
- skos:related
- Mapping terms: exactMatch, closeMatch, etc.
Dublin Core
- dcterms:isPartOf / dcterms:hasPart
- dcterms:references / dcterms:isReferencedBy
- Versioning, formats, replacements, requirements.
CIDOC CRM (selected)
- P5_consists_of (part-of)
- P7_took_place_at (event location)
- P13_destroyed (destruction)
- P53_has_former_or_current_location
- P94_has_created (production/creation)
⚖️ Design Trade-offs
- Whole text vs. chunks: global context vs. fine-grained extraction.
- LLM relation clustering: richer inside one graph, less comparable across graphs.
- Ontology alignment: more comparable across graphs, but risk of oversimplification.
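The comparability trade-off can be illustrated by measuring how much two graphs share a predicate vocabulary. The sketch below assumes graphs are available as (subject, predicate, object) triples, which is an assumption about the data rather than SimpleKG's export format, and the sample triples are invented.

```python
def predicate_overlap(graph_a, graph_b):
    """Jaccard overlap of the predicate vocabularies of two graphs of (s, p, o) triples."""
    preds_a = {p for _, p, _ in graph_a}
    preds_b = {p for _, p, _ in graph_b}
    union = preds_a | preds_b
    return len(preds_a & preds_b) / len(union) if union else 0.0


# Free-form LLM labels rarely coincide across texts ...
raw_a = [("A", "wrote", "B")]
raw_b = [("C", "authored", "D")]
print(predicate_overlap(raw_a, raw_b))  # 0.0

# ... while ontology-aligned predicates come from one shared vocabulary.
aligned_a = [("A", "skos:related", "B")]
aligned_b = [("C", "skos:related", "D")]
print(predicate_overlap(aligned_a, aligned_b))  # 1.0
```

The same example also shows the cost: "wrote" and "authored" both collapse into a generic predicate, losing the authorship meaning.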
🔮 Roadmap
- LLM-based validation for ontology alignment.
- Integration with additional ontologies.
- More advanced visualization options.