
A complete workflow for generating, normalizing, and visualizing Knowledge Graphs from unstructured Hebrew text


SimpleKG 🧠

Python 3.8+ | License: MIT

SimpleKG is a Python package for generating Knowledge Graphs from Hebrew text with large language models such as OpenAI's GPT-4o. It extracts entities and the relationships between them, normalizes them with ontology support, and renders the result as a visual knowledge representation.

🚀 Features

  • Hebrew Text Processing: Specialized for Hebrew text analysis and knowledge extraction
  • Entity & Relation Extraction: Automatically identifies concepts and their relationships
  • Ontology Integration: Supports SKOS and other ontology standards
  • Interactive Visualizations: Generates beautiful HTML visualizations of knowledge graphs
  • Command Line Interface: Easy-to-use CLI for batch processing
  • Python API: Flexible programmatic interface for integration
  • Multiple Output Formats: JSON, HTML visualizations, and more

📦 Installation

Install SimpleKG using pip:

pip install simplekg

🔧 Quick Start

Command Line Interface

# Process a Hebrew text file
simplekg -i input.txt -o output/ --model "openai/gpt-4o" --api-key YOUR_API_KEY

# Use environment variable for API key
export OPENAI_API_KEY="your-api-key"
simplekg -i input.txt -o output/

# Get help
simplekg --help
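Because the CLI is meant for batch processing, a folder of texts can be driven from a small script. The following is a minimal sketch that uses only the `-i` and `-o` flags shown above. It assumes one output directory per input file. `build_commands`, `run_all` and the `texts/` and `output/` folder names are hypothetical and are not part of SimpleKG:

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(input_dir: str, output_root: str) -> List[List[str]]:
    """Build one `simplekg -i ... -o ...` call per .txt file (hypothetical helper)."""
    commands = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        # One output directory per input file, named after the file stem
        out_dir = Path(output_root) / path.stem
        commands.append(["simplekg", "-i", str(path), "-o", str(out_dir)])
    return commands


def run_all(input_dir: str, output_root: str) -> None:
    for cmd in build_commands(input_dir, output_root):
        Path(cmd[-1]).mkdir(parents=True, exist_ok=True)  # create the output dir up front
        # check=True stops the batch at the first file that fails
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("texts", "output")  # hypothetical folder names
```

Building the command lists separately from running them lets you inspect or log the batch before spending API calls.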

Python API

import os

from simplekg import KnowledgeGraphGenerator

# Initialize the generator (the API key is read from the environment)
kggen = KnowledgeGraphGenerator(
    model="openai/gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY")
)

# Process Hebrew text
text = "טקסט עברי לעיבוד..."  # "Hebrew text to process..."
kggen.extractConcepts(text=text)
kggen.extractRelations()
kggen.groupConcepts()

# Apply ontology
kggen.relations2ontology(["SKOS"])

# Generate visualization
kggen.visualize("output.html")

# Save as JSON
kggen.dump_graph("graph.json")

🛠️ API Reference

KnowledgeGraphGenerator

The main class for knowledge graph generation.

Parameters:

  • model (str): Language model to use (default: "openai/gpt-4o")
  • api_key (str): API key for the language model
  • temperature (float): Model temperature (default: 0.0)
  • api_base (str): Custom API base URL (optional)
  • chunk_size (int): Text chunk size for processing; the default of 0 processes the whole text at once

Key Methods:

  • extractConcepts(text, chunk_size=0, verbose=False): Extract concepts from text
  • extractRelations(verbose=False): Extract relationships between concepts
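  • groupConcepts(verbose=False): Group similar entities into concepts with canonical and alternate labels (also accepts chunk_size, threshold and max_iterations, as in the Workflow below)
  • relations2ontology(ontologies): Align relations to one or more ontologies, e.g. ["SKOS"]
  • graph2Ontology(ontology): Convert the graph into a chosen ontology ("SKOS", or "MIX" for multiple ontologies) before visualization
  • visualize(output_file): Generate an HTML visualization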
  • dump_graph(output_file): Save graph as JSON

📊 Supported Ontologies

  • SKOS (Simple Knowledge Organization System)
  • More ontologies coming soon!

🎯 Use Cases

  • Academic Research: Process Hebrew academic papers and texts
  • Digital Humanities: Analyze Hebrew literature and historical documents
  • Knowledge Management: Create knowledge bases from Hebrew content
  • Content Analysis: Understand relationships in Hebrew texts
  • Educational Tools: Build learning resources from Hebrew materials

🔑 Environment Setup

Set up your API key as an environment variable:

# Add to your shell profile (.bashrc, .zshrc, etc.)
export OPENAI_API_KEY="your-openai-api-key"
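A script can check for the key before it builds the generator, so that it fails early with a clear message. The following is a minimal sketch. `require_api_key` is a hypothetical helper and is not part of SimpleKG:

```python
import os


def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or exit with a clear message."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise SystemExit(f"{var} is not set; export it in your shell profile first.")
    return key
```

The returned value can then be passed as `api_key=require_api_key()` when constructing `KnowledgeGraphGenerator`.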

📁 Output Files

SimpleKG generates several types of output:

  • JSON Graph: Complete graph data structure
  • HTML Visualization: Interactive web-based visualization
  • Ontology Files: Structured ontology representations

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use SimpleKG in your research, please cite:

@software{simplekg,
  author = {Your Name},
  title = {SimpleKG: Knowledge Graph Generation from Hebrew Text},
  url = {https://gitlab.com/millerhadar/simplekg},
  version = {0.1.0},
  year = {2025}
}


Made with ❤️ for the Hebrew NLP community


🚀 Installation

From PyPI (recommended)

pip install simplekg

From Source

git clone https://gitlab.com/millerhadar/simplekg.git
cd simplekg
pip install -e .

Development Installation

git clone https://gitlab.com/millerhadar/simplekg.git
cd simplekg
pip install -e ".[dev]"

🎯 Features

  • Extract entities and relations from raw text using an LLM.
  • Normalize entities into SKOS-compatible concepts (canonical + alt labels).
  • Normalize relations into predicates with multiple strategies:
    • LLM clustering with adjustable granularity (LOW, MEDIUM, HIGH).
    • Embedding-based clustering + LLM naming.
    • Alignment to known ontologies (SKOS, Dublin Core, CIDOC CRM).
  • Support for multiple ontologies in parallel.
  • Graph export and HTML visualization.

📖 Workflow

1. Initialize

import simplekg as kg
import os

kggen = kg.KnowledgeGraphGenerator(
    model="openai/gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY")
)

2. Extract Entities

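chunk_size = 0  # 0 = whole text; >0 = split into chunks
kggen.extractConcepts(text=text, chunk_size=chunk_size, verbose=True)

  • chunk_size=0: analyze the text as a whole (keeps global context, but extraction may be less fine-grained).
  • chunk_size>0: split the text into smaller parts, generate a subgraph for each, then merge them.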
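The chunked mode described above works in three stages: split the text, extract a subgraph per chunk, then merge the subgraphs. The following is a minimal sketch of that idea with several assumptions. It treats `chunk_size` as a character count and represents each subgraph as a set of (subject, relation, object) triples. `split_text`, `merge_subgraphs`, `build_graph` and the stub extractor are all hypothetical. This is not SimpleKG's implementation:

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def split_text(text: str, chunk_size: int) -> List[str]:
    """Greedily pack words into chunks of at most chunk_size characters
    (a single longer word becomes its own chunk)."""
    if chunk_size <= 0:
        return [text]  # chunk_size=0: treat the whole text as one unit
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) > chunk_size and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def merge_subgraphs(subgraphs: Iterable[Set[Triple]]) -> Set[Triple]:
    """Merge per-chunk subgraphs; identical triples collapse into one."""
    merged: Set[Triple] = set()
    for sub in subgraphs:
        merged |= sub
    return merged


def build_graph(text: str, chunk_size: int,
                extract: Callable[[str], Set[Triple]]) -> Set[Triple]:
    """Run the (hypothetical) extractor on every chunk and merge the results."""
    return merge_subgraphs(extract(chunk) for chunk in split_text(text, chunk_size))
```

A set union removes only exact duplicates. Variant spellings of the same entity in different chunks survive the merge, and that is the kind of variation that entity normalization (step 4) groups into concepts.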

3. Extract Relations

kggen.extractRelations(verbose=True)

4. Normalize Entities

Groups entities into concepts (canonical + alternate names).

kggen.groupConcepts(chunk_size=160, threshold=160, max_iterations=2, verbose=True)
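One way to picture the output of this step is as SKOS-style concepts, each with one preferred label and several alternate labels. The following is a minimal sketch of that data structure. It assumes the clusters of surface forms are already known and that the most frequent form becomes the canonical label. `Concept`, `build_concepts` and the frequency rule are hypothetical and do not reproduce SimpleKG's actual grouping logic:

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    pref_label: str  # canonical name, cf. skos:prefLabel
    alt_labels: List[str] = field(default_factory=list)  # variants, cf. skos:altLabel


def build_concepts(clusters: List[List[str]], mentions: List[str]) -> Dict[str, Concept]:
    """Turn clusters of surface forms into concepts, indexed by every label."""
    freq = Counter(mentions)
    index: Dict[str, Concept] = {}
    for cluster in clusters:
        # The most frequent form becomes canonical; ties are broken alphabetically
        canonical = min(cluster, key=lambda s: (-freq[s], s))
        concept = Concept(canonical, sorted(s for s in cluster if s != canonical))
        for label in cluster:
            index[label] = concept
    return index


# Hypothetical input: Jerusalem / "the Holy City", Ben-Gurion / "David Ben-Gurion"
mentions = ["ירושלים", "ירושלים", "עיר הקודש", "בן-גוריון", "דוד בן-גוריון"]
clusters = [["ירושלים", "עיר הקודש"], ["בן-גוריון", "דוד בן-גוריון"]]
concepts = build_concepts(clusters, mentions)
print(concepts["עיר הקודש"].pref_label)  # → ירושלים
```

Every label maps to the same `Concept` object, so any surface form found in a relation can be resolved to its canonical concept.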

5. Normalize Relations

  • LLM clustering: more detailed, but specific to one text.
  • Ontology alignment: more general, and enables comparison across texts.
  • Multiple ontologies: select the best-matching relation across several vocabularies.
  • LLM validation (planned): confirm embedding-based alignments.

# Align to one ontology, or pick the best match across several
kggen.relations2ontology(["SKOS"])
kggen.relations2ontology(["CIDOC_CRM", "DUBLIN_CORE"])
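The bullets above mention embedding-based alignment and choosing the best relation across several ontologies. Both can be pictured as a nearest-neighbour search. Embed the extracted relation label and every candidate predicate, keep the predicate with the highest cosine similarity across all selected vocabularies, and return nothing when even the best score falls below a threshold. The following is a minimal sketch with hand-made 3-dimensional vectors standing in for real embeddings. The vectors, the 0.5 threshold and `align_relation` are hypothetical. This is not SimpleKG's implementation:

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align_relation(rel_vec: Vector,
                   ontologies: Dict[str, Dict[str, Vector]],
                   threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Pick the most similar predicate across all ontologies, or None if too dissimilar."""
    best, best_score = None, -1.0
    for vocab in ontologies.values():
        for predicate, pred_vec in vocab.items():
            score = cosine(rel_vec, pred_vec)
            if score > best_score:
                best, best_score = predicate, score
    return (best, best_score) if best_score >= threshold else (None, best_score)


# Toy "embeddings" (hypothetical); axes roughly: hierarchy, part-whole, association
ONTOLOGIES = {
    "SKOS": {"skos:broader": [1.0, 0.1, 0.0], "skos:related": [0.0, 0.0, 1.0]},
    "DUBLIN_CORE": {"dcterms:isPartOf": [0.1, 1.0, 0.0]},
}

# A relation such as "is a neighbourhood of", embedded close to the part-whole axis
predicate, score = align_relation([0.2, 0.9, 0.1], ONTOLOGIES)
print(predicate)  # → dcterms:isPartOf
```

The threshold is a design choice. If it is too low, every relation is forced into some predicate. If it is too high, more relations keep their raw text-specific labels.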

6. Visualization

Convert the KG into a chosen ontology and render HTML.

ontology = "SKOS"  # or "MIX" for multiple ontologies
kggen.graph2Ontology(ontology)

file_name = "input"  # example base name for the output file
viz = kggen.visualize(f"{file_name}_c{chunk_size}_ontology_{ontology}.html")

Command Line Interface

SimpleKG also provides a command-line interface:

# Basic usage
simplekg --input-file text.txt --output-dir ./output

# With specific model and ontologies
simplekg --input-file text.txt --model "openai/gpt-4o" --ontologies SKOS CIDOC_CRM --verbose

# Get help
simplekg --help

📚 Supported Ontologies

SKOS

  • skos:broader / skos:narrower
  • skos:related
  • Mapping terms: skos:exactMatch, skos:closeMatch, etc.

Dublin Core

  • dcterms:isPartOf / dcterms:hasPart
  • dcterms:references / dcterms:isReferencedBy
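  • Versioning, formats, replacements and requirements (e.g. dcterms:hasVersion, dcterms:hasFormat, dcterms:replaces, dcterms:requires and their inverses).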

CIDOC CRM (selected)

  • P5_consists_of (part-of)
  • P7_took_place_at (event location)
  • P13_destroyed (destruction)
  • P53_has_former_or_current_location
  • P94_has_created (production/creation)
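Several of the SKOS and Dublin Core relations listed above come in inverse pairs: skos:broader/skos:narrower, dcterms:isPartOf/dcterms:hasPart and dcterms:references/dcterms:isReferencedBy. skos:related, skos:exactMatch and skos:closeMatch are symmetric. The following is a minimal sketch, independent of SimpleKG, that adds the missing direction to a set of triples so queries can follow a link either way. The `INVERSES` table is a hypothetical helper that covers only the pairs named here:

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Inverse pairs among the predicates listed above; symmetric ones map to themselves
INVERSES: Dict[str, str] = {
    "skos:broader": "skos:narrower",
    "dcterms:isPartOf": "dcterms:hasPart",
    "dcterms:references": "dcterms:isReferencedBy",
    "skos:related": "skos:related",
    "skos:exactMatch": "skos:exactMatch",
    "skos:closeMatch": "skos:closeMatch",
}
# Make the table work in both directions (e.g. skos:narrower -> skos:broader)
INVERSES.update({v: k for k, v in list(INVERSES.items())})


def add_inverses(triples: Set[Triple]) -> Set[Triple]:
    """Return the triples plus the inverse of every triple whose predicate has one."""
    closed = set(triples)
    for s, p, o in triples:
        inv = INVERSES.get(p)
        if inv:
            closed.add((o, inv, s))
    return closed
```

Predicates without an entry, such as the CIDOC CRM properties above, pass through unchanged.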

⚖️ Design Trade-offs

  • Whole text vs. chunks: global context vs. fine-grained extraction.
  • LLM relation clustering: richer inside one graph, less comparable across graphs.
  • Ontology alignment: more comparable across graphs, but risk of oversimplification.


🔮 Roadmap

  • LLM-based validation for ontology alignment.
  • Integration with additional ontologies.
  • More advanced visualization options.


Download files

Download the file for your platform.

Source Distribution

simplekg-0.1.0.tar.gz (41.6 kB)

Uploaded Source

Built Distribution

simplekg-0.1.0-py3-none-any.whl (39.0 kB)

Uploaded Python 3

File details

Details for the file simplekg-0.1.0.tar.gz.

File metadata

  • Download URL: simplekg-0.1.0.tar.gz
  • Upload date:
  • Size: 41.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for simplekg-0.1.0.tar.gz
Algorithm Hash digest
SHA256 88ee3f60ac8bb9c2985e6a1d5549ef8a7ba53939e860a9f1736760ef7aeb10d1
MD5 d79eecb775eac15cc786114eec871c5f
BLAKE2b-256 18d95b38d38573fe69e34dacb353337936803a87542a09cfa254abddd254f239

File details

Details for the file simplekg-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: simplekg-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 39.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for simplekg-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7a001aa895c33e23313792c18788d9cf933dd50e6fc3ab765a31946519050091
MD5 7ea1f727c659a62a69a117a0495c3ad8
BLAKE2b-256 8f3a14605702324bdb66fe8f815351248e9266fec916da49c203c85806c4a09b
