A Python package to parse text into a knowledge graph using LLMs.
Knowledge Graph Parser
Overview
kg-parser is a Python package and CLI application that extracts structured knowledge graph triples from unstructured text using large language models (LLMs). It supports multiple backends (HuggingFace, OpenAI, and a local Jan server) and can output each triple either as an array of strings or as a dictionary.
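To make the two output shapes concrete, here is a minimal sketch that converts a single triple between them. It assumes a triple is an ordered (subject, relation, object) sequence and that the dict form uses the keys `subject`, `relation`, and `object`. Those key names are hypothetical, so check the JSON that kg-parser actually writes before relying on them. The sample triple is illustrative, not real kg-parser output.

```python
from typing import Dict, List

# Hypothetical key names for the dict form; kg-parser's real keys may differ.
TRIPLE_KEYS = ("subject", "relation", "object")


def triple_to_dict(triple: List[str]) -> Dict[str, str]:
    """Convert a [subject, relation, object] list into a dict."""
    if len(triple) != 3:
        raise ValueError(f"expected 3 elements, got {len(triple)}: {triple!r}")
    return dict(zip(TRIPLE_KEYS, triple))


def triple_to_list(triple: Dict[str, str]) -> List[str]:
    """Convert a dict triple back into [subject, relation, object] order."""
    return [triple[key] for key in TRIPLE_KEYS]


# Illustrative triple based on the example sentence used later in this README.
as_list = ["Mount Everest", "located in", "Nepal"]
as_dict = triple_to_dict(as_list)
print(as_dict)
assert triple_to_list(as_dict) == as_list  # round trip preserves order
```

The list form is more compact; the dict form makes each role explicit, which helps when triples are consumed by code that should not depend on element order.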
Features
- Multi-backend Support: Use HuggingFace, OpenAI, or a local Jan server.
- Batch Processing: Process a list of texts in a single call or run.
- Flexible Output: Choose between list or dict representations for triples.
- CLI & API: Easily run as a command-line tool or integrate into your Python projects.
Installation
Using Conda
Create and activate the conda environment with the provided configuration:
```bash
conda env create -f environment.yml
conda activate kg-parser
```
Using Pip
Install the package directly from PyPI:
```bash
pip install kg-parser
```
To work from a source checkout instead, install the minimal dependencies from requirements.txt:
```bash
pip install -r requirements.txt
```
Alternatively, install the package in editable mode from the repository root:
```bash
pip install -e .
```
Usage
As a CLI Application
Run the CLI tool from the command line:
```bash
python -m kg_parser.cli \
    --model-type huggingface \
    --model-name-or-path "google/flan-t5-small" \
    --input-file test_input.json \
    --output-file output_kg.json \
    --triple-format list
```
Arguments:
- `--model-type`: Choose from `huggingface`, `openai`, or `jan_local`.
- `--model-name-or-path`: Specify the model name or path.
- `--input-file`: Path to a JSON file containing an array of text strings.
- `--output-file`: Path where the output JSON will be saved.
- `--triple-format`: Output format for triples (`list` for arrays or `dict` for dictionaries).
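The input file is described only as a JSON array of text strings. A minimal sketch of preparing and checking such a file with the standard library follows. The helper names `write_input_file` and `read_input_file` are hypothetical and not part of kg-parser; the default file name matches the CLI example above.

```python
import json
from pathlib import Path
from typing import List


def write_input_file(texts: List[str], path: str = "test_input.json") -> Path:
    """Write texts as a JSON array of strings, the shape --input-file expects."""
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        raise TypeError("texts must be a list of strings")
    out = Path(path)
    # ensure_ascii=False keeps non-ASCII text readable in the file.
    out.write_text(json.dumps(texts, ensure_ascii=False, indent=2), encoding="utf-8")
    return out


def read_input_file(path: str) -> List[str]:
    """Load the file back and confirm it is still a JSON array of strings."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list) or not all(isinstance(t, str) for t in data):
        raise ValueError(f"{path} is not a JSON array of strings")
    return data


if __name__ == "__main__":
    path = write_input_file([
        "Mount Everest is the highest mountain in the world. It's located in Nepal."
    ])
    print(read_input_file(str(path)))
```

Validating before the run fails fast on a malformed file instead of partway through a batch of LLM calls.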
As a Python Package
Import and use kg-parser in your own Python scripts:
```python
from kg_parser.config import ModelConfig, ModelType
from kg_parser.core import KGParser

# Configure the model
model_config = ModelConfig(
    model_type=ModelType.HUGGINGFACE,
    model_name_or_path="google/flan-t5-small"
)
parser = KGParser(model_config)

# Process texts
texts = [
    "Mount Everest is the highest mountain in the world. It's located in Nepal."
]
results = parser.parse_batch(texts)

# Save results with triples as lists
parser.save_to_json(results, "output_kg.json", triple_format="list")
```
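Once triples are available in the list format, they can be assembled into a graph. This README does not document the structure of `results` or of output_kg.json, so the minimal sketch below starts from a plain list of `[subject, relation, object]` triples and builds a directed, labelled adjacency map using only the standard library. The `build_graph` helper is hypothetical, and the sample triples are illustrative rather than real kg-parser output. Adapt the loading step to the actual output layout.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Sequence, Tuple

# Adjacency map: subject -> list of (relation, object) edges.
Graph = DefaultDict[str, List[Tuple[str, str]]]


def build_graph(triples: Iterable[Sequence[str]]) -> Graph:
    """Build a directed, labelled adjacency map from [subject, relation, object] triples."""
    graph: Graph = defaultdict(list)
    for triple in triples:
        if len(triple) != 3:
            continue  # skip malformed triples rather than failing the whole batch
        subject, relation, obj = (part.strip() for part in triple)
        edge = (relation, obj)
        if edge not in graph[subject]:  # drop exact duplicate edges
            graph[subject].append(edge)
    return graph


# Illustrative triples, not actual kg-parser output.
triples = [
    ["Mount Everest", "is", "highest mountain in the world"],
    ["Mount Everest", "located in", "Nepal"],
    ["Mount Everest", "located in", "Nepal"],  # duplicate, ignored
]
graph = build_graph(triples)
for subject, edges in graph.items():
    for relation, obj in edges:
        print(f"{subject} --{relation}--> {obj}")
```

A plain dict keeps the sketch dependency-free; for traversal or visualization, the same edges could be loaded into a dedicated graph library instead.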
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgements
- The knowledge graph extraction prompt is adapted from the paper GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework. The original, unmodified prompt appears in that paper's Appendix A ("A. KG Construction Prompt").
Download files
File details
Details for the file kg_parser-0.1.1.tar.gz.
File metadata
- Download URL: kg_parser-0.1.1.tar.gz
- Upload date:
- Size: 11.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a178fbf5ac5febc00c24daf7e3625f5f725bce0475aadab9c24d7ed523db873 |
| MD5 | 0850c612db92b7bba2f00a0bee8c3b47 |
| BLAKE2b-256 | 3a46ee4fbdbcbe4afc694ee6b5408c6764d92a2ff99d6c445ea2903283793fe9 |
File details
Details for the file kg_parser-0.1.1-py3-none-any.whl.
File metadata
- Download URL: kg_parser-0.1.1-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 927584b40906718dc0abedb38ad29aa1d4d6d7aa9031506f263060706b13be7b |
| MD5 | 7886f6cf1b92d4b563a7dd9dcd53718f |
| BLAKE2b-256 | 83d55f882ee34e599811210fd1f8c87713e40c228655dd176d0d2726134042d8 |
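The published digests let you check a downloaded file before installing it. A minimal sketch using Python's standard `hashlib` follows. The expected value is the SHA256 digest listed for kg_parser-0.1.1-py3-none-any.whl, and the sketch assumes the wheel sits in the current directory. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Published SHA256 for kg_parser-0.1.1-py3-none-any.whl.
EXPECTED_SHA256 = "927584b40906718dc0abedb38ad29aa1d4d6d7aa9031506f263060706b13be7b"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    wheel = Path("kg_parser-0.1.1-py3-none-any.whl")
    if wheel.exists():
        print("OK" if verify(wheel, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```

pip can enforce the same check during installation in its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).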