Use an LLM to run inference, for NER (named entity recognition) and/or NEL (named entity linking). Build a knowledge graph. Evaluate results.
Project description
LLM Knowledge Graph
Use an LLM to generate a knowledge graph.
The purpose of the library is to extract entities and their relationships without a pre-defined schema.
Input: Paragraphs of text
Output: Nodes and Relationships
Output: Merge the nodes into the Knowledge Graph database
Install and use library
pip install llm_ner_nel
from llm_ner_nel.inference_api.relationship_inference import RelationshipInferenceProvider, display_relationships
text = '''Vincent de Groof (6 December 1830 – 9 July 1874) was a Dutch-born Belgian early pioneering aeronaut. He created an early model of an ornithopter'''
inference_provider = RelationshipInferenceProvider(model="llama3.2")
relationships = inference_provider.get_relationships(text)
display_relationships(relationships, console_log=True)
Output:
Vincent de Groof:(1.0) (Person) --[born in]-> Dutch-born Belgian:(1.0) (Location)
Vincent de Groof:(1.0) (Person) --[created an early model of]-> an ornithopter:(1.0) (Thing)
Vincent de Groof:(1.0) (Person) --[pioneering aeronaut]-> early:(1.0) (Attribute)
Dev environment setup - Knowledge Graph
pip install -e .
from llm_ner_nel.inference_api.relationship_inference import RelationshipInferenceProvider, display_relationships
from llm_ner_nel.knowledge_graph.graph import KnowledgeGraph
text = '''Australia’s centre-left prime minister, Anthony Albanese, has won a second term with a crushing victory over the opposition, whose rightwing leader, Peter Dutton, failed to brush off comparisons with Donald Trump and ended up losing his own seat.
Australians have voted for a future that holds true to these values, a future built on everything that brings us together as Australians, and everything that sets our nation apart from the world'''
inference_provider = RelationshipInferenceProvider(model="llama3.2")
relationships = inference_provider.get_relationships(text)
display_relationships(relationships, console_log=True)
graph.add_or_merge_relationships(relationships, "https://en.wikipedia.org/wiki/Vincent_de_Groof", "Wikipedia")
Environment setup
Neo4j
Run a local instance of the Neo4J database
- Run the startNeo4j.sh script
- Or run the following
docker run \
--name neo4j \
-d --rm \
-p 7474:7474 -p 7687:7687 \
--name neo4j-apoc \
--volume=./data:/data \
-e NEO4J_AUTH=neo4j/password \
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-e NEO4J_PLUGINS=\[\"apoc\"\] \
neo4j:2025.03
Verify access to Neo4J:
http://localhost:7474/browser/
- Username: neo4j
- Password: password
Local LLM - Ollama
Run a local LLM using OLLAMA.
-
Download and install OLLAMA: https://ollama.com/download
-
Download the LLM Model
ollama run gemma3:12b
Note: if you have a GPU or an environment with High Bandwidth Memory, then it is recommended to run a larger model.
Example
Model: gemma3:27b
from llm_ner_nel.inference_api.relationship_inference import RelationshipInferenceProvider, display_relationships
from llm_ner_nel.knowledge_graph.graph import KnowledgeGraph
relationships_extractor = RelationshipInferenceProvider(
model="gemma3:27b",
ollama_host="http://localhost:11434",
mlflow_tracking_host="http://localhost:5000",
mlflow_system_prompt_id = None, # if mlflow prompt repository is not setup ("NER_System/1")
mlflow_user_prompt_id = None # if mlflow prompt repository is not setup ("NER_User/1")
)
graph=KnowledgeGraph()
# Get relationships from text
relationships = relationships_extractor.get_relationships(text=text)
# Merge the relationships to a knowledge graph
graph.add_or_merge_relationships(relationships, src="book", src_type="politics")
Hyper parameters
The LLM hyper parameters can be defined, these would help to refine the expected output for concrete use-cases:
class LlmConfig:
model: str
temperature: float
top_k: int
top_p: float
max_tokens: int
repeat_penalty: float
frequency_penalty: float
presence_penalty: float
typical_p: float
num_thread: int
Prompt management
Prompts can vary between models/providers. Version controlling the prompts and annotating the prompt helps with faster evaluation.
MlFlow provides a prompt repository and is used to get the user prompt and system prompt.
E.g. prompts for named entity recognition
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@entity \
-e MLFLOW_USER_PROMPT_ID=NER_User/@entity \
E.g. prompts for a knowledge graph using named entity recognition (NER) and named entity linking (NEL)
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@relationship \
-e MLFLOW_USER_PROMPT_ID=NER_User/@relationship \
Data pipelines - Consumers
Two data pipeline examples show how Named Entity Recognition can be used:
- Decorate text with entities
- Build a knowledge graph
E.g.
Mongo DB entitiy extractor
In this example text is read from a MongoDb Collection and the entities are extracted/stored back to the mongodb collection.
The container would read a collection in batches and perform NER.
docker run -e OLLAMA_MODEL=gemma3:12b \
-e OLLAMA_HOST=http://ollama:11434 \
-e MONGODB_CONNECTION_STRING=mongodb://mongodb:27017 \
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@entity \
-e MLFLOW_USER_PROMPT_ID=NER_User/@entity \
-e MLFLOW_TRACKING_HOST=http://mlflow:5000
--network development_network \
--rm \
--name entity_extractor \
--hostname entity_extractor \
mongodb-entity-extractor
RabbitMQ Knowledge Graph Builder
Use RabbitMQ to Publish/Consume messages.
The messages will be processed and merged to a knowledge graph
Message format:
{
"src": "https://en.wikipedia.org/wiki/Vincent_de_Groof",
"src_type": "Newspaper",
"text": "Vincent de Groof (6 December 1830 – 9 July 1874) was a Dutch-born Belgian early pioneering aeronaut. He created an early model of an ornithopter and successfully demonstrated its use before fatally crashing the machine in London, UK.[1] "
}
Pipeline dependecy RabbitMq
Start rabbitmq:
docker run -d --rm --network development_network --hostname rabbitmq --name rabbitmq -p 15672:15672 -p 5672:5672 rabbitmq:3.8.12-rc.1-management
In this example data is published to a rabbitmq queue. The data is consumed and appended to a Neo4j Knowledge Graph
docker run -e RABBITMQ_PORT=5672 \
-e RABBITMQ_HOST=rabbitmq \
-e RABBITMQ_VHOST=dev \
-e RABBITMQ_QUEUE=DatapipelineCleanData \
-e RABBITMQ_USER=guest \
-e RABBITMQ_PASSWORD=guest \
-e OLLAMA_MODEL=gemma3:12b \
-e OLLAMA_HOST=http://ollama:11434 \
-e NEO4J_URI=bolt://neo4j-apoc:7687 \
-e NEO4J_USERNAME=neo4j \
-e NEO4J_PASSWORD=password \
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@relationship \
-e MLFLOW_USER_PROMPT_ID=NER_User/@relationship \
-e MLFLOW_TRACKING_HOST=http://mlflow:5000 \
--network development_network \
--rm \
--name knowledge_consumer \
--hostname knowledge_consumer \
rabbit-mq-graph-builder
Dependencies
Ollama should be running in a docker container; or a network route should be available.
Run the following for a CPU-Only ollama instance:
docker run -d --rm
-v /usr/share/ollama/.ollama:/root/.ollama \
-p 11434:11434 \
--network development_network \
--name ollama \
--hostname ollama \
ollama/ollama
Note: replace the path /usr/share/ollama/.ollama to the host's ollama model path
Evaluation
Due to the sheer amount of model options and inference providers; it is necessary to evaluate.
Evaluation allows methodical selection of hyper parameters:
- User Prompt (based on model and provider)
- System Prompt (based on model and provider)
- Model
Evaluation criteria
The first step of an evaluation criteria is to define ground truths:
Text --> Expected entities
E.g.
"Input", "GroundTruth"
"Amazon and Google announced a partnership.","Amazon|Google"
Based on the ground truth calculate the following:
- True positives (TP)
- False positives (FP)
- False negatives (FN)
Using the above calculate the following:
- Precision
- Recall
Finally calculate an F1 score.
Precision
Proportion of correctly identified entities (accuracy).
High precision means that when the model identifies an entity, it is likely to be correct.
$$ \begin{aligned} \text{Precision} &= \frac{TP}{TP + FP} \end{aligned} $$
Where:
- ( TP ) = True Positives
- ( FP ) = False Positives
Recall
Ability to find the relevant entities.
High recall means the model has detected most of the actual entities in the text.
$$ \begin{aligned} \text{Recall} &= \frac{TP}{TP + FN} \end{aligned} $$
Where:
- ( TP ) = True Positives
- ( FN ) = False Negatives
F1 Score
$$ \begin{aligned} F1 &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \end{aligned} $$
Experiments
Publish the evaluation metrics for each "experiment".
ML-Flow is a good tool for analysis.
License
Copyright (C) 2025 Paul Eger
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file llm_ner_nel-0.5.0.tar.gz.
File metadata
- Download URL: llm_ner_nel-0.5.0.tar.gz
- Upload date:
- Size: 30.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bf3b04b8c61c0f281bd9accf1a9849b7b2109c34c854ffe615c6e0c01735a3d6
|
|
| MD5 |
19c596e3ab2316928aa123f04d4b371f
|
|
| BLAKE2b-256 |
1504486cfc3dd4510e9bb3e2cb20a8f7d0afed749d6fcee4b9efeb60984e862a
|
File details
Details for the file llm_ner_nel-0.5.0-py3-none-any.whl.
File metadata
- Download URL: llm_ner_nel-0.5.0-py3-none-any.whl
- Upload date:
- Size: 33.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1e612e831704cc2c55890316d485fef27a1ed86b5a321f0ced496b17edc4b04c
|
|
| MD5 |
94181e1686f8046bf80ea9b22a65ee82
|
|
| BLAKE2b-256 |
efd0ed02e9a2215977c0cb43b431e0d373fba3225595a781fe04b9c8fdc93999
|