Wowool Entity Graph
Finding relations between entities
The entity graph app produces links between entities, each link representing a relation between two entities found in the document.
For example, the following can be used to find relations between a Person and Company:
This would produce the following output:
[
{
"from": { "label": "Person", "name": "John Smith" },
"relation": { "label": "VP", "name": "work" },
"to": { "label": "Company", "name": "IKEA" }
},
{
"from": { "label": "Person", "name": "John Smith" },
"relation": { "label": "VP", "name": "visit" },
"to": { "label": "Company", "name": "Jysk" }
},
{
"from": { "label": "Person", "name": "Bella Johansson" },
"relation": { "label": "VP", "name": "be also work" },
"to": { "label": "Company", "name": "Jysk" }
}
]
and when plotted would result in a graph such as the following:
You can directly generate cypher syntax from this by adding the Cypher app at the end of your pipeline.
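Because the output is plain JSON, it can also be post-processed by hand. The following is a minimal sketch, not part of the Wowool SDK, that turns the links shown above into Graphviz DOT text for plotting. The helper name `links_to_dot` is hypothetical, and this does not reproduce what the Cypher app generates.

```python
# Output copied from the example above.
links = [
    {
        "from": {"label": "Person", "name": "John Smith"},
        "relation": {"label": "VP", "name": "work"},
        "to": {"label": "Company", "name": "IKEA"},
    },
    {
        "from": {"label": "Person", "name": "John Smith"},
        "relation": {"label": "VP", "name": "visit"},
        "to": {"label": "Company", "name": "Jysk"},
    },
    {
        "from": {"label": "Person", "name": "Bella Johansson"},
        "relation": {"label": "VP", "name": "be also work"},
        "to": {"label": "Company", "name": "Jysk"},
    },
]


def links_to_dot(links):
    """Render entity graph links as Graphviz DOT text (hypothetical helper)."""
    lines = ["digraph entity_graph {"]
    for link in links:
        source = link["from"]["name"]
        target = link["to"]["name"]
        relation = link["relation"]["name"]
        # One directed edge per link, labelled with the relation name.
        lines.append(f'  "{source}" -> "{target}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


dot_text = links_to_dot(links)
print(dot_text)
```

The resulting text can be passed to any DOT renderer to obtain a picture of the graph.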
Options
The options are defined as:
interface EntityGraphOptions {
links?: Link[];
nodes?: Record<string, Node>;
themes?: DataNode;
topics?: DataNode;
}
with:
| Property | Description |
|---|---|
| links | Links between nodes |
| nodes | Node definitions that can be referred to in the links, where each key is an ID that can be referenced |
| themes | Themes (categories) that link to a node |
| topics | Topics that link to a node |
All properties are optional, but at least one of the following is required to produce a result: links, themes, or topics.
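The rule above can be checked before a configuration is handed to the app. This is a minimal sketch of such a check, assuming the options are held in a plain dictionary; the function name `produces_result` is hypothetical and not part of the library.

```python
def produces_result(options: dict) -> bool:
    """Return True when at least one of links, themes, or topics is set."""
    return any(options.get(key) for key in ("links", "themes", "topics"))


# A configuration with only node definitions has nothing to link.
only_nodes = {"nodes": {"_doc_": {"label": "Document", "name": "document.id"}}}

# Adding a link makes the configuration usable.
with_links = {
    "links": [{"from": "Person", "relation": "VP", "to": "Company"}],
}

print(produces_result(only_nodes))
print(produces_result(with_links))
```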
Links
A link describes the nodes that will be linked to each other and their relation. It is defined as:
interface Link {
from: NodeId | Node;
relation: NodeId | Node;
to: NodeId | Node;
scope?: string;
action?: string;
}
with:
| Property | Description |
|---|---|
| from | Describes what will be stored in the from node |
| relation | Describes what will be stored in the relation node |
| to | Describes what will be stored in the to node |
| scope | URI of the scope that will be used when creating the link |
| action | Which action to take when creating a link |
NodeId
type NodeId = string;
A NodeId is a string used to identify a node. The lookup first tries the value as a node reference in the nodes definition, then checks whether it is a known URI (or entity) from the processing pipeline (like Person in the sample above). Finally, if the string is found in neither of the above, it is interpreted as a label, i.e. a literal string. To summarize, the string can be interpreted as a:
- Node reference: a reference to a key within the nodes definition
- URI: a URI of an entity, such as Person or Company
- Label: a literal label
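The lookup order described above can be sketched as follows. This is an illustration only: the function `classify_node_id` and the `known_uris` set are assumptions made for the example, not the library's internals.

```python
def classify_node_id(value, nodes, known_uris):
    """Classify a NodeId string using the lookup order described above."""
    if value in nodes:
        # 1. A key in the nodes definition wins.
        return "node_reference"
    if value in known_uris:
        # 2. Otherwise a URI known to the processing pipeline.
        return "uri"
    # 3. Anything else is treated as a literal label.
    return "label"


nodes = {"_booking_nr_": {"name": "BookingReference", "store": "first_seen"}}
known_uris = {"Person", "Company"}  # assumed to come from the pipeline

for value in ("_booking_nr_", "Person", "Person2Company"):
    print(value, "->", classify_node_id(value, nodes, known_uris))
```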
Nodes
A node describes what will be captured during the document analysis.
interface Node {
name?: string;
label?: string;
attributes?: Record<string, string>;
default?: Record<string, string>;
store?: string;
}
Name and label are both optional, but at least one of them should be specified. If only a name is used then the label will be generated using the name.
with:
| Property | Description |
|---|---|
| name | URI of the entity that will be captured, e.g. Company or Person. The value (John Doe) will be used in the results |
| label | Literal string to be used as the node's label, useful for customizations, e.g. Employee, Person1 |
| attributes | Attributes to add to the nodes, e.g. "gender" |
| store | Store the URI in memory so it can be used when creating links with entities outside the sentence scope |
| default | Fallback dictionary used when the node should still be created even if the name was not found |
The default option can only be used in the 'to' node, as the 'from' node cannot be optional.
An example of the definition of the node Person would be:
{
"name": "Person",
"label": "MyPerson",
"attributes": { "my_gender": "Person.gender" }
}
This would yield the following output:
{
"name": "John Smith",
"label": "MyPerson",
"my_gender": ["male"]
}
Attributes
This option specifies which attributes to add to the results of the given node. The key will be the label and the value is the content of this attribute.
Example of a node where we add the sector attribute from the entity Company to the results.
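A node definition of that kind might look like the sketch below, which mirrors the Person.gender example above. The exact attribute path "Company.sector", the sample value "furniture", and the helper `build_item` are assumptions made for illustration; the helper only imitates how a result item could be assembled and is not the library's code.

```python
# Hypothetical node definition, modelled on the Person example above.
company_node = {
    "name": "Company",
    "attributes": {"sector": "Company.sector"},
}


def build_item(node, literal, entity_attributes):
    """Assemble a result item from a node definition (illustrative only)."""
    # Assumption: without an explicit label, the name doubles as the label.
    label = node.get("label", node["name"])
    item = {"name": literal, "label": label}
    for key, source in node.get("attributes", {}).items():
        attribute_name = source.rpartition(".")[2]  # "Company.sector" -> "sector"
        if attribute_name in entity_attributes:
            # Attribute values are lists, to accommodate multiple values.
            item[key] = list(entity_attributes[attribute_name])
    return item


item = build_item(company_node, "IKEA", {"sector": ["furniture"]})
print(item)
```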
Store
This option indicates when to store URI values while processing the document. It is used to create links that reach outside the scope of a sentence.
enum Store {
sentence = "sentence",
last_seen = "last_seen",
first_seen = "first_seen",
}
with:
| Property | Description |
|---|---|
| sentence | Default value. Only the values in the current sentence are used |
| last_seen | Update the stored value each time the URI is found during analysis |
| first_seen | Store the value only once, the first time the given URI is found |
The elements in Store are like mementos: things you have seen and want to remember at a later stage. They make it possible to link to items that were encountered earlier in the document but are not present in the sentence currently being processed.
Put differently: it is a list of entities, where each store corresponds to an entity and contains the first or last value seen for that URI.
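To make the three store modes concrete, here is a small simulation. It is a sketch under simplifying assumptions: each sentence is reduced to a dictionary with at most one value per URI, and the function `remembered_values` is hypothetical rather than part of the library.

```python
def remembered_values(sentences, uri, store="sentence"):
    """For each sentence, return the value of `uri` that is available for linking."""
    remembered = None
    available = []
    for entities in sentences:
        current = entities.get(uri)
        if store == "sentence":
            # Only what occurs in the current sentence counts.
            available.append(current)
            continue
        if current is not None:
            # last_seen always overwrites; first_seen keeps the first value.
            if store == "last_seen" or remembered is None:
                remembered = current
        available.append(remembered)
    return available


# Hypothetical document: one dictionary of URI -> value per sentence.
sentences = [
    {"BookingReference": "AB123"},
    {"Person": "John Smith"},
    {"BookingReference": "ZX999", "Person": "Bella Johansson"},
]

for mode in ("sentence", "last_seen", "first_seen"):
    print(mode, remembered_values(sentences, "BookingReference", mode))
```

In the second sentence, only the last_seen and first_seen modes still have a booking reference to link the Person to.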
Default
This option specifies a fallback dictionary that is used to create the node even when the to node was not found.
For example, in the following configuration the entity Object is optional: it does not need to be present, as sentences might or might not have objects.
{
"nodes": {
"_object_": {
"name": "Object",
"optional": { "default": "NoObject", "name": "no_object" }
}
},
"links": [
{
"from": "Subject",
"to": "_object_",
"relation": "VerbPhrase"
}
]
}
This would yield:
{
"from": { "label": "Subject", "name": "John Smith" },
"to": { "label": "NoObject", "name": "no_object" },
"relation": { "label": "VerbPhrase", "name": "die" }
}
Actions
This triggers an action when a valid link has been found. At this stage only link_attribute is supported.
enum Action {
link_attribute = "link_attribute",
}
with:
| Value | Description |
|---|---|
| link_attribute | Adds an attribute to the from node entity, using the label of the relation node as the key and the value of the to node as the value |
Note that the attribute value pair will only be seen in the analysis.
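The effect of link_attribute can be pictured with the sketch below. It assumes that entity attributes are held as a dictionary of string lists; the function `apply_link_attribute` and the sample values are hypothetical and only illustrate the description in the table.

```python
def apply_link_attribute(entity_attributes, link):
    """Return a copy of the from entity's attributes with the link added."""
    key = link["relation"]["label"]  # label of the relation node
    value = link["to"]["name"]       # value of the to node
    updated = {name: list(values) for name, values in entity_attributes.items()}
    updated.setdefault(key, []).append(value)
    return updated


# Hypothetical link, shaped like the results described in this document.
link = {
    "from": {"label": "Person", "name": "John Smith"},
    "relation": {"label": "PersonBookingReference", "name": "PersonBookingReference"},
    "to": {"label": "BookingReference", "name": "AB123"},
}

person_attributes = {"gender": ["male"]}
print(apply_link_attribute(person_attributes, link))
```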
Scopes
One of the properties in a link node is a scope. Scopes ensure we are not matching outside the given URI that defines the scope of matching.
If no scope is provided, you will link the 'to' entity to all the 'from' entities that appear in the same sentence. Sometimes we do not want to do that, because we want to be more specific in the kind of relation that the entities have.
See Scopes
DataNode
A data node is used to create multiple nodes from a list of information like the topics and the themes.
It is defined as:
interface DataNode {
to: NodeId | Node;
count?: number;
}
| Property | Description |
|---|---|
| to | Name of the node to which the data should be attached |
| count | Only take the top count elements from the data node |
For example, if there are 5 topics but only the 2 most relevant ones should be linked, set count to 2.
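The effect of count can be sketched as a simple top-N selection. The shape of the topic entries (a name plus a relevancy score), the sample values, and the function `top_items` are assumptions for this illustration, not the library's data model.

```python
def top_items(items, count=None):
    """Keep the `count` most relevant items, or all of them when count is None."""
    ranked = sorted(items, key=lambda item: item["relevancy"], reverse=True)
    return ranked if count is None else ranked[:count]


# Hypothetical topics with relevancy scores.
topics = [
    {"name": "furniture", "relevancy": 0.41},
    {"name": "retail", "relevancy": 0.87},
    {"name": "sweden", "relevancy": 0.15},
    {"name": "employment", "relevancy": 0.66},
    {"name": "travel", "relevancy": 0.09},
]

selected = top_items(topics, count=2)
print([topic["name"] for topic in selected])
```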
Topics
Topics are the most important noun groups in your document. They provide a short insight into what your document is about, along with a relevancy estimate of how prominent each topic is in the document. This property is a DataNode.
{
"nodes": {
"_doc_": { "label": "Document", "name": "document.id" }
},
"topics": {
"to": "_doc_"
}
}
Linking the topics to a document requires the Topics application in your pipeline.
Themes
Themes are the most important categories of the document, based on linguistic clues. They provide a short insight into what your document is about, along with a relevancy estimate of how prominent each theme is in the document. This property is a DataNode.
{
"nodes": {
"_doc_": { "label": "Document", "name": "document.id" }
},
"themes": {
"to": "_doc_"
}
}
Linking the themes to a document requires the Themes application in your pipeline.
Results
EntityGraphResults
The EntityGraphResults schema is defined as an array of links.
type EntityGraphResults = EntityGraphLink[];
interface EntityGraphLink {
from: EntityGraphItem;
relation: EntityGraphItem;
to: EntityGraphItem;
}
with:
| Property | Description |
|---|---|
| from | Content of the from node |
| relation | Content of the relation node |
| to | Content of the to node |
EntityGraphItem
type EntityGraphItem = Record<string, string | string[]>;
The fields label and name are always present. Additional fields can be included if specified in the attributes. Note that the values of the requested attributes are represented as a list of strings to accommodate multiple values.
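When consuming the results, it can be handy to separate the two fixed fields from the requested attributes. The sketch below does this for the MyPerson item shown earlier; the helper name `split_item` is hypothetical.

```python
def split_item(item):
    """Split an EntityGraphItem into its fixed fields and its attributes."""
    core = {"label": item["label"], "name": item["name"]}
    # Everything else was requested through the attributes option.
    attributes = {key: value for key, value in item.items() if key not in core}
    return core, attributes


item = {"name": "John Smith", "label": "MyPerson", "my_gender": ["male"]}
core, attributes = split_item(item)
print(core)
print(attributes)
```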
Examples
Entities
Linking companies to names of people using a relation called Person2Company:
- Person and Company are known entities produced by the entity domain
- Person2Company will be a label, as it is unknown as an entity at the time of processing
{
"links": [
{
"from": "Person",
"relation": "Person2Company",
"to": "Company"
}
]
}
Booking reference
In this example, we leverage the first_seen store option to track a booking reference within a document. The goal is to capture the initial BookingReference number and associate it with the Person entities present in the document.
{
"nodes": {
"_booking_nr_": { "name": "BookingReference", "store": "first_seen" }
},
"links": [
{
"from": "Person",
"to": "_booking_nr_",
"relation": "PersonBookingReference"
}
]
}
Scopes
We use the Snippet app to define rules for a 'work' relation between a Person and the shortest match to a Company and assign it to ScopePersonCompany, preventing incorrect links.
The sample below returns only one link: John Smith -> Ikea. Without a defined scope, two links would be returned: John Smith -> Ikea and John Smith -> Jysk.
API
Examples
Pipeline
This script demonstrates how to use the Wowool SDK to extract entities and build an entity graph from English text.
from wowool.sdk import Pipeline
from wowool.utility.diagnostics import print_diagnostics
import json

text = "John Smith works for Ikea, he visited Jysk in Sweden. Bella Johansson is also working for Jysk."
pipeline = Pipeline(
    [
        "english",
        "syntax",
        "entity",
        {
            "name": "entity-graph.app",
            "options": {
                "links": [
                    {
                        "from": "Person",
                        "to": "Company",
                        "relation": "VP",
                    }
                ]
            },
        },
    ]
)
doc = pipeline(text)
if doc.results("wowool_entity_graph"):
    print(json.dumps(doc.results("wowool_entity_graph"), indent=2))
else:
    print_diagnostics(doc)
Entity Graph
This script demonstrates how to run the Wowool SDK pipeline first and then apply the EntityGraph class to the resulting document to build an entity graph from English text.
from wowool.sdk import Pipeline
from wowool.entity_graph import EntityGraph
from wowool.utility.diagnostics import print_diagnostics
import json

text = "John Smith works for Ikea, he visited Jysk in Sweden. Bella Johansson is also working for Jysk."
pipeline = Pipeline("english,entity")
# Defines a relationship: from "Person" to "Company" with the relation "VP".
grapher = EntityGraph(
    links=[
        {
            "from": "Person",
            "to": "Company",
            "relation": "VP",
        }
    ]
)
doc = pipeline(text)
doc = grapher(doc)
if doc.entity_graph:
    for link in doc.entity_graph:
        print(f"Link: {link.from_} -> ({link.relation}) -> {link.to}")
    # print(json.dumps(doc.entity_graph, indent=2))
else:
    print_diagnostics(doc)
License
In both cases you will need to acquire a license file at https://www.wowool.com
Non-Commercial
This library is licensed under the GNU AGPLv3 for non-commercial use.
For commercial use, a separate license must be purchased.
Commercial license Terms
1. Grants the right to use this library in proprietary software.
2. Requires a valid license key
3. Redistribution in SaaS requires a commercial license.