Wowool Entity Graph

Finding relations between entities

The entity graph app produces links between entities, each link representing a relation between two entities found in the document.

For example, the following can be used to find relations between a Person and Company:

This would produce the following output:

[
  {
    "from": { "label": "Person", "name": "John Smith" },
    "relation": { "label": "VP", "name": "work" },
    "to": { "label": "Company", "name": "IKEA" }
  },
  {
    "from": { "label": "Person", "name": "John Smith" },
    "relation": { "label": "VP", "name": "visit" },
    "to": { "label": "Company", "name": "Jysk" }
  },
  {
    "from": { "label": "Person", "name": "Bella Johansson" },
    "relation": { "label": "VP", "name": "be also work" },
    "to": { "label": "Company", "name": "Jysk" }
  }
]

and when plotted would result in a graph such as the following:

You can directly generate cypher syntax from this by adding the Cypher app at the end of your pipeline.
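
The output format of the Cypher app itself is not shown here. As a rough illustration of how the link structure above maps onto graph statements, the following hand-written sketch renders each link as a Cypher MERGE statement. The helpers `relation_type` and `link_to_cypher` are hypothetical names introduced for this example and are not part of the Wowool API.

```python
import json
import re

# Link results in the shape shown above (from / relation / to).
links = [
    {
        "from": {"label": "Person", "name": "John Smith"},
        "relation": {"label": "VP", "name": "work"},
        "to": {"label": "Company", "name": "IKEA"},
    },
    {
        "from": {"label": "Person", "name": "Bella Johansson"},
        "relation": {"label": "VP", "name": "be also work"},
        "to": {"label": "Company", "name": "Jysk"},
    },
]


def relation_type(name: str) -> str:
    """Turn a relation name into an upper-case identifier without spaces."""
    return re.sub(r"\W+", "_", name.strip()).upper()


def link_to_cypher(link: dict) -> str:
    """Hypothetical helper: render one link as a Cypher MERGE statement."""
    src, rel, dst = link["from"], link["relation"], link["to"]
    return (
        f"MERGE (a:{src['label']} {{name: {json.dumps(src['name'])}}}) "
        f"MERGE (b:{dst['label']} {{name: {json.dumps(dst['name'])}}}) "
        f"MERGE (a)-[:{relation_type(rel['name'])}]->(b);"
    )


statements = [link_to_cypher(link) for link in links]
for statement in statements:
    print(statement)
```

Using MERGE rather than CREATE keeps the sketch idempotent: running the same statements twice does not duplicate nodes or relationships.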

Options

The options are defined as:

interface EntityGraphOptions {
  links?: Link[];
  nodes?: Record<string, Node>;
  themes?: DataNode;
  topics?: DataNode;
}

with:

Property Description
links Links between nodes
nodes Node definitions that can be referred to in the links, where each key is an ID that can be referenced
themes Themes (categories) that link to a node
topics Topics that link to a node

All properties are optional, but at least one of the following is required to produce a result: links, themes, or topics.

Links

A link describes the nodes that will be linked to each other and their relation. It is defined as:

interface Link {
  from: NodeId | Node;
  relation: NodeId | Node;
  to: NodeId | Node;
  scope?: string;
  action?: string;
}

with:

Property Description
from Describes what will be stored in the from node
relation Describes what will be stored in the relation node
to Describes what will be stored in the to node
scope A uri of the scope that will be used when creating the link
action Which action to take when creating a link

NodeId

type NodeId = string;

A NodeId is a string used to identify a node. The lookup process first tries to resolve the value as a node reference in the nodes definition, then checks whether it is a known URI (or entity) from the processing pipeline (like Person in the sample above). Finally, if the string is found in neither of the above, it is interpreted as a label, i.e. a literal string. To summarize, the string can be interpreted as a:

  • Node reference: a reference to a key within the nodes definition
  • URI: A URI of an entity, such as Person or Company
  • Label: A literal label
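
The lookup order described above can be sketched as a small function. This is only an illustration of the three-step fallback, not the library's implementation; `classify_node_id` and the `known_uris` set are assumptions made for the example.

```python
def classify_node_id(value: str, nodes: dict, known_uris: set) -> str:
    """Classify a NodeId string following the documented lookup order."""
    if value in nodes:
        # 1. A key of the "nodes" definition wins first.
        return "node_reference"
    if value in known_uris:
        # 2. Otherwise a URI (entity) known to the pipeline.
        return "uri"
    # 3. Anything else is treated as a literal label.
    return "label"


# Node definitions as they could appear under "nodes" in the options.
nodes = {"_booking_nr_": {"name": "BookingReference", "store": "first_seen"}}
# Assumed set of entity URIs produced by the pipeline.
known_uris = {"Person", "Company", "BookingReference"}

for node_id in ("_booking_nr_", "Person", "Person2Company"):
    print(node_id, "->", classify_node_id(node_id, nodes, known_uris))
```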

Nodes

A node describes what will be captured during the document analysis.

interface Node {
  name?: string;
  label?: string;
  attributes?: Record<string, string>;
  default?: Record<string, string>;
  store?: string;
}

Name and label are both optional, but at least one of them should be specified. If only a name is used then the label will be generated using the name.

with:

Property Description
name URI of the entity that will be captured, e.g. Company or Person. The matched value (e.g. John Doe) is used as the name in the results
label Literal string to be used as the node's label, useful for customizations, e.g. Employee, Person1
attributes Attributes to add to the nodes, e.g. "gender"
store Stores the URI value in memory so it can be used when creating links with entities outside the sentence scope
default Fallback dictionary used to create the node even when the name was not found

The default option can only be used in the 'to' node, as the 'from' node cannot be optional.

An example of the definition of the node Person would be:

{
  "name": "Person",
  "label": "MyPerson",
  "attributes": { "my_gender": "Person.gender" }
}

This would yield the following output:

{
  "name": "John Smith",
  "label": "MyPerson",
  "my_gender": ["male"]
}

Attributes

This option specifies which attributes to add to the results of the given node. The key will be the label and the value is the content of this attribute.

Example of a node where we add the sector attribute from the entity Company to the results.
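
A minimal sketch of such a node definition, written as a Python dict and following the same `"key": "Entity.attribute"` pattern as the Person example above. It assumes that Company entities in your domain actually carry a sector attribute.

```python
import json

# Node definition that copies the "sector" attribute of a Company entity
# into the result under the key "sector" (assumed attribute name).
company_node = {
    "name": "Company",
    "attributes": {"sector": "Company.sector"},
}

print(json.dumps(company_node, indent=2))
```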

Store

This option indicates when to store URI values while processing the document. It is used to create links that reach outside the scope of a sentence.

enum Store {
  sentence = "sentence",
  last_seen = "last_seen",
  first_seen = "first_seen",
}

with:

Property Description
sentence Default value. Only the values found in the current sentence are used
last_seen Update the stored value each time the URI is found during analysis
first_seen Store the value only once, the first time the given URI is found

The elements in Store are like mementos: things you have seen and want to remember at a later stage. They are used to link to items that were encountered earlier in the document but are not present in the sentence currently being processed. Put differently: the store is a list of entities, where each entry corresponds to a URI and holds the first or last value seen for that URI.
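
To make the difference between the three modes concrete, here is a toy model of the memory behaviour. It is a simplification written for this page, not the library's implementation: `StoreMemory` and `replay` are hypothetical names, and the model keeps a single value per URI.

```python
class StoreMemory:
    """Toy model of one stored URI value under a given store mode."""

    MODES = ("sentence", "last_seen", "first_seen")

    def __init__(self, mode: str = "sentence"):
        if mode not in self.MODES:
            raise ValueError(f"unknown store mode: {mode}")
        self.mode = mode
        self.value = None

    def start_sentence(self) -> None:
        # In "sentence" mode nothing survives a sentence boundary.
        if self.mode == "sentence":
            self.value = None

    def see(self, value: str) -> None:
        # "first_seen" keeps the first value and ignores later ones;
        # the other modes overwrite with the most recent value.
        if self.mode == "first_seen" and self.value is not None:
            return
        self.value = value


def replay(mode: str, sentences: list) -> list:
    """Return the remembered value after each sentence."""
    memory = StoreMemory(mode)
    remembered = []
    for values in sentences:
        memory.start_sentence()
        for value in values:
            memory.see(value)
        remembered.append(memory.value)
    return remembered


# Four sentences; the second and fourth contain no booking reference.
sentences = [["REF-1"], [], ["REF-2"], []]
for mode in StoreMemory.MODES:
    print(mode, replay(mode, sentences))
```

With `first_seen`, a Person mentioned in the last sentence would still be linked to REF-1; with `last_seen`, to REF-2; with `sentence`, to nothing.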

See Booking Reference

Default

This option specifies a fallback dictionary that is used to create the node even when the 'to' node was not found.

For example, in the following configuration the entity Object is optional: it does not need to be present, as a sentence might or might not have an object.

{
  "nodes": {
    "_object_": {
      "name": "Object",
      "optional": { "default": "NoObject", "name": "no_object" }
    }
  },
  "links": [
    {
      "from": "Subject",
      "to": "_object_",
      "relation": "VerbPhrase"
    }
  ]
}

This would yield:

{
  "from": { "label": "Subject", "name": "John Smith" },
  "to": { "label": "NoObject", "name": "no_object" },
  "relation": { "label": "VerbPhrase", "name": "die" }
}

Actions

This option triggers an action when a valid link has been found. At this stage only link_attribute is supported.

enum Action {
  link_attribute = "link_attribute",
}

with:

Value Description
link_attribute Adds an attribute to the from node entity, using the label of the relation node as the key and the value of the to node as the value

Note that the attribute value pair will only be seen in the analysis.
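
The effect of link_attribute can be pictured with the sketch below. The link configuration uses only fields from the Link interface; `apply_link_attribute` is a hypothetical helper that mimics the described behaviour on plain dicts and is not how the library stores attributes internally.

```python
# Link definition with the action enabled.
link_config = {
    "from": "Person",
    "relation": "Person2Company",
    "to": "Company",
    "action": "link_attribute",
}


def apply_link_attribute(link: dict, from_attributes: dict) -> dict:
    """Return a copy of the from entity's attributes with the link added.

    The key is the relation label, the value is the name of the to node.
    """
    updated = {key: list(values) for key, values in from_attributes.items()}
    updated.setdefault(link["relation"]["label"], []).append(link["to"]["name"])
    return updated


# A link as it could be found in a document (illustrative values).
found_link = {
    "from": {"label": "Person", "name": "John Smith"},
    "relation": {"label": "Person2Company", "name": "Person2Company"},
    "to": {"label": "Company", "name": "IKEA"},
}

person_attributes = {"gender": ["male"]}
print(apply_link_attribute(found_link, person_attributes))
```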

Scopes

One of the properties of a link is a scope. A scope restricts matching to the span of the given URI, so that no links are created across its boundaries.

If no scope is provided, the 'to' entity is linked to all the 'from' entities that appear in the same sentence. Sometimes that is not what we want, because we want to be more specific about the kind of relation the entities have.

See Scopes

DataNode

A data node is used to create multiple nodes from a list of information like the topics and the themes.

It is defined as:

interface DataNode {
  to: NodeId | Node;
  count?: number;
}

with:

Property Description
to Name of the node to which the data should be attached
count Only take the top count elements from the data node

If we have 5 topics but want to link only the 2 most relevant ones, we set count to 2.
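
As a sketch, the options for this case could look as follows, together with a small simulation of what "top count" means. The topic names and relevancy scores are invented for the example, and `top_count` is a hypothetical helper, not a library function.

```python
import json

# Options linking only the 2 most relevant topics to the document node.
options = {
    "nodes": {"_doc_": {"label": "Document", "name": "document.id"}},
    "topics": {"to": "_doc_", "count": 2},
}

# Invented (topic, relevancy) pairs standing in for the Topics app output.
topics = [
    ("furniture", 0.91),
    ("retail", 0.42),
    ("store opening", 0.75),
    ("sweden", 0.60),
    ("visit", 0.20),
]


def top_count(items: list, count=None) -> list:
    """Keep the `count` highest-scoring items, or all of them if count is None."""
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    return ranked if count is None else ranked[:count]


print(json.dumps(options, indent=2))
print(top_count(topics, options["topics"]["count"]))
```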

Topics

Topics are the most important noun groups in your document. They provide a short insight into what your document is about and a relevancy estimation of how prominent each topic is in the document. This property is a DataNode.

{
  "nodes": {
    "_doc_": { "label": "Document", "name": "document.id" }
  },
  "topics": {
    "to": "_doc_"
  }
}

Linking the topics to a document requires the Topics application in your pipeline.

Themes

Themes are the most important categories of the document, based on linguistic clues. They provide a short insight into what your document is about and a relevancy estimation of how prominent each theme is in the document. This property is a DataNode.

{
  "nodes": {
    "_doc_": { "label": "Document", "name": "document.id" }
  },
  "themes": {
    "to": "_doc_"
  }
}

Linking the themes to a document requires the Themes application in your pipeline.

Results

EntityGraphResults

The EntityGraphResults schema is defined as an array of links.

interface EntityGraphLink {
  from: EntityGraphItem;
  relation: EntityGraphItem;
  to: EntityGraphItem;
}

type EntityGraphResults = EntityGraphLink[];

with:

Property Description
from Content of the from node
relation Content of the relation node
to Content of the to node

EntityGraphItem

type EntityGraphItem = Record<string, string | string[]>;

The fields label and name are always present. Additional fields can be included if specified in the attributes. Note that the values of the requested attributes are represented as a list of strings to accommodate multiple values.
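
When consuming the results in Python, each item can be treated as a plain dict. The sketch below formats a link for display, relying only on the guarantees stated above (label and name always present, attribute values given as lists). `describe` and `format_link` are hypothetical helpers introduced for this example.

```python
from typing import Dict, List, Union

# Python counterpart of the EntityGraphItem type above.
EntityGraphItem = Dict[str, Union[str, List[str]]]


def describe(item: EntityGraphItem) -> str:
    """Render an item as 'label:name' followed by any extra attributes."""
    text = f"{item['label']}:{item['name']}"
    for key in sorted(item):
        if key in ("label", "name"):
            continue
        raw = item[key]
        values = raw if isinstance(raw, list) else [raw]
        text += f" {key}={'|'.join(values)}"
    return text


def format_link(link: Dict[str, EntityGraphItem]) -> str:
    """Render a link as 'from -[relation]-> to'."""
    return f"{describe(link['from'])} -[{describe(link['relation'])}]-> {describe(link['to'])}"


link = {
    "from": {"name": "John Smith", "label": "MyPerson", "my_gender": ["male"]},
    "relation": {"label": "VP", "name": "work"},
    "to": {"label": "Company", "name": "IKEA"},
}

print(format_link(link))
```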

Examples

Entities

Linking companies to names of people using a relation called Person2Company:

  • Person and Company are known entities produced by the entity domain
  • Person2Company will be a label as it is unknown as an entity at the time of processing

{
  "links": [
    {
      "from": "Person",
      "relation": "Person2Company",
      "to": "Company"
    }
  ]
}

Booking reference

In this example, we leverage the first_seen store option to track a booking reference within a document. The goal is to capture the initial BookingReference number and associate it with the Person entities present in the document.

{
  "nodes": {
    "_booking_nr_": { "name": "BookingReference", "store": "first_seen" }
  },
  "links": [
    {
      "from": "Person",
      "to": "_booking_nr_",
      "relation": "PersonBookingReference"
    }
  ]
}

Scopes

We use the Snippet app to define rules for a 'work' relation between a Person and the shortest match to a Company and assign it to ScopePersonCompany, preventing incorrect links. The sample below returns only one link: John Smith -> Ikea. Without a defined scope, two links would be returned: John Smith -> Ikea and John Smith -> Jysk.
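
The Snippet rules themselves are not reproduced here. Assuming an annotation named ScopePersonCompany is already produced earlier in the pipeline, the link part of the options might look like the sketch below; the relation "VP" is an assumption borrowed from the first example on this page.

```python
import json

# Link restricted to matches inside a ScopePersonCompany annotation.
# "VP" as the relation is an assumption; use whatever your rules capture.
options = {
    "links": [
        {
            "from": "Person",
            "relation": "VP",
            "to": "Company",
            "scope": "ScopePersonCompany",
        }
    ]
}

print(json.dumps(options, indent=2))
```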

API

Examples

Pipeline

This script demonstrates how to use the Wowool SDK to extract entities and build an entity graph from English text.

from wowool.sdk import Pipeline
from wowool.utility.diagnostics import print_diagnostics
import json

text = "John Smith works for Ikea, he visited Jysk in Sweden. Bella Johansson is also working for Jysk."
pipeline = Pipeline(
    [
        "english",
        "syntax",
        "entity",
        {
            "name": "entity-graph.app",
            "options": {
                "links": [
                    {
                        "from": "Person",
                        "to": "Company",
                        "relation": "VP",
                    }
                ]
            },
        },
    ]
)
doc = pipeline(text)
if doc.results("wowool_entity_graph"):
    print(json.dumps(doc.results("wowool_entity_graph"), indent=2))
else:
    print_diagnostics(doc)

Entity Graph

This script builds the same entity graph by applying the EntityGraph class directly to a document that has already been processed by the pipeline.

from wowool.sdk import Pipeline
from wowool.entity_graph import EntityGraph
from wowool.utility.diagnostics import print_diagnostics
import json

text = "John Smith works for Ikea, he visited Jysk in Sweden. Bella Johansson is also working for Jysk."
pipeline = Pipeline("english,entity")
# defines a relationship: from "Person" to "Company" with the relation "VP".
grapher = EntityGraph(
    links=[
        {
            "from": "Person",
            "to": "Company",
            "relation": "VP",
        }
    ]
)
doc = pipeline(text)
doc = grapher(doc)

if doc.entity_graph:
    for link in doc.entity_graph:
        print(f"Link: {link.from_} -> ({link.relation}) ->  {link.to}")
    # print(json.dumps(doc.entity_graph, indent=2))
else:
    print_diagnostics(doc)

License

In both cases you will need to acquire a license file at https://www.wowool.com

Non-Commercial

This library is licensed under the GNU AGPLv3 for non-commercial use.  
For commercial use, a separate license must be purchased.  

Commercial license Terms

1. Grants the right to use this library in proprietary software.  
2. Requires a valid license key.  
3. Redistribution in SaaS requires a commercial license.  
