Skip to main content

The generalized graph input line entry system (GGILES) package produces sequential, string representations of graphs in a NetworkX format via depth-first graph traversal, and reproduces NetworkX graphs from the string representation. GGILES is flexible and customizable, accommodating a large range of graph types and formats.

Project description

Generalized Graph Input Line Entry System (GGILES) by Process Intelligence Research logo

Overview

The generalized graph input line entry system (GGILES) package produces sequential, string representations of graphs in a NetworkX format via depth-first graph traversal, and reproduces NetworkX graphs from the string representation. GGILES is flexible and customizable, accommodating a large range of graph representations.

Visual Overview

Features

GGILES features graph sequentialization and subsequent graph reconstruction for:

  • Graphs with node types and (optionally) edge types.
  • Directed and undirected graphs.
  • Graphs with node and edge attributes as float or string

GGILES also features:

  • Default edge types: Default edges are not included in the string sequentialization for token efficiency and inferred in graph reconstruction.
  • Attribute serialization with or without attribute name (keyword attribute).
  • Optional or enforced attributes.
  • Generating unique sequences: Graph invariant calculation for sequence canonicalization.
  • A tokenizer for the sequence based on a regular expression.

All code examples below can also be found and run in the example notebook.

Table of contents

Installation

Install GGILES from pip:

pip install ggiles

or from GitHub:

pip install git+https://github.com/process-intelligence-research/Generalized-graph-line-entry-system

Alternatively, get the latest updates by cloning the repo and installing the editable version of the package with:

git clone https://github.com/process-intelligence-research/Generalized-graph-line-entry-system
cd Generalized-graph-line-entry-system
pip install .

Quickstart: Generating a string representation from a graph

Here is an example graph.

import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_node(0, type="a", id="1")
G.add_node(1, type="c", id="2")
G.add_node(2, type="b", id="3")
G.add_node(3, type="d", id="4")
G.add_node(4, type="b", id="5")
G.add_node(5, type="c", id="6")
G.add_node(6, type="a", id="7")
G.add_node(7, type="e", id="8")
G.add_node(8, type="a", id="9")
G.add_edge(0, 1, type="a-c", flt=0.1)
G.add_edge(1, 2, type="c-b", flt=1.2)
G.add_edge(2, 3, type="b-d", flt=2.3)
G.add_edge(3, 4, type="d-b", flt=3.4)
G.add_edge(4, 1, type="b-c", flt=4.1)
G.add_edge(3, 5, type="d-c", flt=3.5)
G.add_edge(6, 4, type="a-b", flt=4.6)
G.add_edge(4, 7, type="b-e", flt=5.7)

# Plot the graph
pos = nx.kamada_kawai_layout(G)  
edge_labels = nx.get_edge_attributes(G, "type")  
node_labels = nx.get_node_attributes(G, "type") 

plt.figure(figsize=(10, 8))
nx.draw(G, pos, with_labels=True, labels=node_labels, node_color='cyan', node_size=500, font_size=10, font_weight='bold')
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_color='green', font_weight='bold')

plt.title("Example graph")

plt.show()

We want to create a string representation of this graph, where the type attribute of nodes and edges is used as a token.

from ggiles.graph_invariant import GraphInvariantProcessor, MorganAlgorithm, AlphabeticNodeOrdering
from ggiles import make_converter

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
graph_converter = make_converter(invariant=invariant)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)

Result

v(a)e(a-b)v(b)<1[e(b-e)v(e)]e(b-c)v(c)<&|v(a)e(a-c)&|e(c-b)v(b)e(b-d)v(d)e(d-b)1e(d-c)v(c)|nv(a)

Here, v(.) denotes a node (vertex), e(.) an edge, [.] a branch, <&|.&.| a converging branch, 1.<1 a cycle, and |n a new subgraph.

Can we reconstruct the graph (with the information encoded in the sequence)?

reconstructed_graph = graph_converter.sequence_to_graph(sequence)
print(nx.is_isomorphic(G, reconstructed_graph, 
        node_match=lambda x, y: x["type"] == y["type"], 
        edge_match=lambda x, y: x["type"] == y["type"])
     )

Using directed graphs works similarly by changing a flag in the converter constructor

graph_converter = make_converter(invariant=invariant, is_directed=False)

Including the node and edge attributes in the sequence

Attributes can be assigned to both nodes and edges. Attributes can be assinged without attribute key, or keyword-attributes with attribute key. To specify which attributes should be included in the sequence representation, add the attribute names to node_attributes, node_kwarg_attributes, edge_attributes, edge_kwarg_attributes. Any attribute in the graph can either be of type string (to represent categorical values) or of type float/int (to represent numerical values).

from ggiles.attribute_encoding import AttributeConfig

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
attr_config = AttributeConfig(
    node_attributes=["id"],
    edge_kwarg_attributes=["flt"]
    )

graph_converter = make_converter(invariant=invariant, attribute_config=attr_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)

Result: v(a){"7"}e(a-b){"flt":4.6}v(b){"5"}...

Note how attributes are included after the respective token in {.}. If multiple attributes are included, they are separated by ,.

Optional attributes

If specified, attributes are required in nodes/edges. If attributes should be treated as optional, set the treat_attributes_as_optional in converter construction to True.

from ggiles import AttributeConfig

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
attr_config = AttributeConfig(
    node_attributes=["id"],
    )

graph_converter = make_converter(invariant=invariant, 
                                  attribute_config=attr_config, 
                                  treat_attributes_as_optional=True)

G_opt = G.copy()

del G_opt.nodes[6]["id"]
sequence = graph_converter.graph_to_sequence(G_opt)
print(sequence)

Result: v(a)e(a-b)v(b){"5"}...

Notice how the node "b" (corresponding to the graph node with id 6) has no attribute. Not enabling treat_attributes_as_optional in this case raises an exception.

Keyword attributes are decoded back into the graph automatically. Attributes without keywords need to be assigned a keyword before being reassigned to a new graph. By default, the attribute keywords are assigned in the order specified in node_attributes and edge_attributes in AttributeConfig.

Unique sequences for graphs

GGILES can produce unique graph sequences. For this, a function for calculating a graph invariant needs to be specified. The graph invariant assigns unique ranks to the nodes of a graph, so that branching decisions can be made with respect to these ranks. The invariant is calculated by the GraphInvariantProcessor class. There are many ways to assign ranks to nodes, and some of them cannot always produce unique ranks to all graphs. An effective choice for the calculation of invariants can depend on the properties of the graphs in a specific use-case. Therefore, the GraphInvariantProcessor can have several GraphInvariantLayers. Each layer calculates node ranks. If a layer does not produce unique ranks and nodes with tied ranks remain, the ranks of lower level layers are used to break the ties.

Lower ranks are chosen first as start nodes of branches, new subgraphs, or converging branches. This convention promotes nodes with lower ranks to lie on the main sequence, and nodes with higher ranks to be pushed to branches and converging branches.

The following layers are inbuilt

  • MorganAlgorithm: Ranks nodes using the Morgan Algorithm.
  • CategoricalAttributePriorization: Ranks nodes by attribute value priority.
  • DescendantSizePrioritization: Ranks nodes by descendant subgraph size.
  • AlphabeticNodeOrdering: Ranks nodes by alphabetical attribute order.
  • CustomAttributePrioritization: Ranks nodes using a custom attribute function.
  • DescendantRankSumOrdering: Ranks nodes by the sum of ranks in their descendant subgraphs.
  • DisregardEqualDescendants: Assigns 'random' ranks to nodes with isomorphic descendant graphs.
  • FastDisregardEqualDescendants: Faster version for branched, non-connected directed graphs.

In the example below, the ranks are calculated with the morgan algorithm. If tied nodes remain, the ties are broken by ordering nodes alphabetically on the value of the "type" attribute.

from ggiles.graph_invariant import GraphInvariantProcessor, MorganAlgorithm, AlphabeticNodeOrdering

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
graph_converter = make_converter(invariant=invariant)

Graph invariants are complex and may be dependent on the specific structure of the graphs in question. Therefore, it makes sense to validate the graph invariant calculation against a set of example graphs to ensure that the ranking is consistent and unique across different graph structures. For this, use the processor's validate_against_example_graphs method:

example_graphs: list[nx.Graph] = [] # Add example graphs here
print(invariant.validate_against_example_graphs(example_graphs))

Configuring node- and edge types

Changing the token type attribute

By default, GGILES uses the "type" attribte in nodes and edges for the selection of the strings for the tokens. This can be changed in a custom GraphTokenConfig:

from ggiles import GraphTokenConfig

token_config = GraphTokenConfig(
    node_type_attribute="id"
)

graph_converter = make_converter(invariant=invariant, graph_token_config=token_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)

Result: v(7)e(a-b)v(5)...

Optional type for edges

For graphs without explicit edge types, the edge type token does not have to be included in the sequence representation. In this case, set the edge_type_attribute in GraphTokenConfig to None:

from ggiles import GraphTokenConfig

token_config = GraphTokenConfig(
    edge_type_attribute=None
)

graph_converter = make_converter(invariant=invariant, graph_token_config=token_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)

Result: v(a)v(b)<1[v(e)]v(c)<&|v(a)&|v(b)v(d)1v(c)|nv(a)

Notice how edge tokens are not included anymore.

Default type for edges

It is also possible to omit the edge tokens of a single type. This is useful if a certain edge type occurs very frequently and can be treated as a default. This can be achieved by setting the default_edge_type in GraphTokenConfig:

from ggiles import GraphTokenConfig

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
token_config = GraphTokenConfig(
    default_edge_type="a-b"
)
graph_converter = make_converter(invariant=invariant, graph_token_config=token_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)

Result: v(a)v(b)<1[e(b-e)v(e)]e(b-c)v(c)...

Now, all edges with type a-b are omitted and inferred when parsing back to a graph.

Tokenization

The graph string representations can be tokenized with the GGILES package with the default tokenization scheme. The GGILESTokenizer produces a Regex expression that identifies node tokens, edge tokens, string attributes and digits, and syntax markers. Make sure to apply the same settings to the tokenizer as for the converter.

from ggiles import GGILESTokenizer

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
graph_converter = make_converter(invariant=invariant)
sequence = graph_converter.graph_to_sequence(G)


tokenizer = GGILESTokenizer()
tokens = tokenizer.tokenize(sequence)
print(tokens[0:2])

Result: ["v(a)", "e(a-b)"]

References

  • Vogel, G., Hirtreiter, E., Schulze Balhorn, L., & Schweidtmann, A. M. (2023). SFILES 2.0: An extended text-based flowsheet representation. Optimization and Engineering, 24(4), 2911–2933. https://doi.org/10.1007/s11081-023-09798-9
  • d’Anterroches, L. (2006). Process flow sheet generation and design through a group contribution approach [Dissertation]. Technical University of Denmark.
  • Weininger, D. (1988). SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1), 31–36. https://doi.org/10.1021/ci00057a005

Copyright and license

GGILES is published under MIT license (see license file)

Copyright (C) 2025 Artur Schweidtmann Delft University of Technology.

Contact

📧 Contact

🌐 PI research

fernandezbap

https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ggiles-1.1.0.tar.gz (319.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ggiles-1.1.0-py3-none-any.whl (46.5 kB view details)

Uploaded Python 3

File details

Details for the file ggiles-1.1.0.tar.gz.

File metadata

  • Download URL: ggiles-1.1.0.tar.gz
  • Upload date:
  • Size: 319.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for ggiles-1.1.0.tar.gz
Algorithm Hash digest
SHA256 f3cbf8bccb72f6ff57a951903e0841c9e7a4037cfd309776935de0e14599ed03
MD5 86cd54257098b0d10cfe8fc5eb3832fe
BLAKE2b-256 a64aa2f44b1a5e86cdbee8c859833a804cc53e73cceb2399c3600bf77cd31995

See more details on using hashes here.

File details

Details for the file ggiles-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: ggiles-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 46.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for ggiles-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9681d274b229a3ed12813004fd7673cdb90fdeea7685f761686df97d871431c7
MD5 6aff2d4a45bc863187aa90d46a2adbcd
BLAKE2b-256 7ff5485d14c7118a98021e27b11fa5ce08d621fb3da344532b3b2ca841609aa9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page