The generalized graph input line entry system (GGILES) package produces sequential, string representations of graphs in a NetworkX format via depth-first graph traversal, and reproduces NetworkX graphs from the string representation. GGILES is flexible and customizable, accommodating a large range of graph types and formats.
Project description
Generalized Graph Input Line Entry System (GGILES) by 
Overview
The generalized graph input line entry system (GGILES) package produces sequential, string representations of graphs in a NetworkX format via depth-first graph traversal, and reproduces NetworkX graphs from the string representation. GGILES is flexible and customizable, accommodating a large range of graph representations.
Features
GGILES features graph sequentialization and subsequent graph reconstruction for:
- Graphs with node types and (optionally) edge types.
- Directed and undirected graphs.
- Graphs with node and edge attributes as float or string
GGILES also features:
- Default edge types: Default edges are not included in the string sequentialization for token efficiency and inferred in graph reconstruction.
- Attribute serialization with or without attribute name (keyword attribute).
- Optional or enforced attributes.
- Generating unique sequences: Graph invariant calculation for sequence canonicalization.
- A tokenizer for the sequence based on a regular expression.
All code examples below can also be found and run in the example notebook.
Table of contents
- Overview
- Features
- Table of contents
- Installation
- Quickstart: Generating a string representation from a graph
- Including the node and edge attributes in the sequence
- Unique sequences for graphs
- Configuring node- and edge types
- Tokenization
- References
- Copyright and license
- Contact
Installation
Install GGILES from pip:
pip install ggiles
or from GitHub:
pip install git+https://github.com/process-intelligence-research/Generalized-graph-line-entry-system
Alternatively, get the latest updates by cloning the repo and installing the editable version of the package with:
git clone https://github.com/process-intelligence-research/Generalized-graph-line-entry-system
cd Generalized-graph-line-entry-system
pip install .
Quickstart: Generating a string representation from a graph
Here is an example graph.
import networkx as nx
import matplotlib.pyplot as plt
G = nx.DiGraph()
G.add_node(0, type="a", id="1")
G.add_node(1, type="c", id="2")
G.add_node(2, type="b", id="3")
G.add_node(3, type="d", id="4")
G.add_node(4, type="b", id="5")
G.add_node(5, type="c", id="6")
G.add_node(6, type="a", id="7")
G.add_node(7, type="e", id="8")
G.add_node(8, type="a", id="9")
G.add_edge(0, 1, type="a-c", flt=0.1)
G.add_edge(1, 2, type="c-b", flt=1.2)
G.add_edge(2, 3, type="b-d", flt=2.3)
G.add_edge(3, 4, type="d-b", flt=3.4)
G.add_edge(4, 1, type="b-c", flt=4.1)
G.add_edge(3, 5, type="d-c", flt=3.5)
G.add_edge(6, 4, type="a-b", flt=4.6)
G.add_edge(4, 7, type="b-e", flt=5.7)
# Plot the graph
pos = nx.kamada_kawai_layout(G)
edge_labels = nx.get_edge_attributes(G, "type")
node_labels = nx.get_node_attributes(G, "type")
plt.figure(figsize=(10, 8))
nx.draw(G, pos, with_labels=True, labels=node_labels, node_color='cyan', node_size=500, font_size=10, font_weight='bold')
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_color='green', font_weight='bold')
plt.title("Example graph")
plt.show()
We want to create a string representation of this graph, where the type attribute of nodes and edges is used as a token.
from ggiles.graph_invariant import GraphInvariantProcessor, MorganAlgorithm, AlphabeticNodeOrdering
from ggiles import make_converter
invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
graph_converter = make_converter(invariant=invariant)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)
Result
v(a)e(a-b)v(b)<1[e(b-e)v(e)]e(b-c)v(c)<&|v(a)e(a-c)&|e(c-b)v(b)e(b-d)v(d)e(d-b)1e(d-c)v(c)|nv(a)
Here, v(.) denotes a node (vertex), e(.) an edge, [.] a branch, <&|.&.| a converging branch, 1.<1 a cycle, and |n a new subgraph.
Can we reconstruct the graph (with the information encoded in the sequence)?
reconstructed_graph = graph_converter.sequence_to_graph(sequence)
print(nx.is_isomorphic(G, reconstructed_graph,
node_match=lambda x, y: x["type"] == y["type"],
edge_match=lambda x, y: x["type"] == y["type"])
)
Using directed graphs works similarly by changing a flag in the converter constructor
graph_converter = make_converter(invariant=invariant, is_directed=False)
Including the node and edge attributes in the sequence
Attributes can be assigned to both nodes and edges. Attributes can be assinged without attribute key, or keyword-attributes with attribute key. To specify which attributes should be included in the sequence representation, add the attribute names to node_attributes, node_kwarg_attributes, edge_attributes, edge_kwarg_attributes. Any attribute in the graph can either be of type string (to represent categorical values) or of type float/int (to represent numerical values).
from ggiles.attribute_encoding import AttributeConfig
invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
attr_config = AttributeConfig(
node_attributes=["id"],
edge_kwarg_attributes=["flt"]
)
graph_converter = make_converter(invariant=invariant, attribute_config=attr_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)
Result: v(a){"7"}e(a-b){"flt":4.6}v(b){"5"}...
Note how attributes are included after the respective token in {.}. If multiple attributes are included, they are separated by ,.
Optional attributes
If specified, attributes are required in nodes/edges. If attributes should be treated as optional, set the treat_attributes_as_optional in converter construction to True.
from ggiles import AttributeConfig
invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
attr_config = AttributeConfig(
node_attributes=["id"],
)
graph_converter = make_converter(invariant=invariant,
attribute_config=attr_config,
treat_attributes_as_optional=True)
G_opt = G.copy()
del G_opt.nodes[6]["id"]
sequence = graph_converter.graph_to_sequence(G_opt)
print(sequence)
Result: v(a)e(a-b)v(b){"5"}...
Notice how the node "b" (corresponding to the graph node with id 6) has no attribute. Not enabling treat_attributes_as_optional in this case raises an exception.
Keyword attributes are decoded back into the graph automatically. Attributes without keywords need to be assigned a keyword before being reassigned to a new graph. By default, the attribute keywords are assigned in the order specified in node_attributes and edge_attributes in AttributeConfig.
Unique sequences for graphs
GGILES can produce unique graph sequences. For this, a function for calculating a graph invariant needs to be specified. The graph invariant assigns unique ranks to the nodes of a graph, so that branching decisions can be made with respect to these ranks. The invariant is calculated by the GraphInvariantProcessor class. There are many ways to assign ranks to nodes, and some of them cannot always produce unique ranks to all graphs. An effective choice for the calculation of invariants can depend on the properties of the graphs in a specific use-case. Therefore, the GraphInvariantProcessor can have several GraphInvariantLayers. Each layer calculates node ranks. If a layer does not produce unique ranks and nodes with tied ranks remain, the ranks of lower level layers are used to break the ties.
Lower ranks are chosen first as start nodes of branches, new subgraphs, or converging branches. This convention promotes nodes with lower ranks to lie on the main sequence, and nodes with higher ranks to be pushed to branches and converging branches.
The following layers are inbuilt
- MorganAlgorithm: Ranks nodes using the Morgan Algorithm.
- CategoricalAttributePriorization: Ranks nodes by attribute value priority.
- DescendantSizePrioritization: Ranks nodes by descendant subgraph size.
- AlphabeticNodeOrdering: Ranks nodes by alphabetical attribute order.
- CustomAttributePrioritization: Ranks nodes using a custom attribute function.
- DescendantRankSumOrdering: Ranks nodes by the sum of ranks in their descendant subgraphs.
- DisregardEqualDescendants: Assigns 'random' ranks to nodes with isomorphic descendant graphs.
- FastDisregardEqualDescendants: Faster version for branched, non-connected directed graphs.
In the example below, the ranks are calculated with the morgan algorithm. If tied nodes remain, the ties are broken by ordering nodes alphabetically on the value of the "type" attribute.
from ggiles.graph_invariant import GraphInvariantProcessor, MorganAlgorithm, AlphabeticNodeOrdering
invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
graph_converter = make_converter(invariant=invariant)
Graph invariants are complex and may be dependent on the specific structure of the graphs in question. Therefore, it makes sense to validate the graph invariant calculation against a set of example graphs to ensure that the ranking is consistent and unique across different graph structures. For this, use the processor's validate_against_example_graphs method:
example_graphs: list[nx.Graph] = [] # Add example graphs here
print(invariant.validate_against_example_graphs(example_graphs))
Configuring node- and edge types
Changing the token type attribute
By default, GGILES uses the "type" attribte in nodes and edges for the selection of the strings for the tokens. This can be changed in a custom GraphTokenConfig:
from ggiles import GraphTokenConfig
token_config = GraphTokenConfig(
node_type_attribute="id"
)
graph_converter = make_converter(invariant=invariant, graph_token_config=token_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)
Result: v(7)e(a-b)v(5)...
Optional type for edges
For graphs without explicit edge types, the edge type token does not have to be included in the sequence representation. In this case, set the edge_type_attribute in GraphTokenConfig to None:
from ggiles import GraphTokenConfig
token_config = GraphTokenConfig(
edge_type_attribute=None
)
graph_converter = make_converter(invariant=invariant, graph_token_config=token_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)
Result: v(a)v(b)<1[v(e)]v(c)<&|v(a)&|v(b)v(d)1v(c)|nv(a)
Notice how edge tokens are not included anymore.
Default type for edges
It is also possible to omit the edge tokens of a single type. This is useful if a certain edge type occurs very frequently and can be treated as a default. This can be achieved by setting the default_edge_type in GraphTokenConfig:
from ggiles import GraphTokenConfig
invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
token_config = GraphTokenConfig(
default_edge_type="a-b"
)
graph_converter = make_converter(invariant=invariant, graph_token_config=token_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)
Result: v(a)v(b)<1[e(b-e)v(e)]e(b-c)v(c)...
Now, all edges with type a-b are omitted and inferred when parsing back to a graph.
Tokenization
The graph string representations can be tokenized with the GGILES package with the default tokenization scheme. The GGILESTokenizer produces a Regex expression that identifies node tokens, edge tokens, string attributes and digits, and syntax markers. Make sure to apply the same settings to the tokenizer as for the converter.
from ggiles import GGILESTokenizer
invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
graph_converter = make_converter(invariant=invariant)
sequence = graph_converter.graph_to_sequence(G)
tokenizer = GGILESTokenizer()
tokens = tokenizer.tokenize(sequence)
print(tokens[0:2])
Result: ["v(a)", "e(a-b)"]
References
- Vogel, G., Hirtreiter, E., Schulze Balhorn, L., & Schweidtmann, A. M. (2023). SFILES 2.0: An extended text-based flowsheet representation. Optimization and Engineering, 24(4), 2911–2933. https://doi.org/10.1007/s11081-023-09798-9
- d’Anterroches, L. (2006). Process flow sheet generation and design through a group contribution approach [Dissertation]. Technical University of Denmark.
- Weininger, D. (1988). SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1), 31–36. https://doi.org/10.1021/ci00057a005
Copyright and license
GGILES is published under MIT license (see license file)
Copyright (C) 2025 Artur Schweidtmann Delft University of Technology.
Contact
📧 Contact
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file ggiles-1.1.0.tar.gz.
File metadata
- Download URL: ggiles-1.1.0.tar.gz
- Upload date:
- Size: 319.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f3cbf8bccb72f6ff57a951903e0841c9e7a4037cfd309776935de0e14599ed03
|
|
| MD5 |
86cd54257098b0d10cfe8fc5eb3832fe
|
|
| BLAKE2b-256 |
a64aa2f44b1a5e86cdbee8c859833a804cc53e73cceb2399c3600bf77cd31995
|
File details
Details for the file ggiles-1.1.0-py3-none-any.whl.
File metadata
- Download URL: ggiles-1.1.0-py3-none-any.whl
- Upload date:
- Size: 46.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9681d274b229a3ed12813004fd7673cdb90fdeea7685f761686df97d871431c7
|
|
| MD5 |
6aff2d4a45bc863187aa90d46a2adbcd
|
|
| BLAKE2b-256 |
7ff5485d14c7118a98021e27b11fa5ce08d621fb3da344532b3b2ca841609aa9
|