COGILES - Colored Graph Input Line Entry System: a SMILES-inspired string notation for colored graphs

COGILES — Colored Graph Input Line Entry System

What is COGILES?

COGILES is a compact, human-readable string notation for defining colored graphs. It is strongly inspired by SMILES (Simplified Molecular Input Line Entry System) used in chemistry for representing molecular structures, but adapted for arbitrary colored graphs rather than molecules.

A COGILES string encodes both the structure (nodes and edges) and the visual attributes (node colors) of a graph in a single line of text.

Origin

COGILES was originally developed as part of the visual_graph_datasets library (introduced in version 0.13.0). The goal of this project is to extract it into a standalone library that can be imported and used independently by other programs.

Syntax

Node Types

The set of node types is fixed and opinionated — like SMILES, COGILES defines a specific alphabet and users take it or leave it.

Following the SMILES convention, common colors use a single uppercase letter. Colors whose first letter collides with another use a two-letter code (uppercase + lowercase), similar to how SMILES distinguishes elements like B (boron) from Br (bromine).

Letter  Color    RGB
R       Red      (1.0, 0.0, 0.0)
G       Green    (0.0, 0.5, 0.0)
B       Blue     (0.0, 0.0, 1.0)
Y       Yellow   (1.0, 1.0, 0.0)
C       Cyan     (0.0, 1.0, 1.0)
M       Magenta  (1.0, 0.0, 1.0)
W       White    (1.0, 1.0, 1.0)
K       Black    (0.0, 0.0, 0.0)
O       Orange   (1.0, 0.5, 0.0)
P       Purple   (0.5, 0.0, 0.5)
T       Teal     (0.0, 0.5, 0.5)
L       Lime     (0.5, 1.0, 0.0)
I       Indigo   (0.3, 0.0, 0.5)
Bl      Black    (0.0, 0.0, 0.0)
Br      Brown    (0.6, 0.3, 0.0)
Gr      Gray     (0.5, 0.5, 0.5)
Pi      Pink     (1.0, 0.4, 0.7)
Ol      Olive    (0.5, 0.5, 0.0)

Note: K and Bl are both valid for Black (K follows the CMYK convention).
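
As a rough illustration of how the alphabet above could be looked up in code, the following sketch copies the table into a dictionary and splits a run of node codes with a regular expression. The names NODE_TYPES, NODE_PATTERN and split_codes are hypothetical and are not taken from the cogiles package; the one point it is meant to show is that two-letter codes must be tried before single letters.

```python
import re

# Code -> (color name, RGB tuple), copied from the table above.
NODE_TYPES = {
    "R": ("Red", (1.0, 0.0, 0.0)),
    "G": ("Green", (0.0, 0.5, 0.0)),
    "B": ("Blue", (0.0, 0.0, 1.0)),
    "Y": ("Yellow", (1.0, 1.0, 0.0)),
    "C": ("Cyan", (0.0, 1.0, 1.0)),
    "M": ("Magenta", (1.0, 0.0, 1.0)),
    "W": ("White", (1.0, 1.0, 1.0)),
    "K": ("Black", (0.0, 0.0, 0.0)),
    "O": ("Orange", (1.0, 0.5, 0.0)),
    "P": ("Purple", (0.5, 0.0, 0.5)),
    "T": ("Teal", (0.0, 0.5, 0.5)),
    "L": ("Lime", (0.5, 1.0, 0.0)),
    "I": ("Indigo", (0.3, 0.0, 0.5)),
    "Bl": ("Black", (0.0, 0.0, 0.0)),  # alias of K
    "Br": ("Brown", (0.6, 0.3, 0.0)),
    "Gr": ("Gray", (0.5, 0.5, 0.5)),
    "Pi": ("Pink", (1.0, 0.4, 0.7)),
    "Ol": ("Olive", (0.5, 0.5, 0.0)),
}

# Two-letter codes come first in the alternation so that "Br" is read as
# Brown rather than as Blue followed by a stray "r".
NODE_PATTERN = re.compile(r"Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI]")


def split_codes(text):
    """Return the node codes found in text, in order.

    findall silently skips characters that are not node codes, so this
    helper is only meant for strings consisting of node codes.
    """
    return NODE_PATTERN.findall(text)


codes = split_codes("BBrOlO")
colors = [NODE_TYPES[code][1] for code in codes]
```

Here "BBrOlO" splits into B, Br, Ol, O, and the K and Bl entries map to the same RGB tuple.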

Sequential Connection

Nodes placed next to each other in the string are automatically connected by an edge.

  • RRRGGG — a chain of 6 nodes: 3 red followed by 3 green
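
The sequential rule can be sketched in a few lines of plain Python. This is only an illustration under the assumption that nodes are numbered in the order they appear in the string; chain_edges is a hypothetical helper, not part of the library.

```python
import re

# Node codes as listed in the Node Types table; two-letter codes first.
NODE = re.compile(r"Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI]")


def chain_edges(cogiles):
    """Return (nodes, edges) for a string that contains only node codes."""
    nodes = NODE.findall(cogiles)
    # Every node is linked to the node written directly before it.
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, edges


nodes, edges = chain_edges("RRRGGG")
# nodes: ['R', 'R', 'R', 'G', 'G', 'G']
# edges: [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```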

Branches

Parentheses () create branches. The first node inside a branch connects to the node immediately before the opening parenthesis. Inside a branch, normal sequential rules apply.

  • RRR(BB)RRRR — a main chain of 7 red nodes with a side branch of 2 blue nodes attached to the 3rd red node
  • Y(G)(G)(G) — a star: yellow center with 3 green leaves
  • BB(GG)CC — 2 blue nodes, then a branch of 2 green nodes and a continuation of 2 cyan nodes
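
One common way to handle this kind of branching is a stack of attachment points, as in SMILES parsers. The sketch below assumes that after a closing parenthesis the chain continues from the node written before the matching opening parenthesis, which is consistent with the examples above. parse_branches is a hypothetical name and the code assumes well-formed input.

```python
import re

# Node codes plus the two parenthesis tokens.
TOKEN = re.compile(r"Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI]|[()]")


def parse_branches(cogiles):
    """Return (nodes, edges) for a string with node codes and branches."""
    nodes, edges = [], []
    prev = None   # index of the node the next node will attach to
    stack = []    # saved attachment points, one per open parenthesis
    for tok in TOKEN.findall(cogiles):
        if tok == "(":
            stack.append(prev)    # remember where the branch started
        elif tok == ")":
            prev = stack.pop()    # resume from the branching node
        else:
            nodes.append(tok)
            index = len(nodes) - 1
            if prev is not None:
                edges.append((prev, index))
            prev = index
    return nodes, edges


star_nodes, star_edges = parse_branches("Y(G)(G)(G)")
# star_edges: [(0, 1), (0, 2), (0, 3)]
side_nodes, side_edges = parse_branches("RRR(BB)RRRR")
# The third red node (index 2) links to the first blue node (index 3)
# and to the fourth red node (index 5).
```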

Anchors

Anchors are written as -N (a dash followed by an integer) and placed directly after a node. Nodes that carry the same anchor number are joined by an additional edge, which makes cycles and other non-tree structures possible.

  • R-1RRRRR-1 — a cycle of 6 red nodes (first and last connected via anchor 1)
  • R-1-2GGG-2BBB-1 — the first node carries two anchors: anchor 2 connects it to the last green node and anchor 1 connects it to the last blue node

Rules:

  • A single anchor with a given number has no effect; at least two nodes must share the same anchor number.
  • When an anchor number appears more than twice, all subsequent occurrences connect to the node where the anchor first appeared.
  • A single node can have multiple anchors (e.g., R-1-2).
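
The anchor rules can be illustrated with a small dictionary that remembers where each anchor number first appeared. This is a sketch for strings without branches or breaks; parse_anchors is a hypothetical helper, and storing edges in a set is an assumption made here so that an edge is never recorded twice.

```python
import re

# Node codes plus anchor tokens such as "-1" or "-12".
TOKEN = re.compile(r"Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI]|-\d+")


def parse_anchors(cogiles):
    """Return (nodes, edges) for a chain with anchors."""
    nodes, edges = [], set()
    first_seen = {}  # anchor number -> index of the first node carrying it
    for tok in TOKEN.findall(cogiles):
        if tok.startswith("-"):
            if not nodes:
                raise ValueError("anchor without a preceding node")
            number = int(tok[1:])
            current = len(nodes) - 1
            # First occurrence only registers the node; later occurrences
            # connect back to that first node.
            first = first_seen.setdefault(number, current)
            if first != current:
                edges.add((first, current))
        else:
            nodes.append(tok)
            if len(nodes) > 1:
                edges.add((len(nodes) - 2, len(nodes) - 1))
    return nodes, edges


ring_nodes, ring_edges = parse_anchors("R-1RRRRR-1")
# 6 nodes, 6 edges; (0, 5) closes the cycle
multi_nodes, multi_edges = parse_anchors("R-1GG-1GG-1")
# Both later occurrences of anchor 1 link to node 0: (0, 2) and (0, 4)
```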

Breaks (Disconnected Components)

A period . between two nodes prevents the default sequential connection, allowing the definition of multiple disconnected graphs in one string.

  • R-1RR-1.RR(GG)R — two separate graph components: a triangle and a branching structure
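
Putting the four syntax elements together, a toy parser can confirm that the example above yields two components. This is a sketch under stated assumptions, not the library's implementation: sketch_parse and count_components are hypothetical names, input is assumed to be well formed, and an anchor with no preceding node is simply ignored.

```python
import re

TOKEN = re.compile(r"Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI]|-\d+|[().]")


def sketch_parse(cogiles):
    """Return (nodes, edges) for nodes, branches, anchors and breaks."""
    nodes, edges = [], set()
    prev = None    # node the next node attaches to, None after a break
    stack = []     # attachment points saved at each "("
    anchors = {}   # anchor number -> first node carrying it
    for tok in TOKEN.findall(cogiles):
        if tok == ".":
            prev = None                 # next node starts a new component
        elif tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok[0] == "-":
            if prev is None:
                continue                # anchor with nothing to attach to
            first = anchors.setdefault(int(tok[1:]), prev)
            if first != prev:
                edges.add((min(first, prev), max(first, prev)))
        else:
            nodes.append(tok)
            index = len(nodes) - 1
            if prev is not None:
                edges.add((prev, index))
            prev = index
    return nodes, edges


def count_components(node_count, edges):
    """Count connected components with a small union-find."""
    parent = list(range(node_count))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(i) for i in range(node_count)})


nodes, edges = sketch_parse("R-1RR-1.RR(GG)R")
components = count_components(len(nodes), edges)
# 8 nodes, 7 edges, 2 components
```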

Grammar (PEG)

The COGILES syntax is formally defined as a PEG grammar (parsed with the parsimonious library):

graph           = (branch / node / anchor / break)*

branch_node     = (break_node / node) branch+
anchor_node     = (branch_node / break_node / node) anchor+
break_node      = break node

branch          = lpar graph rpar
lpar            = "("
rpar            = ")"

break           = "."
anchor          = ~r"-[\d]+"
node            = ~r"(Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI])"

Graph Representation

Parsing a COGILES string produces a NetworkX graph (nx.Graph). Each node has a color attribute storing its RGB tuple. Edges are unweighted — no edge attributes are stored.

import cogiles

G = cogiles.parse("R-1RRRRR-1")
# G is a nx.Graph with 6 nodes
# G.nodes[0]["color"] == (1.0, 0.0, 0.0)
# G has 6 edges forming a cycle

Encoding converts a NetworkX graph back into a canonical COGILES string:

s = cogiles.encode(G)
# s == "R-1RRRRR-1"  (deterministic canonical form)
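
To give a feel for what encoding involves, the sketch below writes a connected, acyclic graph as a string by walking it depth first from node 0. It is a simplified illustration only: encode_tree is a hypothetical helper that works on plain lists rather than nx.Graph objects, it does not emit anchors or breaks, and its output is not claimed to match the canonical form produced by cogiles.encode.

```python
def encode_tree(codes, edges):
    """Encode a connected, acyclic graph as a COGILES-style string.

    codes: list of node codes indexed by node number
    edges: iterable of (a, b) index pairs
    """
    if not codes:
        return ""
    neighbors = {i: [] for i in range(len(codes))}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def visit(node, parent):
        # Sorting keeps the output deterministic for a given numbering.
        children = sorted(n for n in neighbors[node] if n != parent)
        text = codes[node]
        # All children but the last become parenthesized branches;
        # the last child continues the chain.
        for child in children[:-1]:
            text += "(" + visit(child, node) + ")"
        if children:
            text += visit(children[-1], node)
        return text

    return visit(0, None)


star = encode_tree(["Y", "G", "G", "G"], [(0, 1), (0, 2), (0, 3)])
# star == "Y(G)(G)G"
branched = encode_tree(list("RRRBBR"), [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)])
# branched == "RRR(BB)R"
```

Note that "Y(G)(G)G" and "Y(G)(G)(G)" describe the same star graph; a canonical encoder has to pick one spelling consistently.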

Planned Library Scope

The standalone cogiles library should provide:

  1. Parsing: cogiles.parse(string) -> nx.Graph — decode a COGILES string into a NetworkX graph
  2. Encoding: cogiles.encode(graph) -> string — encode a NetworkX graph back into a canonical COGILES string (deterministic, starting from the lowest-index node)
  3. Node types — a fixed, opinionated mapping between letters, color names, and RGB tuples (to be finalized)
  4. Validation — a custom CogilesParseError exception with descriptive messages and position info for malformed strings
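
The validation item could look roughly like the following. Only the exception name CogilesParseError comes from the list above; its constructor signature, the position attribute and the validate helper are assumptions made for this sketch, which checks just two things: unknown characters and unbalanced parentheses.

```python
import re

TOKEN = re.compile(r"Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI]|-\d+|[().]")


class CogilesParseError(ValueError):
    """Parse failure carrying the offending 0-based character position."""

    def __init__(self, message, position):
        super().__init__(f"{message} at position {position}")
        self.position = position


def validate(cogiles):
    """Raise CogilesParseError for unknown characters or bad parentheses."""
    open_positions = []  # positions of "(" that are still unclosed
    pos = 0
    while pos < len(cogiles):
        match = TOKEN.match(cogiles, pos)
        if match is None:
            raise CogilesParseError(
                f"unexpected character {cogiles[pos]!r}", pos
            )
        token = match.group()
        if token == "(":
            open_positions.append(pos)
        elif token == ")":
            if not open_positions:
                raise CogilesParseError("unmatched ')'", pos)
            open_positions.pop()
        pos = match.end()
    if open_positions:
        raise CogilesParseError("unclosed '('", open_positions[-1])


def error_position(cogiles):
    """Return the error position for a string, or None if it validates."""
    try:
        validate(cogiles)
    except CogilesParseError as error:
        return error.position
    return None
```

With this sketch, error_position("RRxG") reports position 2 and error_position("RR(GG)R") returns None.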

Design Decisions

Decision              Choice
Package name          cogiles (import cogiles)
Graph representation  NetworkX (nx.Graph)
Parser library        parsimonious (PEG grammar)
Node types            Fixed set of 17 colors under 18 codes (13 single-letter, 4 two-letter, plus Bl as an alias for K), not extensible
Edge attributes       Not included — edges are unweighted
Error handling        Custom CogilesParseError with position info
Testing               pytest + nox
Packaging             pyproject.toml only (PEP 621)

Dependencies

  • networkx — for graph representation
  • parsimonious — for PEG grammar parsing

Dev Dependencies

  • pytest — test framework
  • nox — test runner / task automation

Download files

Source Distribution

cogiles-0.2.0.tar.gz (1.7 MB)

Uploaded Source

Built Distribution

cogiles-0.2.0-py3-none-any.whl (16.9 kB)

Uploaded Python 3
