Skip to main content

ORCA: Python wrapper for efficient graphlet counting

Project description

ORCA: Python Wrapper for Efficient Graphlet Counting

PyPI version Python 3.8+ License: MIT

A Python implementation of the ORCA (ORbit Counting Algorithm) for efficient counting of graphlet orbits in networks. This is a pure Python port of the original C++ implementation by thocevar/orca.

What is ORCA?

ORCA is an efficient algorithm for counting graphlets in networks. It computes node-orbits and edge-orbits for 4-node and 5-node graphlets for each node in the network. Graphlets are small connected subgraphs that serve as fundamental building blocks for network analysis.

Features

  • Pure Python implementation - No external C++ dependencies required
  • Node and edge orbit counting - Count orbits for both nodes and edges
  • 4-node and 5-node graphlets - Support for graphlets of size 4 and 5
  • NumPy integration - Efficient array-based operations
  • Validated accuracy - Test suite ensures outputs match the original C++ implementation
  • Easy to use - Simple API with sensible defaults

Installation

pip install orca-graphlets

Quick Start

import numpy as np
from orca import orca_nodes, orca_edges

# Define a simple graph as an edge list
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 3],
    [3, 0],
    [1, 3]
])

# Count node orbits for 4-node graphlets
node_orbits = orca_nodes(edges, graphlet_size=4)
print("Node orbits shape:", node_orbits.shape)

# Count edge orbits for 4-node graphlets  
edge_orbits = orca_edges(edges, graphlet_size=4)
print("Edge orbits shape:", edge_orbits.shape)

API Reference

orca_nodes(edge_list, num_nodes=None, graphlet_size=4, debug=False)

Count node orbits for each node in the graph.

Parameters:

  • edge_list (np.ndarray): Array of shape (E, 2) containing edges as pairs of node indices
  • num_nodes (int, optional): Number of nodes in the graph. If None, inferred from edge_list
  • graphlet_size (int): Size of graphlets to count (4 or 5)
  • debug (bool): Enable debug output

Returns:

  • np.ndarray: Array of shape (N, K) where N is the number of nodes and K is the number of orbit types for the given graphlet size

orca_edges(edge_list, num_nodes=None, graphlet_size=4, debug=False)

Count edge orbits for each edge in the graph.

Parameters:

  • Same as orca_nodes

Returns:

  • np.ndarray: Array of shape (E, K) where E is the number of edges and K is the number of orbit types for the given graphlet size

Graphlet Orbits

4-node graphlets

  • Node orbits: 15 different orbit types (0-14)
  • Edge orbits: 11 different orbit types (0-10)

5-node graphlets

  • Node orbits: 73 different orbit types (0-72)
  • Edge orbits: 58 different orbit types (0-57)

Input Format

The edge list should be a NumPy array where:

  • Each row represents an undirected edge
  • Columns contain the node indices (0-based)
  • Node indices should be integers from 0 to N-1 where N is the number of nodes

Example:

# Triangle graph: nodes 0, 1, 2 fully connected
edges = np.array([
    [0, 1],
    [1, 2], 
    [2, 0]
])

Examples

Basic Usage

import numpy as np
from orca import orca_nodes, orca_edges

# Create a small graph
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 3],
    [0, 3]
])

# Count 4-node graphlet orbits for nodes
node_counts = orca_nodes(edges, graphlet_size=4)
print(f"Node 0 orbit counts: {node_counts[0]}")

# Count 5-node graphlet orbits for edges
edge_counts = orca_edges(edges, graphlet_size=5)
print(f"Edge (0,1) orbit counts: {edge_counts[0]}")

Loading from File

import numpy as np
from orca import orca_nodes

# Load graph from file (format: first line = "num_nodes num_edges", 
# following lines = "node1 node2")
def load_graph(filename):
    with open(filename, 'r') as f:
        lines = f.readlines()
        num_nodes, num_edges = map(int, lines[0].strip().split())
        edges = np.array([
            list(map(int, line.strip().split())) 
            for line in lines[1:]
        ])
    return edges, num_nodes

edges, num_nodes = load_graph('graph.in')
orbits = orca_nodes(edges, num_nodes=num_nodes, graphlet_size=4)

Performance

This pure Python implementation prioritizes:

  • Correctness: Exact same results as the original C++ version
  • Ease of use: No compilation or external dependencies required
  • Maintainability: Clean, readable Python code

For maximum performance on very large graphs, consider using the original C++ implementation.

Testing

The package includes comprehensive tests that verify the outputs match the original C++ implementation:

pytest tests/

Original Work

This is a Python port of the original ORCA algorithm:

  • Original repository: thocevar/orca
  • Algorithm paper: Hočevar, T., & Demšar, J. (2014). A combinatorial approach to graphlet counting. Bioinformatics, 30(4), 559-565.

License

This project is licensed under the MIT License - see the LICENSE file for details.

The original ORCA algorithm is licensed under GPL-3.0.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Citation

If you use this software in your research, please cite the original paper:

@article{hocevar2014combinatorial,
  title={A combinatorial approach to graphlet counting},
  author={Ho{\v{c}}evar, Toma{\v{z}} and Dem{\v{s}}ar, Janez},
  journal={Bioinformatics},
  volume={30},
  number={4},
  pages={559--565},
  year={2014},
  publisher={Oxford University Press}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

orca_graphlets-0.1.4.tar.gz (18.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

orca_graphlets-0.1.4-py3-none-any.whl (15.5 kB view details)

Uploaded Python 3

File details

Details for the file orca_graphlets-0.1.4.tar.gz.

File metadata

  • Download URL: orca_graphlets-0.1.4.tar.gz
  • Upload date:
  • Size: 18.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.20

File hashes

Hashes for orca_graphlets-0.1.4.tar.gz
Algorithm Hash digest
SHA256 f47d553184ae7bed2b117c6a860fe5de585ea45cc245bc490e29710c89e43e70
MD5 6c0848fc92c2b4697204284342de7d39
BLAKE2b-256 1053bc0f611d31d75ac2334a7fa107cdd46232e6bcd2a240e7d274f63b49a6f4

See more details on using hashes here.

File details

Details for the file orca_graphlets-0.1.4-py3-none-any.whl.

File metadata

File hashes

Hashes for orca_graphlets-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 bd411bd04400167e446ce450d77f71a29690f73a45a859af5097c48094700296
MD5 23c40bbcb3bb95ee8dc3436fb3090a3e
BLAKE2b-256 ac9e65643942fd0bc68e1c526c759735b62c9263a3ecdeb8adcd1a1c45ff6e3f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page