ORCA: Python wrapper for efficient graphlet counting
Project description
ORCA: Python Wrapper for Efficient Graphlet Counting
A Python implementation of the ORCA (ORbit Counting Algorithm) for efficient counting of graphlet orbits in networks. This is a pure Python port of the original C++ implementation by thocevar/orca.
What is ORCA?
ORCA is an efficient algorithm for counting graphlets in networks. It computes node-orbits and edge-orbits for 4-node and 5-node graphlets for each node in the network. Graphlets are small connected subgraphs that serve as fundamental building blocks for network analysis.
Features
- Pure Python implementation - No external C++ dependencies required
- Node and edge orbit counting - Count orbits for both nodes and edges
- 4-node and 5-node graphlets - Support for graphlets of size 4 and 5
- NumPy integration - Efficient array-based operations
- Validated accuracy - Test suite ensures outputs match the original C++ implementation
- Easy to use - Simple API with sensible defaults
Installation
pip install orca-graphlets
Quick Start
import numpy as np
from orca import orca_nodes, orca_edges
# Define a simple graph as an edge list
edges = np.array([
[0, 1],
[1, 2],
[2, 3],
[3, 0],
[1, 3]
])
# Count node orbits for 4-node graphlets
node_orbits = orca_nodes(edges, graphlet_size=4)
print("Node orbits shape:", node_orbits.shape)
# Count edge orbits for 4-node graphlets
edge_orbits = orca_edges(edges, graphlet_size=4)
print("Edge orbits shape:", edge_orbits.shape)
API Reference
orca_nodes(edge_list, num_nodes=None, graphlet_size=4, debug=False)
Count node orbits for each node in the graph.
Parameters:
edge_list(np.ndarray): Array of shape (E, 2) containing edges as pairs of node indicesnum_nodes(int, optional): Number of nodes in the graph. If None, inferred from edge_listgraphlet_size(int): Size of graphlets to count (4 or 5)debug(bool): Enable debug output
Returns:
np.ndarray: Array of shape (N, K) where N is the number of nodes and K is the number of orbit types for the given graphlet size
orca_edges(edge_list, num_nodes=None, graphlet_size=4, debug=False)
Count edge orbits for each edge in the graph.
Parameters:
- Same as
orca_nodes
Returns:
np.ndarray: Array of shape (E, K) where E is the number of edges and K is the number of orbit types for the given graphlet size
Graphlet Orbits
4-node graphlets
- Node orbits: 15 different orbit types (0-14)
- Edge orbits: 11 different orbit types (0-10)
5-node graphlets
- Node orbits: 73 different orbit types (0-72)
- Edge orbits: 58 different orbit types (0-57)
Input Format
The edge list should be a NumPy array where:
- Each row represents an undirected edge
- Columns contain the node indices (0-based)
- Node indices should be integers from 0 to N-1 where N is the number of nodes
Example:
# Triangle graph: nodes 0, 1, 2 fully connected
edges = np.array([
[0, 1],
[1, 2],
[2, 0]
])
Examples
Basic Usage
import numpy as np
from orca import orca_nodes, orca_edges
# Create a small graph
edges = np.array([
[0, 1],
[1, 2],
[2, 3],
[0, 3]
])
# Count 4-node graphlet orbits for nodes
node_counts = orca_nodes(edges, graphlet_size=4)
print(f"Node 0 orbit counts: {node_counts[0]}")
# Count 5-node graphlet orbits for edges
edge_counts = orca_edges(edges, graphlet_size=5)
print(f"Edge (0,1) orbit counts: {edge_counts[0]}")
Loading from File
import numpy as np
from orca import orca_nodes
# Load graph from file (format: first line = "num_nodes num_edges",
# following lines = "node1 node2")
def load_graph(filename):
with open(filename, 'r') as f:
lines = f.readlines()
num_nodes, num_edges = map(int, lines[0].strip().split())
edges = np.array([
list(map(int, line.strip().split()))
for line in lines[1:]
])
return edges, num_nodes
edges, num_nodes = load_graph('graph.in')
orbits = orca_nodes(edges, num_nodes=num_nodes, graphlet_size=4)
Performance
This pure Python implementation prioritizes:
- Correctness: Exact same results as the original C++ version
- Ease of use: No compilation or external dependencies required
- Maintainability: Clean, readable Python code
For maximum performance on very large graphs, consider using the original C++ implementation.
Testing
The package includes comprehensive tests that verify the outputs match the original C++ implementation:
pytest tests/
Original Work
This is a Python port of the original ORCA algorithm:
- Original repository: thocevar/orca
- Algorithm paper: Hočevar, T., & Demšar, J. (2014). A combinatorial approach to graphlet counting. Bioinformatics, 30(4), 559-565.
License
This project is licensed under the MIT License - see the LICENSE file for details.
The original ORCA algorithm is licensed under GPL-3.0.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Citation
If you use this software in your research, please cite the original paper:
@article{hocevar2014combinatorial,
title={A combinatorial approach to graphlet counting},
author={Ho{\v{c}}evar, Toma{\v{z}} and Dem{\v{s}}ar, Janez},
journal={Bioinformatics},
volume={30},
number={4},
pages={559--565},
year={2014},
publisher={Oxford University Press}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file orca_graphlets-0.1.4.tar.gz.
File metadata
- Download URL: orca_graphlets-0.1.4.tar.gz
- Upload date:
- Size: 18.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f47d553184ae7bed2b117c6a860fe5de585ea45cc245bc490e29710c89e43e70
|
|
| MD5 |
6c0848fc92c2b4697204284342de7d39
|
|
| BLAKE2b-256 |
1053bc0f611d31d75ac2334a7fa107cdd46232e6bcd2a240e7d274f63b49a6f4
|
File details
Details for the file orca_graphlets-0.1.4-py3-none-any.whl.
File metadata
- Download URL: orca_graphlets-0.1.4-py3-none-any.whl
- Upload date:
- Size: 15.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bd411bd04400167e446ce450d77f71a29690f73a45a859af5097c48094700296
|
|
| MD5 |
23c40bbcb3bb95ee8dc3436fb3090a3e
|
|
| BLAKE2b-256 |
ac9e65643942fd0bc68e1c526c759735b62c9263a3ecdeb8adcd1a1c45ff6e3f
|