Skip to main content

A high-performance C++ implementation of disjoint-set data structure with Python bindings

Project description

DisjointForest

A C++ implementation of a disjoint-set data structure (also known as union-find) using the disjoint forest representation with path compression and union by rank optimizations, with Python bindings included as well. This is intended to fill a void in too many existing standard libraries, which don't include a proper implementation of this despite its utility in many applications.

Features

  • Template-based: Works with any data type
  • Path Compression: Automatically compresses paths during find operations
  • Union by Rank: Optimizes union operations by maintaining tree depth information
  • Dynamic Operations: Expand capacity, contract nodes, and clear the forest
  • Memory Efficient: STL vectors with automatic memory management
  • Smart Pointers: Uses std::unique_ptr for automatic memory cleanup
  • Exception Safety: Proper error handling with std::invalid_argument and std::runtime_error
  • Comprehensive Testing: Full unit test suite using Google Test
  • Python Bindings: Full Python interface with pybind11

Class Structure

Node

  • data: The actual data stored in the node
  • parent: Pointer to the parent node (self-referential for root nodes)
  • rank: Depth of the subtree rooted at this node

DisjointForest

  • makeSet(T data): Creates a new set containing the given data
  • find(Node<T>* node): Finds the representative of the set containing the node
  • unionSets(Node<T>* node1, Node<T>* node2): Merges two sets
  • Constructor takes the maximum number of elements as parameter

Dynamic Operations

  • expand(int additional_capacity): Increases the forest's capacity by the specified amount
  • contract(Node<T>* node): Removes a node from the forest, updating parent references
  • clear(): Removes all nodes and resets the forest to empty state
  • size(): Returns the current number of nodes in the forest
  • capacity(): Returns the total capacity allocated for the forest
  • isEmpty(): Returns true if the forest contains no nodes
  • getAllNodes(): Returns a vector of all current nodes in the forest

Building the Project

Prerequisites

  • CMake 3.10 or higher
  • C++17 compatible compiler
  • Google Test library
  • Python 3.6+ (for Python bindings)
  • pybind11 (for Python bindings)

Install Google Test

Ubuntu/Debian

sudo apt-get install libgtest-dev

macOS

brew install googletest

CentOS/RHEL/Fedora

sudo yum install gtest-devel
# or for newer versions
sudo dnf install gtest-devel

Arch Linux

sudo pacman -S gtest

Build Steps

Using CMake

# Create build directory
mkdir build
cd build

# Configure and build
cmake ..
make

# Run tests
make test
# or directly
./disjoint_forest_test

# Run example
./disjoint_forest_example

Using Makefile (Alternative)

# Build everything including Python bindings
make all

# Build only C++ components
make cpp

# Build Python bindings
make python-bindings

# Run all tests
make test

# Run Python tests
make python-test

# Run examples
make example
make python-example

# Clean build artifacts
make clean

Running Tests

The test suite covers:

  1. Basic Operations: Constructor, destructor, makeSet
  2. Find Operations: Single node find, path compression
  3. Union Operations: Basic union, union by rank, same set union
  4. Dynamic Operations: Expand capacity, contract nodes, clear forest, size/capacity queries
  5. Complex Scenarios: Multiple unions, large datasets, dynamic capacity changes
  6. Edge Cases: Empty forests, single element forests, contraction edge cases
  7. Memory Management: Large operations, proper cleanup, dynamic memory allocation
  8. Template Support: Tests with different data types (int, string)

Example Usage

Basic Operations

#include "disjoint_forest.h"

// Create a forest with capacity for 100 elements
DisjointForest<int> forest(100);

// Create some sets
Node<int>* node1 = forest.makeSet(1);
Node<int>* node2 = forest.makeSet(2);
Node<int>* node3 = forest.makeSet(3);

// Union sets
forest.unionSets(node1, node2);
forest.unionSets(node1, node3);

// Find representatives
Node<int>* rep1 = forest.find(node1);
Node<int>* rep2 = forest.find(node2);
Node<int>* rep3 = forest.find(node3);

// All should have the same representative
assert(rep1 == rep2 && rep2 == rep3);

Dynamic Operations

// Expand capacity when needed
forest.expand(50);  // Add capacity for 50 more elements

// Check current state
std::cout << "Size: " << forest.size() << std::endl;
std::cout << "Capacity: " << forest.capacity() << std::endl;
std::cout << "Is empty: " << (forest.isEmpty() ? "Yes" : "No") << std::endl;

// Get all nodes for iteration
auto all_nodes = forest.getAllNodes();
for (auto* node : all_nodes) {
    std::cout << "Node data: " << node->data << std::endl;
}

// Contract (remove) a node
forest.contract(node2);

// Clear the entire forest
forest.clear();

Algorithm Complexity

  • makeSet: O(1)
  • find: O(α(n)) amortized, where α is the inverse Ackermann function
  • unionSets: O(α(n)) amortized
  • expand: O(1) amortized (vector doubling strategy)
  • contract: O(1) (marking nodes as invalid)
  • clear: O(n) where n is the number of nodes
  • size/capacity/isEmpty: O(1)
  • getAllNodes: O(n) where n is the number of nodes
  • Space: O(n) where n is the number of elements

Use Cases

Disjoint-set data structures are fundamental to many algorithms and applications:

  • Graph Algorithms: Connected components, minimum spanning trees (Kruskal's algorithm)
  • Image Processing: Connected component labeling, flood fill algorithms
  • Network Analysis: Community detection, clustering algorithms
  • Game Development: Pathfinding, collision detection systems
  • Database Systems: Transaction management, dependency resolution
  • Compiler Design: Register allocation, live variable analysis

Python Bindings

The project includes Python bindings that provide a unified interface for working with any Python object type. The bindings are built using pybind11 and offer the same functionality as the C++ implementation.

Features

  • Type Flexibility: Works with integers, strings, lists, dictionaries, custom objects, etc.
  • Unified API: Single DisjointForest class that handles any Python object type
  • Safe Operations: Built-in safety checks and error handling
  • Memory Management: Automatic cleanup of Python references

Installation

# Install Python dependencies
cd python
pip3 install -r requirements.txt

# Build the Python module
make build

# Test the bindings
make test

# Run the example
make example

Python Usage

import disjoint_forest

# Create a forest
forest = disjoint_forest.DisjointForest(10)

# Create sets with different types
node1 = forest.make_set(42)
node2 = forest.make_set("hello")
node3 = forest.make_set([1, 2, 3])

# Union operations
forest.union_sets(node1, node2)

# Find representatives
rep = forest.find(node1)
print(f"Representative: {rep.data}")

# Dynamic operations
forest.expand(20)
print(f"Capacity: {forest.capacity()}")
print(f"Size: {forest.size()}")

# Get all nodes
all_nodes = forest.get_all_nodes()
for node in all_nodes:
    print(f"Node: {node.data}")

Implementation Details

STL Vector Usage

The implementation uses std::vector instead of fixed-size arrays, providing several advantages:

  • Dynamic Capacity: The forest can grow beyond its initial capacity using the expand() method
  • Automatic Memory Management: Vectors handle memory allocation and deallocation automatically
  • Efficient Resizing: STL vectors use exponential growth strategies for optimal performance
  • Exception Safety: Vector operations provide strong exception guarantees

Memory Management

  • Smart Pointers: std::unique_ptr ensures automatic cleanup of Node objects
  • RAII Principles: Resource acquisition is tied to object lifetime
  • Exception Safety: Memory is properly cleaned up even when exceptions occur
  • Python Integration: Python bindings include reference counting and safe memory management

Notes

  • The implementation automatically handles path compression during find operations
  • Union by rank ensures balanced trees for optimal performance
  • Memory is automatically managed with proper cleanup in the destructor
  • Python bindings provide safe access to all C++ functionality
  • Dynamic operations maintain the forest's integrity while allowing flexible capacity management

Troubleshooting

Common Build Issues

Google Test Not Found

If you encounter linking errors with Google Test:

# The Makefile will automatically install Google Test if needed
make install-deps

# Or manually install for your system (see Prerequisites section)

Python Import Errors

If Python can't import the module:

# Ensure the module is built
cd python
make build

# Check if the shared library exists
ls -la *.so

# Verify Python path
python3 -c "import sys; print(sys.path)"

Memory Issues

The implementation uses smart pointers and RAII principles. If you encounter memory issues:

  • Ensure you're not manually deleting Node pointers
  • Use the provided methods (clear(), contract()) instead of manual memory management
  • Check that Python bindings are properly handling object lifetimes

Contributing

We welcome contributions to improve the implementation, test coverage, or documentation. Please:

  1. Fork the repository and create a feature branch
  2. Follow the existing code style and naming conventions
  3. Add tests for new functionality
  4. Update documentation to reflect any changes
  5. Submit a pull request with a clear description of your changes

Development Setup

# Clone and setup
git clone <your-fork-url>
cd disjoint_set

# Install development dependencies
make install-deps

# Run all tests to ensure everything works
make test
make python-test

# Build everything
make all

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

disjoint_forest-1.1.tar.gz (14.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

disjoint_forest-1.1-cp310-cp310-manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded CPython 3.10

File details

Details for the file disjoint_forest-1.1.tar.gz.

File metadata

  • Download URL: disjoint_forest-1.1.tar.gz
  • Upload date:
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for disjoint_forest-1.1.tar.gz
Algorithm Hash digest
SHA256 5e60077c74484c23c8ed48d0a0e18824c5f73c80f2fc5d9e1ddd2e6483389486
MD5 81160032c4a10451ed7a438d24b92707
BLAKE2b-256 a88202995f6d78bd7ae80b711d5761b1824046341c6516282f3208b85aa70992

See more details on using hashes here.

File details

Details for the file disjoint_forest-1.1-cp310-cp310-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for disjoint_forest-1.1-cp310-cp310-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b1c7f01eb6695efd0a66e8707dee3d82c9e1446880090100726985ea040f53b3
MD5 9b06dd700d762404c2a7e5807f7717f2
BLAKE2b-256 bb65bfd861bf6567c05204ae1a48437851608d7fd8be5ccddf5ebf0c045ca800

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page