A high-performance C++ implementation of disjoint-set data structure with Python bindings
Project description
DisjointForest
A C++ implementation of a disjoint-set data structure (also known as union-find) using the disjoint forest representation with path compression and union by rank optimizations, with Python bindings included as well. This is intended to fill a void in too many existing standard libraries, which don't include a proper implementation of this despite its utility in many applications.
Features
- Template-based: Works with any data type
- Path Compression: Automatically compresses paths during find operations
- Union by Rank: Optimizes union operations by maintaining tree depth information
- Dynamic Operations: Expand capacity, contract nodes, and clear the forest
- Memory Efficient: STL vectors with automatic memory management
- Smart Pointers: Uses
std::unique_ptrfor automatic memory cleanup - Exception Safety: Proper error handling with
std::invalid_argumentandstd::runtime_error - Comprehensive Testing: Full unit test suite using Google Test
- Python Bindings: Full Python interface with pybind11
Class Structure
Node
data: The actual data stored in the nodeparent: Pointer to the parent node (self-referential for root nodes)rank: Depth of the subtree rooted at this node
DisjointForest
makeSet(T data): Creates a new set containing the given datafind(Node<T>* node): Finds the representative of the set containing the nodeunionSets(Node<T>* node1, Node<T>* node2): Merges two sets- Constructor takes the maximum number of elements as parameter
Dynamic Operations
expand(int additional_capacity): Increases the forest's capacity by the specified amountcontract(Node<T>* node): Removes a node from the forest, updating parent referencesclear(): Removes all nodes and resets the forest to empty statesize(): Returns the current number of nodes in the forestcapacity(): Returns the total capacity allocated for the forestisEmpty(): Returns true if the forest contains no nodesgetAllNodes(): Returns a vector of all current nodes in the forest
Building the Project
Prerequisites
- CMake 3.10 or higher
- C++17 compatible compiler
- Google Test library
- Python 3.6+ (for Python bindings)
- pybind11 (for Python bindings)
Install Google Test
Ubuntu/Debian
sudo apt-get install libgtest-dev
macOS
brew install googletest
CentOS/RHEL/Fedora
sudo yum install gtest-devel
# or for newer versions
sudo dnf install gtest-devel
Arch Linux
sudo pacman -S gtest
Build Steps
Using CMake
# Create build directory
mkdir build
cd build
# Configure and build
cmake ..
make
# Run tests
make test
# or directly
./disjoint_forest_test
# Run example
./disjoint_forest_example
Using Makefile (Alternative)
# Build everything including Python bindings
make all
# Build only C++ components
make cpp
# Build Python bindings
make python-bindings
# Run all tests
make test
# Run Python tests
make python-test
# Run examples
make example
make python-example
# Clean build artifacts
make clean
Running Tests
The test suite covers:
- Basic Operations: Constructor, destructor, makeSet
- Find Operations: Single node find, path compression
- Union Operations: Basic union, union by rank, same set union
- Dynamic Operations: Expand capacity, contract nodes, clear forest, size/capacity queries
- Complex Scenarios: Multiple unions, large datasets, dynamic capacity changes
- Edge Cases: Empty forests, single element forests, contraction edge cases
- Memory Management: Large operations, proper cleanup, dynamic memory allocation
- Template Support: Tests with different data types (int, string)
Example Usage
Basic Operations
#include "disjoint_forest.h"
// Create a forest with capacity for 100 elements
DisjointForest<int> forest(100);
// Create some sets
Node<int>* node1 = forest.makeSet(1);
Node<int>* node2 = forest.makeSet(2);
Node<int>* node3 = forest.makeSet(3);
// Union sets
forest.unionSets(node1, node2);
forest.unionSets(node1, node3);
// Find representatives
Node<int>* rep1 = forest.find(node1);
Node<int>* rep2 = forest.find(node2);
Node<int>* rep3 = forest.find(node3);
// All should have the same representative
assert(rep1 == rep2 && rep2 == rep3);
Dynamic Operations
// Expand capacity when needed
forest.expand(50); // Add capacity for 50 more elements
// Check current state
std::cout << "Size: " << forest.size() << std::endl;
std::cout << "Capacity: " << forest.capacity() << std::endl;
std::cout << "Is empty: " << (forest.isEmpty() ? "Yes" : "No") << std::endl;
// Get all nodes for iteration
auto all_nodes = forest.getAllNodes();
for (auto* node : all_nodes) {
std::cout << "Node data: " << node->data << std::endl;
}
// Contract (remove) a node
forest.contract(node2);
// Clear the entire forest
forest.clear();
Algorithm Complexity
- makeSet: O(1)
- find: O(α(n)) amortized, where α is the inverse Ackermann function
- unionSets: O(α(n)) amortized
- expand: O(1) amortized (vector doubling strategy)
- contract: O(1) (marking nodes as invalid)
- clear: O(n) where n is the number of nodes
- size/capacity/isEmpty: O(1)
- getAllNodes: O(n) where n is the number of nodes
- Space: O(n) where n is the number of elements
Use Cases
Disjoint-set data structures are fundamental to many algorithms and applications:
- Graph Algorithms: Connected components, minimum spanning trees (Kruskal's algorithm)
- Image Processing: Connected component labeling, flood fill algorithms
- Network Analysis: Community detection, clustering algorithms
- Game Development: Pathfinding, collision detection systems
- Database Systems: Transaction management, dependency resolution
- Compiler Design: Register allocation, live variable analysis
Python Bindings
The project includes Python bindings that provide a unified interface for working with any Python object type. The bindings are built using pybind11 and offer the same functionality as the C++ implementation.
Features
- Type Flexibility: Works with integers, strings, lists, dictionaries, custom objects, etc.
- Unified API: Single
DisjointForestclass that handles any Python object type - Safe Operations: Built-in safety checks and error handling
- Memory Management: Automatic cleanup of Python references
Installation
# Install Python dependencies
cd python
pip3 install -r requirements.txt
# Build the Python module
make build
# Test the bindings
make test
# Run the example
make example
Python Usage
import disjoint_forest
# Create a forest
forest = disjoint_forest.DisjointForest(10)
# Create sets with different types
node1 = forest.make_set(42)
node2 = forest.make_set("hello")
node3 = forest.make_set([1, 2, 3])
# Union operations
forest.union_sets(node1, node2)
# Find representatives
rep = forest.find(node1)
print(f"Representative: {rep.data}")
# Dynamic operations
forest.expand(20)
print(f"Capacity: {forest.capacity()}")
print(f"Size: {forest.size()}")
# Get all nodes
all_nodes = forest.get_all_nodes()
for node in all_nodes:
print(f"Node: {node.data}")
Implementation Details
STL Vector Usage
The implementation uses std::vector instead of fixed-size arrays, providing several advantages:
- Dynamic Capacity: The forest can grow beyond its initial capacity using the
expand()method - Automatic Memory Management: Vectors handle memory allocation and deallocation automatically
- Efficient Resizing: STL vectors use exponential growth strategies for optimal performance
- Exception Safety: Vector operations provide strong exception guarantees
Memory Management
- Smart Pointers:
std::unique_ptrensures automatic cleanup of Node objects - RAII Principles: Resource acquisition is tied to object lifetime
- Exception Safety: Memory is properly cleaned up even when exceptions occur
- Python Integration: Python bindings include reference counting and safe memory management
Notes
- The implementation automatically handles path compression during find operations
- Union by rank ensures balanced trees for optimal performance
- Memory is automatically managed with proper cleanup in the destructor
- Python bindings provide safe access to all C++ functionality
- Dynamic operations maintain the forest's integrity while allowing flexible capacity management
Troubleshooting
Common Build Issues
Google Test Not Found
If you encounter linking errors with Google Test:
# The Makefile will automatically install Google Test if needed
make install-deps
# Or manually install for your system (see Prerequisites section)
Python Import Errors
If Python can't import the module:
# Ensure the module is built
cd python
make build
# Check if the shared library exists
ls -la *.so
# Verify Python path
python3 -c "import sys; print(sys.path)"
Memory Issues
The implementation uses smart pointers and RAII principles. If you encounter memory issues:
- Ensure you're not manually deleting Node pointers
- Use the provided methods (
clear(),contract()) instead of manual memory management - Check that Python bindings are properly handling object lifetimes
Contributing
We welcome contributions to improve the implementation, test coverage, or documentation. Please:
- Fork the repository and create a feature branch
- Follow the existing code style and naming conventions
- Add tests for new functionality
- Update documentation to reflect any changes
- Submit a pull request with a clear description of your changes
Development Setup
# Clone and setup
git clone <your-fork-url>
cd disjoint_set
# Install development dependencies
make install-deps
# Run all tests to ensure everything works
make test
make python-test
# Build everything
make all
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file disjoint_forest-1.0.0.tar.gz.
File metadata
- Download URL: disjoint_forest-1.0.0.tar.gz
- Upload date:
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
978276cc9d42dc60a2f27b47adde5c87ad541eb6e279d51c3785a49a95fbd6f4
|
|
| MD5 |
9c2b1090ad87651ba2824fa5d99f0410
|
|
| BLAKE2b-256 |
d7d7ac5c27182f0dd1cef4a136cc593a54409191ab9536dd2617e839f795962e
|
File details
Details for the file disjoint_forest-1.0.0-cp310-cp310-manylinux2014_x86_64.whl.
File metadata
- Download URL: disjoint_forest-1.0.0-cp310-cp310-manylinux2014_x86_64.whl
- Upload date:
- Size: 1.3 MB
- Tags: CPython 3.10
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2bc4ff118d2a45d99397ebda15c19d666714d689e9cdb9168b0daf864d10870e
|
|
| MD5 |
2f1ffc4085f4bfea19c8cc97363a0582
|
|
| BLAKE2b-256 |
39afd051855a1cb3fae5ec76237b284b65eb91a789b3c4637f89d2cb544edc0f
|