High-performance tree traversal library with blazing fast async support
DazzleTreeLib - Universal Tree Traversal Library
DazzleTreeLib is the first Python library with a universal adapter system for tree traversal, providing both synchronous and asynchronous traversal through one interface. It is currently optimized for high-performance filesystem operations, with a 4-5x speedup from caching and production-grade error handling, but the architecture is designed to support any tree-like data structure, from game-development BSTs to JSON manipulation to general hierarchical data processing.
⚠️ Alpha Release: This library is in active development. APIs may change between versions. We welcome feedback and contributions!
Why another tree library?
Have you ever needed to traverse different types of tree structures - filesystems, databases, API hierarchies, JSON documents - but ended up writing similar-but-different code for each one?
Or struggled with existing libraries that are either too limited (filesystem-only) or too complex (full graph theory) when you just need solid tree traversal with good performance?
What about when you need finer control - stopping at specific depths, filtering during traversal, caching results, or processing huge trees efficiently with async/await?
DazzleTreeLib solves these problems with a universal adapter system that works with ANY tree structure while providing powerful traversal controls.
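The adapter idea can be illustrated with a small, self-contained sketch. This is not DazzleTreeLib's actual API: `TreeAdapter`, `get_children`, `DictAdapter`, `PathAdapter`, and `walk` are hypothetical names chosen for illustration. The point is that the traversal function only talks to the adapter, so the same loop works for an in-memory dict and for a directory on disk.

```python
import tempfile
from pathlib import Path
from typing import Any, Iterator, Protocol, Tuple


class TreeAdapter(Protocol):
    """Hypothetical adapter interface: all a traversal needs to know."""

    def get_children(self, node: Any) -> Iterator[Any]: ...


class DictAdapter:
    """Nodes are (name, subtree) pairs taken from nested dicts."""

    def get_children(self, node: Tuple[str, dict]) -> Iterator[Tuple[str, dict]]:
        _, subtree = node
        yield from subtree.items()


class PathAdapter:
    """Nodes are pathlib.Path objects; only directories have children."""

    def get_children(self, node: Path) -> Iterator[Path]:
        if node.is_dir():
            yield from sorted(node.iterdir())


def walk(root: Any, adapter: TreeAdapter, depth: int = 0) -> Iterator[Tuple[Any, int]]:
    """Pre-order depth-first traversal that works with any adapter."""
    yield root, depth
    for child in adapter.get_children(root):
        yield from walk(child, adapter, depth + 1)


# The same walk() over an in-memory dict...
config = {"a": {"b": {}, "c": {}}, "d": {}}
dict_order = [(name, depth) for (name, _), depth in walk(("root", config), DictAdapter())]

# ...and over a real (temporary) directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "x").mkdir()
    (Path(tmp) / "x" / "f.txt").write_text("hi")
    fs_order = [(p.name, depth) for p, depth in walk(Path(tmp), PathAdapter())]
```

Supporting a new tree source then means writing one more small adapter class; the traversal code stays unchanged.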
Features
- Universal Interface: One API for filesystem, database, API, or custom trees
- Async Support: Full async/await implementation with built-in parallelism and batching (3.3x faster than sync)
- Flexible Adapters: Easy integration with any tree-like data structure
- Smart Traversal: Stop at any depth, filter during traversal, control breadth
- Memory Efficient: Streaming iterators for handling large trees
- Highly Extensible: Custom adapters, collectors, and traversal strategies
- High-Performance Intelligent Caching: 4-5x speedup with completeness-aware caching
- Error Resilient & Production Ready: Structured concurrency, proper error handling, streaming
What Makes DazzleTreeLib Different?
Quick Comparison
| Feature | DazzleTreeLib | anytree | treelib | NetworkX |
|---|---|---|---|---|
| Universal adapter system | ✅ | ❌ | ❌ | ❌ |
| One API for any tree source | ✅ | ❌ | ❌ | ❌ |
| Composable adapters | ✅ | ❌ | ❌ | ❌ |
| Async/sync feature parity | ✅ | ❌ | ❌ | ❌ |
| Built-in caching | ✅ | ❌ | ❌ | ❌ |
For more, see the detailed comparison in the docs.
✅ DazzleTreeLib is perfect for:
- Multi-source tree traversal (files + database + API)
- Complex filtering and transformation logic
- Async/await workflows with parallel processing
- Large trees requiring streaming and caching
- Custom tree structures needing standard traversal
❌ Consider alternatives for:
- Simple filesystem-only tasks (use `os.scandir`, which is 6-7x faster)
- Pure graph algorithms (use NetworkX)
- In-memory-only trees (use anytree or treelib)
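For the simple filesystem-only case, the standard library is enough. The sketch below is a plain recursive walk built on `os.scandir` (not part of DazzleTreeLib; `scandir_walk` is a hypothetical helper name). `os.scandir` returns `DirEntry` objects whose type checks can usually be answered without an extra `stat` call, which is a large part of why it is fast.

```python
import os
import tempfile
from typing import Iterator


def scandir_walk(root: str) -> Iterator[str]:
    """Yield every regular file path under root, depth-first, via os.scandir."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                yield from scandir_walk(entry.path)
            elif entry.is_file(follow_symlinks=False):
                yield entry.path


# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "src", "pkg"))
    for rel in ("README.md", os.path.join("src", "pkg", "mod.py")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("x")
    found = sorted(os.path.relpath(p, tmp) for p in scandir_walk(tmp))
```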
Performance
Benchmark Assessment (Sept. 2025)
| Comparison | Performance | Best Use Case |
|---|---|---|
| DazzleTree async vs sync | 3.3x faster | When using DazzleTreeLib |
| DazzleTree vs os.scandir | 6-7x slower | DazzleTree for flexibility, os.scandir for speed |
| Memory usage | ~15MB base + 14MB/1K nodes | Acceptable for most applications |
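Taking the table's memory figures at face value, usage grows linearly with node count: roughly 15 MB plus 14 MB per 1,000 nodes (about 14 KB per node). A back-of-the-envelope helper for that estimate (hypothetical, not part of the library):

```python
def estimated_memory_mb(node_count: int, base_mb: float = 15.0, per_1k_mb: float = 14.0) -> float:
    """Linear estimate from the benchmark table: base cost plus a per-1K-nodes cost."""
    return base_mb + per_1k_mb * node_count / 1000


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} nodes -> ~{estimated_memory_mb(n):,.0f} MB")
```

By this estimate a 100,000-node tree needs around 1.4 GB, which is where the streaming iterators matter more than collecting every node into a list.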
Quick Start
Installation
```bash
pip install dazzletreelib

# Or install from source:
git clone https://github.com/djdarcy/DazzleTreeLib.git
cd DazzleTreeLib
pip install -e .
```
Basic Usage - Synchronous
```python
from dazzletreelib.sync import FileSystemNode, FileSystemAdapter, traverse_tree

# Simple filesystem traversal
root_node = FileSystemNode("/path/to/directory")
adapter = FileSystemAdapter()

for node, depth in traverse_tree(root_node, adapter):
    print(f"{' ' * depth}{node.path.name}")
```
Basic Usage - Asynchronous (3x+ Faster!)
```python
import asyncio
from dazzletreelib.aio import traverse_tree_async

async def main():
    # Async traversal with blazing speed
    async for node in traverse_tree_async("/path/to/directory"):
        print(f"Processing: {node.path}")
        # Access file metadata asynchronously
        size = await node.size()
        if size and size > 1_000_000:  # Files > 1MB
            print(f"  Large file: {size:,} bytes")

asyncio.run(main())
```
Real-World Examples
Find Large Files Efficiently
```python
from dazzletreelib.aio import traverse_tree_async
import asyncio

async def find_large_files(root_path, min_size_mb=10):
    """Find all files larger than specified size."""
    large_files = []
    async for node in traverse_tree_async(root_path):
        if node.path.is_file():
            size = await node.size()
            if size and size > min_size_mb * 1024 * 1024:
                large_files.append((node.path, size))

    # Sort by size descending
    large_files.sort(key=lambda x: x[1], reverse=True)
    return large_files

# Usage
files = asyncio.run(find_large_files("/home/user", min_size_mb=100))
for path, size in files[:10]:  # Top 10 largest
    print(f"{size/1024/1024:.1f} MB: {path}")
```
Parallel Directory Analysis
```python
from dazzletreelib.aio import get_tree_stats_async
import asyncio

async def analyze_projects(project_dirs):
    """Analyze multiple project directories in parallel."""
    tasks = [get_tree_stats_async(directory) for directory in project_dirs]
    stats = await asyncio.gather(*tasks)

    for directory, stat in zip(project_dirs, stats):
        print(f"\n{directory}:")
        print(f"  Files: {stat['file_count']:,}")
        print(f"  Directories: {stat['dir_count']:,}")
        print(f"  Total Size: {stat['total_size']/1024/1024:.1f} MB")
        print(f"  Largest: {stat['largest_file']}")

# Analyze multiple projects simultaneously
projects = ["/code/project1", "/code/project2", "/code/project3"]
asyncio.run(analyze_projects(projects))
```
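To show what those four statistics mean, here is a standard-library sketch that computes the same keys with `os.walk` and runs several directories concurrently via `asyncio.to_thread` (Python 3.9+). `tree_stats` and `analyze_all` are hypothetical names; the dict layout only mirrors the keys used above and is not `get_tree_stats_async`'s implementation (in particular, `largest_file` is assumed here to be a path string).

```python
import asyncio
import os
import tempfile


def tree_stats(root: str) -> dict:
    """Count files and subdirectories under root and find the largest file (blocking I/O)."""
    stats = {"file_count": 0, "dir_count": 0, "total_size": 0, "largest_file": None}
    largest_size = -1
    for dirpath, dirnames, filenames in os.walk(root):
        stats["dir_count"] += len(dirnames)  # root itself is not counted
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            stats["file_count"] += 1
            stats["total_size"] += size
            if size > largest_size:
                largest_size, stats["largest_file"] = size, path
    return stats


async def analyze_all(roots):
    # Each blocking walk runs in a worker thread; gather awaits them together
    return await asyncio.gather(*(asyncio.to_thread(tree_stats, r) for r in roots))


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    for rel, data in (("small.txt", b"x"), (os.path.join("a", "big.bin"), b"x" * 100)):
        with open(os.path.join(tmp, rel), "wb") as fh:
            fh.write(data)
    (result,) = asyncio.run(analyze_all([tmp]))
```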
Directory Timestamp Fixer (folder-datetime-fix use case)
```python
from dazzletreelib.aio import traverse_tree_async
import asyncio
import os

async def fix_directory_timestamps(root_path):
    """Fix directory modification times to match their newest content."""
    directories = []

    # Collect all directories first. Depth-first post-order yields children
    # before their parents, so this list is already ordered deepest-first.
    async for node in traverse_tree_async(root_path, strategy='dfs_post'):
        if node.path.is_dir():
            directories.append(node.path)

    # Process directories from deepest to shallowest (no reversal needed),
    # so each parent sees its subdirectories' corrected timestamps
    for dir_path in directories:
        newest_time = 0

        # Find newest modification time in directory
        for item in dir_path.iterdir():
            stat = item.stat()
            newest_time = max(newest_time, stat.st_mtime)

        # Update directory timestamp
        if newest_time > 0:
            os.utime(dir_path, (newest_time, newest_time))
            print(f"Updated: {dir_path}")

# Fix all directory timestamps
asyncio.run(fix_directory_timestamps("/path/to/fix"))
```
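The same bottom-up idea can be expressed with only the standard library: `os.walk(..., topdown=False)` yields each directory after all of its subdirectories, which is the post-order the fixer relies on. This is an illustrative sketch (`fix_timestamps_stdlib` is a hypothetical name), not how folder-datetime-fix is implemented.

```python
import os
import tempfile


def fix_timestamps_stdlib(root: str) -> None:
    """Set each directory's mtime to the newest mtime among its direct children."""
    # topdown=False: subdirectories are yielded before their parent (post-order),
    # so by the time a parent is processed its children are already corrected
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        child_times = [
            os.stat(os.path.join(dirpath, name), follow_symlinks=False).st_mtime
            for name in dirnames + filenames
        ]
        if child_times:
            newest = max(child_times)
            os.utime(dirpath, (newest, newest))


with tempfile.TemporaryDirectory() as tmp:
    inner = os.path.join(tmp, "outer", "inner")
    os.makedirs(inner)
    target = os.path.join(inner, "data.txt")
    with open(target, "w") as fh:
        fh.write("x")
    os.utime(target, (1_000_000_000, 1_000_000_000))  # pretend the file dates from 2001
    fix_timestamps_stdlib(tmp)
    inner_mtime = os.stat(inner).st_mtime
    outer_mtime = os.stat(os.path.join(tmp, "outer")).st_mtime
```

The old file's timestamp propagates up through `inner` to `outer` in a single pass because of the bottom-up order.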
Migrating from Sync to Async
The async API mirrors the sync API closely, making migration straightforward:
Sync Version
```python
from dazzletreelib.sync import traverse_tree, FileSystemNode, FileSystemAdapter

root = FileSystemNode(path)
adapter = FileSystemAdapter()
for node, depth in traverse_tree(root, adapter):
    process(node)
```
Async Version
```python
from dazzletreelib.aio import traverse_tree_async

async for node in traverse_tree_async(path):
    await process_async(node)
```
Key differences:
- No need to create a node or adapter explicitly in async
- Use `async for` instead of `for`
- Await any async operations on nodes
- Wrap in `asyncio.run()` or an existing async function
Advanced Features
Batched Parallel Processing
The async implementation processes children in batches and caps concurrent I/O; both limits are tunable:
```python
# Control parallelism with batch_size and max_concurrent
async for node in traverse_tree_async(
    root,
    batch_size=256,      # Process children in batches
    max_concurrent=100,  # Limit concurrent I/O operations
):
    await process(node)
```
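A common way to get this kind of bounded parallelism is an `asyncio.Semaphore` for the concurrency cap plus fixed-size slices of children for batching. The sketch below illustrates that general pattern over an in-memory tree with a simulated I/O delay; it is an assumption about the technique, not DazzleTreeLib's internal code, and `TREE`, `expand`, and `traverse_batched` are hypothetical.

```python
import asyncio
from typing import Dict, List

# Hypothetical tree: node name -> child names
TREE: Dict[str, List[str]] = {
    "root": ["a", "b", "c"],
    "a": ["a1", "a2"],
    "b": [],
    "c": ["c1"],
    "a1": [], "a2": [], "c1": [],
}


async def expand(node: str, sem: asyncio.Semaphore) -> List[str]:
    """Fetch a node's children; the semaphore caps in-flight 'I/O' calls."""
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for a real I/O call
        return TREE.get(node, [])


async def traverse_batched(root: str, batch_size: int = 256, max_concurrent: int = 100) -> List[str]:
    sem = asyncio.Semaphore(max_concurrent)
    visited, level = [root], [root]
    while level:  # breadth-first, one level at a time
        next_level: List[str] = []
        for i in range(0, len(level), batch_size):
            batch = level[i:i + batch_size]
            # Expand every node in the batch concurrently (gather keeps order)
            results = await asyncio.gather(*(expand(n, sem) for n in batch))
            for children in results:
                next_level.extend(children)
        visited.extend(next_level)
        level = next_level
    return visited


order = asyncio.run(traverse_batched("root", batch_size=3, max_concurrent=2))
```

The two knobs do different jobs: `batch_size` bounds how many tasks exist at once, while the semaphore bounds how many of them are actually doing I/O at the same moment.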
Depth Limiting
```python
# Only traverse 3 levels deep
async for node in traverse_tree_async(root, max_depth=3):
    print(node.path)
```
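Depth limiting also prunes work: once a node sits at the limit, its children are never requested. A minimal breadth-first sketch of that idea over nested dicts (the name `bfs_limited` is hypothetical, and the convention here, root at depth 0 with nodes deeper than `max_depth` skipped, is an assumption that may not match the library's exact semantics):

```python
from collections import deque
from typing import Iterator, Tuple


def bfs_limited(tree: dict, max_depth: int) -> Iterator[Tuple[str, int]]:
    """Yield (name, depth) breadth-first, never expanding nodes at max_depth."""
    queue = deque([("root", tree, 0)])
    while queue:
        name, children, depth = queue.popleft()
        yield name, depth
        if depth < max_depth:  # only nodes above the limit get expanded
            for child_name, grandchildren in children.items():
                queue.append((child_name, grandchildren, depth + 1))


nested = {"src": {"pkg": {"deep": {}}}, "docs": {}}
shallow = list(bfs_limited(nested, max_depth=1))
```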
Custom Filtering
```python
from dazzletreelib.aio import filter_tree_async

# Custom predicate function
async def is_python_file(node):
    return node.path.suffix == '.py'

# Get all Python files
python_files = await filter_tree_async(root, predicate=is_python_file)
```
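Conceptually, filtering with an async predicate means awaiting the predicate for each node as it streams past. A standard-library sketch of that pattern, where the hypothetical `afilter` plays the filtering role and `fake_nodes` stands in for a traversal yielding paths (neither is part of DazzleTreeLib):

```python
import asyncio
from pathlib import PurePath
from typing import AsyncIterator, Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def afilter(source: AsyncIterator[T],
                  predicate: Callable[[T], Awaitable[bool]]) -> AsyncIterator[T]:
    """Stream items through, awaiting the predicate for each one."""
    async for item in source:
        if await predicate(item):
            yield item


async def fake_nodes() -> AsyncIterator[PurePath]:
    """Stand-in for a traversal: yields paths with a cooperative pause."""
    for name in ("app.py", "README.md", "pkg/util.py", "setup.cfg"):
        await asyncio.sleep(0)
        yield PurePath(name)


async def is_python_file(path: PurePath) -> bool:
    return path.suffix == ".py"


async def main() -> List[str]:
    return [p.as_posix() async for p in afilter(fake_nodes(), is_python_file)]


python_files = asyncio.run(main())
```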
High-Performance Caching
DazzleTreeLib includes a completeness-aware caching system that makes repeated traversals 4-5x faster, with optional memory protection.
```python
from dazzletreelib.aio import traverse_tree_async
from dazzletreelib.aio.adapters import CompletenessAwareCacheAdapter

# base_adapter is the adapter being wrapped (assumed to be defined elsewhere)

# Safe mode (default) - with memory protection
cached_adapter = CompletenessAwareCacheAdapter(
    base_adapter,
    enable_oom_protection=True,
    max_entries=10000,
    validation_ttl_seconds=5,
)

# Fast mode - maximum performance (4-5x faster on repeated traversals)
fast_adapter = CompletenessAwareCacheAdapter(
    base_adapter,
    enable_oom_protection=False,
)

# First traversal: populates cache
async for node in traverse_tree_async(root, adapter=cached_adapter):
    process(node)

# Second traversal: uses cache (4-5x faster!)
async for node in traverse_tree_async(root, adapter=cached_adapter):
    process(node)
```
Key features:
- Completeness tracking: Knows whether a subtree is fully or partially cached
- Depth-based caching: Understands traversal depth patterns
- Safe/Fast modes: Choose between safety and maximum performance
- LRU eviction: Intelligent memory management with OrderedDict
- TTL validation: Configurable freshness checks with mtime
- 99% memory reduction: Recent optimization removed redundant tracking
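To make the LRU, TTL, and completeness ideas concrete, here is a minimal sketch of such a cache. It is not `CompletenessAwareCacheAdapter`'s implementation: `DepthAwareLRUCache`, its methods, and the rule that an entry only answers requests no deeper than the depth it was recorded at are assumptions for illustration, and freshness here is a simple age check rather than an mtime comparison.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional

COMPLETE = float("inf")  # depth marker for a fully explored subtree


class DepthAwareLRUCache:
    def __init__(self, max_entries: int = 10_000, ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # key -> (value, depth explored, time stored); dict order tracks recency
        self._entries: OrderedDict = OrderedDict()

    def put(self, key: Hashable, value: Any, depth: float = COMPLETE) -> None:
        self._entries[key] = (value, depth, self._clock())
        self._entries.move_to_end(key)  # mark as most recently used
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used

    def get(self, key: Hashable, required_depth: float = COMPLETE) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, depth, stored_at = entry
        if self._clock() - stored_at > self.ttl_seconds:
            del self._entries[key]  # stale: force a fresh read
            return None
        if depth < required_depth:
            return None  # partial entry cannot answer a deeper request
        self._entries.move_to_end(key)
        return value


now = [0.0]  # fake clock so the example is deterministic
cache = DepthAwareLRUCache(max_entries=2, ttl_seconds=5, clock=lambda: now[0])
cache.put("/a", ["x"], depth=1)
cache.put("/b", ["y"])                            # complete subtree
shallow_hit = cache.get("/a", required_depth=1)   # ['x']
deep_miss = cache.get("/a", required_depth=3)     # None: only depth 1 cached
cache.put("/c", ["z"])                            # evicts "/b", the least recently used
b_after_evict = cache.get("/b")
now[0] = 10.0
c_after_ttl = cache.get("/c")                     # None: older than ttl_seconds
```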
📖 Documentation:
- Caching Basics - Start here if new to caching concepts
- Advanced Caching - Architecture details, comparisons with other libraries
Architecture
DazzleTreeLib uses a clean, modular architecture:
```text
dazzletreelib/
├── version.py                # Centralized version management
├── sync/                     # Synchronous implementation
│   ├── core/                 # Core abstractions (Node, Adapter, Collector)
│   ├── adapters/             # Tree adapters
│   │   ├── filesystem.py     # FileSystem traversal
│   │   ├── filtering.py      # FilteringWrapper
│   │   └── smart_caching.py  # Caching with tracking
│   └── api.py                # High-level sync API
├── aio/                      # Asynchronous implementation
│   ├── core/                 # Async abstractions with batching
│   ├── adapters/             # Async adapters
│   │   ├── filesystem.py     # Async filesystem with parallel I/O
│   │   ├── filtering.py      # Async filtering
│   │   └── smart_caching.py  # Async caching adapter
│   └── api.py                # High-level async API
└── _common/                  # Shared configuration and constants
```
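The `filtering.py` wrapper hints at how adapters compose: a wrapper exposes the same interface as the adapter it wraps and changes only what it needs to. A minimal sketch of that decorator pattern (all class and function names are hypothetical, not the modules' actual code):

```python
from typing import Callable, Dict, Iterator, List


class DictTreeAdapter:
    """Base adapter: children come from a plain dict of name -> child names."""

    def __init__(self, edges: Dict[str, List[str]]):
        self.edges = edges

    def get_children(self, node: str) -> Iterator[str]:
        yield from self.edges.get(node, [])


class FilteringAdapter:
    """Wraps any adapter and hides children that fail a predicate."""

    def __init__(self, inner, keep: Callable[[str], bool]):
        self.inner = inner
        self.keep = keep

    def get_children(self, node: str) -> Iterator[str]:
        for child in self.inner.get_children(node):
            if self.keep(child):
                yield child


def preorder(node: str, adapter) -> Iterator[str]:
    """Depth-first traversal that only relies on get_children()."""
    yield node
    for child in adapter.get_children(node):
        yield from preorder(child, adapter)


edges = {"repo": ["src", ".git", "docs"], "src": ["main.py"], ".git": ["HEAD"]}
base = DictTreeAdapter(edges)
no_hidden = FilteringAdapter(base, keep=lambda name: not name.startswith("."))
visible = list(preorder("repo", no_hidden))
```

Because the filter acts at `get_children`, the hidden `.git` subtree is never expanded at all, and since `FilteringAdapter` has the same interface it could itself be wrapped again, for example by a caching layer.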
Testing
Run the test suite:
```bash
# Recommended: Full test suite with proper isolation
python run_tests.py

# Run specific test categories
python run_tests.py --fast        # Quick tests only
python run_tests.py --isolated    # Interaction-sensitive tests
python run_tests.py --benchmarks  # Performance benchmarks

# Manual pytest (for development)
pytest -m "not slow and not benchmark"  # Fast tests only
pytest -m benchmark                     # Benchmark tests only
pytest -m "not interaction_sensitive"   # Skip isolation-required tests
pytest --cov=dazzletreelib              # With coverage report
```
Benchmarks
Run performance benchmarks:
```bash
# Run all benchmarks
python benchmarks/accurate_performance_test.py

# Compare with native Python methods
python benchmarks/compare_file_search.py

# Run pytest benchmarks
pytest -m benchmark -v -s
```
Contributing
Contributions are welcome! Please ensure:
- All tests pass (`python run_tests.py`)
- Code is properly typed
- Documentation is updated
- Performance isn't regressed
Note: Git hooks are configured to:
- Update version automatically on commit
- Run fast tests before push
- Block commits with private files on public branches
Development Status
- Stable: Sync implementation (v0.5.0)
- Stable: Async implementation (v0.6.0)
- Production Ready: Used in production systems (v0.10.0)
- 🚧 Coming Soon: Additional adapters (S3, Database, API)
Related Projects
DazzleTreeLib is used in a growing set of tools:
- folder-datetime-fix: Directory timestamp correction tool (uses DazzleTreeLib)
- preserve: File tracking for easy location recovery and backup (with integrity and sync functionality)
Acknowledgments
- Inspired by excellent tree/graph libraries:
  - anytree - Python tree data structures with visualization
  - treelib - Efficient tree structure and operations
  - NetworkX - Extensive graph algorithms
  - pathlib - Modern path handling in Python stdlib
  - graph-tool - C++-based Python graph analysis toolkit
- Uses aiofiles for async file operations
- GitRepoKit - Automated version management system
- Community contributors - Testing, feedback, and improvements
License
DazzleTreeLib Copyright (C) 2025 Dustin Darcy
MIT License - see LICENSE file for details.
File details
Details for the file dazzletreelib-0.10.2.tar.gz.
File metadata
- Download URL: dazzletreelib-0.10.2.tar.gz
- Upload date:
- Size: 189.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91893332c19c54ced1335f2e05fb5d0d0f24387db13d7a23537e6e18c0749fdd |
| MD5 | 428735c57c09f2ba8dbcb19e191601b1 |
| BLAKE2b-256 | 4f15a106e2ce95e2fbd0c1d3d91c0272a39324fc62ff595f035d3b5a7bfe320d |
File details
Details for the file dazzletreelib-0.10.2-py3-none-any.whl.
File metadata
- Download URL: dazzletreelib-0.10.2-py3-none-any.whl
- Upload date:
- Size: 116.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 17ddbb98cafaf11444c017b5d329fad567b1f51f8342c80199e23ff0829c5baa |
| MD5 | 5d5abe6d02126562414f2cc3c1d7b7c4 |
| BLAKE2b-256 | 5249f42a1abc4003830ec9bf85d08a9166e815e8851026fb11346a4a0344a81d |