High-performance tree traversal library with blazing fast async support

DazzleTreeLib - Universal Tree Traversal Library

DazzleTreeLib is the first Python library with a universal adapter system for tree traversal, offering both synchronous and asynchronous traversal behind a single interface. It is currently optimized for filesystem operations, with a 4-5x speedup from caching on repeated traversals and production-grade error handling. The architecture is designed to support any tree-like data structure, from game-development BSTs to JSON manipulation to hierarchical data processing.

⚠️ Alpha Release: This library is in active development. APIs may change between versions. We welcome feedback and contributions!

Why another tree library?

Have you ever needed to traverse different types of tree structures - filesystems, databases, API hierarchies, JSON documents - but ended up writing similar-but-different code for each one?

Or struggled with existing libraries that are either too limited (filesystem-only) or too complex (full graph theory) when you just need solid tree traversal with good performance?

What about when you need finer control - stopping at specific depths, filtering during traversal, caching results, or processing huge trees efficiently with async/await?

DazzleTreeLib solves these problems with a universal adapter system that works with ANY tree structure while providing powerful traversal controls.
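
To make the adapter idea concrete, here is a minimal standard-library sketch of the pattern. It is not DazzleTreeLib's API: `TreeAdapter`, `DictAdapter`, `PathAdapter`, `get_children`, and `walk` are hypothetical names chosen for illustration. The point is that the traversal function only ever asks an adapter for a node's children, so the same loop can serve nested dicts and directories alike.

```python
from pathlib import Path
from typing import Any, Iterator, Protocol


class TreeAdapter(Protocol):
    """Hypothetical adapter interface: tells the traversal how to find children."""

    def get_children(self, node: Any) -> Iterator[Any]: ...


class DictAdapter:
    """Treats nested dicts as a tree; each (key, value) pair is a node."""

    def get_children(self, node):
        _, value = node
        if isinstance(value, dict):
            yield from value.items()


class PathAdapter:
    """Treats a directory tree as a tree of pathlib.Path nodes."""

    def get_children(self, node: Path):
        if node.is_dir():
            yield from sorted(node.iterdir())


def walk(root, adapter: TreeAdapter, depth: int = 0):
    """Depth-first pre-order traversal that works with any adapter."""
    yield root, depth
    for child in adapter.get_children(root):
        yield from walk(child, adapter, depth + 1)


config = ("root", {"db": {"host": "localhost", "port": 5432}, "debug": True})
names = [(node[0], depth) for node, depth in walk(config, DictAdapter())]
print(names)
```

Swapping `DictAdapter()` for `PathAdapter()` and passing a `Path` as the root reuses `walk` unchanged, which is the property the adapter system is built around.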

Features

  • Universal Interface: One API for filesystem, database, API, or custom trees

  • Async Support: Full async/await implementation with built-in parallelism and batching (3.3x faster than sync)

  • Flexible Adapters: Easy integration with any tree-like data structure

  • Smart Traversal: Stop at any depth, filter during traversal, control breadth

  • Memory Efficient: Streaming iterators for handling large trees

  • Highly Extensible: Custom adapters, collectors, and traversal strategies

  • High-Performance Intelligent Caching: 4-5x speedup with completeness-aware caching

  • Error Resilient & Production Ready: Structured concurrency, proper error handling, streaming

What Makes DazzleTreeLib Different?

Quick Comparison

Features compared across DazzleTreeLib, anytree, treelib, and NetworkX:

  • Universal adapter system
  • One API for any tree source
  • Composable adapters
  • Async/sync feature parity
  • Built-in caching

For more, see the detailed comparison in the docs.

DazzleTreeLib is Perfect for:

  • Multi-source tree traversal (files + database + API)
  • Complex filtering and transformation logic
  • Async/await workflows with parallel processing
  • Large trees requiring streaming and caching
  • Custom tree structures needing standard traversal

Consider alternatives for:

  • Simple filesystem-only tasks (use os.scandir, which is 6-7x faster)
  • Pure graph algorithms (use NetworkX)
  • In-memory-only trees (use anytree or treelib)

Performance

Benchmark Assessment (Sept. 2025)

Comparison | Performance | Best Use Case
DazzleTree async vs sync | 3.3x faster | When using DazzleTreeLib
DazzleTree vs os.scandir | 6-7x slower | DazzleTree for flexibility, os.scandir for speed
Memory usage | ~15MB base + 14MB/1K nodes | Acceptable for most applications
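
As a rough worked example of the memory figure, the sketch below extrapolates it linearly to larger trees. The assumption that memory keeps scaling linearly beyond the benchmarked sizes is ours, not something the benchmark establishes, and `estimate_memory_mb` is a hypothetical helper rather than part of the library.

```python
BASE_MB = 15.0          # approximate fixed overhead from the table
PER_1K_NODES_MB = 14.0  # approximate cost per 1,000 nodes from the table


def estimate_memory_mb(node_count: int) -> float:
    """Linear extrapolation of the benchmark figures; a rough guide only."""
    return BASE_MB + PER_1K_NODES_MB * (node_count / 1000)


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} nodes: ~{estimate_memory_mb(n):,.0f} MB")
```

Under that assumption, a 100,000-node tree would need roughly 1.4 GB, which is worth checking against your own workload before holding a whole tree in memory.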

Quick Start

Installation

pip install dazzletreelib
# Or install from source:
git clone https://github.com/djdarcy/DazzleTreeLib.git
cd DazzleTreeLib
pip install -e .

Basic Usage - Synchronous

from dazzletreelib.sync import FileSystemNode, FileSystemAdapter, traverse_tree

# Simple filesystem traversal
root_node = FileSystemNode("/path/to/directory")
adapter = FileSystemAdapter()

for node, depth in traverse_tree(root_node, adapter):
    print(f"{'  ' * depth}{node.path.name}")

Basic Usage - Asynchronous (3x+ Faster!)

import asyncio
from dazzletreelib.aio import traverse_tree_async

async def main():
    # Async traversal with blazing speed
    async for node in traverse_tree_async("/path/to/directory"):
        print(f"Processing: {node.path}")
        
        # Access file metadata asynchronously
        size = await node.size()
        if size and size > 1_000_000:  # Files > 1MB
            print(f"  Large file: {size:,} bytes")

asyncio.run(main())

Real-World Examples

Find Large Files Efficiently

from dazzletreelib.aio import traverse_tree_async
import asyncio

async def find_large_files(root_path, min_size_mb=10):
    """Find all files larger than specified size."""
    large_files = []
    
    async for node in traverse_tree_async(root_path):
        if node.path.is_file():
            size = await node.size()
            if size and size > min_size_mb * 1024 * 1024:
                large_files.append((node.path, size))
    
    # Sort by size descending
    large_files.sort(key=lambda x: x[1], reverse=True)
    return large_files

# Usage
files = asyncio.run(find_large_files("/home/user", min_size_mb=100))
for path, size in files[:10]:  # Top 10 largest
    print(f"{size/1024/1024:.1f} MB: {path}")

Parallel Directory Analysis

from dazzletreelib.aio import get_tree_stats_async
import asyncio

async def analyze_projects(project_dirs):
    """Analyze multiple project directories in parallel."""
    # Use project_dir rather than dir to avoid shadowing the built-in
    tasks = [get_tree_stats_async(project_dir) for project_dir in project_dirs]
    stats = await asyncio.gather(*tasks)
    
    for project_dir, stat in zip(project_dirs, stats):
        print(f"\n{project_dir}:")
        print(f"  Files: {stat['file_count']:,}")
        print(f"  Directories: {stat['dir_count']:,}")
        print(f"  Total Size: {stat['total_size']/1024/1024:.1f} MB")
        print(f"  Largest: {stat['largest_file']}")

# Analyze multiple projects simultaneously
projects = ["/code/project1", "/code/project2", "/code/project3"]
asyncio.run(analyze_projects(projects))

Directory Timestamp Fixer (folder-datetime-fix use case)

from dazzletreelib.aio import traverse_tree_async
import asyncio
import os

async def fix_directory_timestamps(root_path):
    """Fix directory modification times to match their newest content."""
    directories = []
    
    # Collect all directories first (depth-first post-order)
    async for node in traverse_tree_async(root_path, strategy='dfs_post'):
        if node.path.is_dir():
            directories.append(node.path)
    
    # Post-order already lists children before parents, so iterating in
    # collection order processes directories from deepest to shallowest
    for dir_path in directories:
        newest_time = 0
        
        # Find newest modification time in directory
        for item in dir_path.iterdir():
            stat = item.stat()
            newest_time = max(newest_time, stat.st_mtime)
        
        # Update directory timestamp
        if newest_time > 0:
            os.utime(dir_path, (newest_time, newest_time))
            print(f"Updated: {dir_path}")

# Fix all directory timestamps
asyncio.run(fix_directory_timestamps("/path/to/fix"))
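
Post-order matters here because a parent's timestamp should only be computed after all of its subdirectories have been fixed. The following standard-library sketch shows the ordering on a small in-memory tree; `post_order` and the adjacency dict are illustrative stand-ins and say nothing about how the library implements its 'dfs_post' strategy.

```python
def post_order(tree, node):
    """Yield every descendant of `node` before `node` itself."""
    for child in tree.get(node, ()):
        yield from post_order(tree, child)
    yield node


# Adjacency dict: each key maps to its list of children
tree = {"root": ["a", "b"], "a": ["a1", "a2"]}
order = list(post_order(tree, "root"))
print(order)
```

Every directory appears after its children and the root comes last, so a single forward pass over the list propagates timestamps bottom-up.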

Migrating from Sync to Async

The async API mirrors the sync API closely, making migration straightforward:

Sync Version

from dazzletreelib.sync import traverse_tree, FileSystemNode, FileSystemAdapter

root_node = FileSystemNode(path)
adapter = FileSystemAdapter()
for node, depth in traverse_tree(root_node, adapter):
    process(node)

Async Version

from dazzletreelib.aio import traverse_tree_async

async for node in traverse_tree_async(path):
    await process_async(node)

Key differences:

  • No need to create node/adapter explicitly in async
  • Use async for instead of for
  • Await any async operations on nodes
  • Wrap in asyncio.run() or existing async function

Advanced Features

Batched Parallel Processing

The async implementation uses intelligent batching for optimal performance:

# Control parallelism with batch_size and max_concurrent
async for node in traverse_tree_async(
    root,
    batch_size=256,      # Process children in batches
    max_concurrent=100   # Limit concurrent I/O operations
):
    await process(node)
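
For readers curious what batching with a concurrency cap can look like, here is a minimal asyncio sketch. It assumes nothing about DazzleTreeLib's internals: `process_in_batches`, `guarded`, and `fake_stat` are hypothetical names, and only the `batch_size` and `max_concurrent` parameter names are borrowed from the example above.

```python
import asyncio


async def process_in_batches(items, worker, batch_size=256, max_concurrent=100):
    """Run `worker` over `items` batch by batch with a cap on in-flight calls."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        # The semaphore limits how many workers run at the same time
        async with semaphore:
            return await worker(item)

    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # gather() returns results in the same order as its arguments
        results.extend(await asyncio.gather(*(guarded(item) for item in batch)))
    return results


async def fake_stat(name):
    """Stand-in for an I/O call such as reading file metadata."""
    await asyncio.sleep(0)
    return len(name)


sizes = asyncio.run(
    process_in_batches(["a", "bb", "ccc"], fake_stat, batch_size=2, max_concurrent=1)
)
print(sizes)
```

Batching bounds how many tasks exist at once, while the semaphore bounds how many are actually doing I/O, so the two knobs can be tuned separately.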

Depth Limiting

# Only traverse 3 levels deep
async for node in traverse_tree_async(root, max_depth=3):
    print(node.path)
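
Conceptually, a depth limit means a node at the limit is still reported but its children are never expanded. The sketch below shows this on an in-memory tree using a breadth-first queue. It treats the root as depth 0, which is an assumption (the library may count depth differently), and `walk_limited` is a hypothetical helper.

```python
from collections import deque


def walk_limited(tree, root, max_depth=None):
    """Breadth-first traversal over an adjacency dict, pruned at max_depth."""
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        yield node, depth
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand children beyond the limit
        for child in tree.get(node, ()):
            queue.append((child, depth + 1))


tree = {"/": ["src", "docs"], "src": ["pkg"], "pkg": ["mod.py"]}
visited = [node for node, _ in walk_limited(tree, "/", max_depth=2)]
print(visited)
```

Because pruning happens before children are fetched, the subtree below the limit costs no I/O at all, which is where the savings on large trees come from.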

Custom Filtering

from dazzletreelib.aio import filter_tree_async

# Custom predicate function
async def is_python_file(node):
    return node.path.suffix == '.py'

# Get all Python files
python_files = await filter_tree_async(root, predicate=is_python_file)
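
The idea of filtering with an async predicate can be sketched with the standard library alone. Here `fake_traverse` stands in for a real traversal and `filter_async` is a hypothetical helper. Neither reflects how `filter_tree_async` is implemented.

```python
import asyncio
from pathlib import PurePosixPath


async def fake_traverse(paths):
    """Stand-in async generator that yields paths one at a time."""
    for p in paths:
        await asyncio.sleep(0)
        yield PurePosixPath(p)


async def filter_async(nodes, predicate):
    """Collect items from an async iterator for which the async predicate holds."""
    return [node async for node in nodes if await predicate(node)]


async def is_python_file(path):
    return path.suffix == ".py"


async def main():
    nodes = fake_traverse(["a.py", "b.txt", "pkg/c.py"])
    return await filter_async(nodes, is_python_file)


matches = asyncio.run(main())
print([str(p) for p in matches])
```

Making the predicate a coroutine lets it await I/O, such as reading a file's size, without blocking the traversal.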

High-Performance Caching

DazzleTreeLib features a completeness-aware caching system that makes repeated traversals 4-5x faster while keeping memory use under control.

from dazzletreelib.aio.adapters import CompletenessAwareCacheAdapter

# Safe mode (default) - with memory protection
cached_adapter = CompletenessAwareCacheAdapter(
    base_adapter,
    enable_oom_protection=True,
    max_entries=10000,
    validation_ttl_seconds=5
)

# Fast mode - maximum performance (4-5x faster on repeated traversals)
fast_adapter = CompletenessAwareCacheAdapter(
    base_adapter,
    enable_oom_protection=False
)

# First traversal: populates cache
async for node in traverse_tree_async(root, adapter=cached_adapter):
    process(node)

# Second traversal: uses cache (4-5x faster!)
async for node in traverse_tree_async(root, adapter=cached_adapter):
    process(node)

Key features:

  • Completeness tracking: Knows if subtree is fully or partially cached
  • Depth-based caching: Understands traversal depth patterns
  • Safe/Fast modes: Choose between safety and maximum performance
  • LRU eviction: Intelligent memory management with OrderedDict
  • TTL validation: Configurable freshness checks with mtime
  • 99% memory reduction: Recent optimization removed redundant tracking
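
To illustrate the LRU eviction and TTL ideas from the list above, here is a small OrderedDict-based sketch. It is not the library's implementation: `LruTtlCache` is a hypothetical class, and it expires entries purely by age rather than by comparing mtimes, which is a simplification.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Tiny LRU cache with per-entry expiry; an illustration only."""

    def __init__(self, max_entries=10000, ttl_seconds=5.0, clock=time.monotonic):
        self._data = OrderedDict()
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._data[key]       # stale: force a fresh lookup
            return None
        self._data.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, self._clock())
        self._data.move_to_end(key)
        if len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # evict least recently used


now = [0.0]  # injectable clock so the example needs no sleeping
cache = LruTtlCache(max_entries=2, ttl_seconds=5, clock=lambda: now[0])
cache.put("/a", ["x"])
cache.put("/b", ["y"])
cache.get("/a")             # touch /a so /b becomes the eviction candidate
cache.put("/c", ["z"])      # over capacity: evicts /b
evicted = cache.get("/b")
kept = cache.get("/c")
now[0] = 10.0               # advance past the TTL
expired = cache.get("/a")
print(evicted, kept, expired)
```

A real completeness-aware cache would also record how deep each cached subtree was explored, so that a shallow cached result is not reused for a deeper request.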

Architecture

DazzleTreeLib uses a clean, modular architecture:

dazzletreelib/
├── version.py     # Centralized version management
├── sync/          # Synchronous implementation
│   ├── core/      # Core abstractions (Node, Adapter, Collector)
│   ├── adapters/  # Tree adapters
│   │   ├── filesystem.py      # FileSystem traversal
│   │   ├── filtering.py       # FilteringWrapper
│   │   └── smart_caching.py   # Caching with tracking
│   └── api.py     # High-level sync API
├── aio/           # Asynchronous implementation
│   ├── core/      # Async abstractions with batching
│   ├── adapters/  # Async adapters
│   │   ├── filesystem.py      # Async filesystem with parallel I/O
│   │   ├── filtering.py       # Async filtering
│   │   └── smart_caching.py   # Async caching adapter
│   └── api.py     # High-level async API
└── _common/       # Shared configuration and constants

Testing

Run the test suite:

# Recommended: Full test suite with proper isolation
python run_tests.py

# Run specific test categories
python run_tests.py --fast       # Quick tests only
python run_tests.py --isolated   # Interaction-sensitive tests
python run_tests.py --benchmarks # Performance benchmarks

# Manual pytest (for development)
pytest -m "not slow and not benchmark"  # Fast tests only
pytest -m benchmark                      # Benchmark tests only
pytest -m "not interaction_sensitive"    # Skip isolation-required tests
pytest --cov=dazzletreelib               # With coverage report
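
The `-m` expressions above rely on named pytest markers. As a hedged sketch, this is one common way a project registers such markers in a `conftest.py`; the marker descriptions are our guesses, and the repository may well configure them differently (for example in `pytest.ini` or `pyproject.toml`).

```python
# conftest.py (hypothetical): register the markers used by the -m expressions
MARKERS = {
    "slow": "tests that take a long time",
    "benchmark": "performance benchmarks",
    "interaction_sensitive": "tests that must run in isolation",
}


def pytest_configure(config):
    """pytest hook: declare custom markers so -m filters do not emit warnings."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

Registered markers also show up in `pytest --markers`, which makes the available test categories discoverable.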

Benchmarks

Run performance benchmarks:

# Run all benchmarks
python benchmarks/accurate_performance_test.py

# Compare with native Python methods
python benchmarks/compare_file_search.py

# Run pytest benchmarks
pytest -m benchmark -v -s

Contributing

Contributions are welcome! Please ensure:

  • All tests pass (python run_tests.py)
  • Code is properly typed
  • Documentation is updated
  • Performance has not regressed

Note: Git hooks are configured to:

  • Update version automatically on commit
  • Run fast tests before push
  • Block commits with private files on public branches

Development Status

  • Stable: Sync implementation (v0.5.0)
  • Stable: Async implementation (v0.6.0)
  • Production Ready: Used in production systems (v0.10.0)
  • 🚧 Coming Soon: Additional adapters (S3, Database, API)

Related Projects

DazzleTreeLib is used in a growing set of tools:

  • folder-datetime-fix: Directory timestamp correction tool (uses DazzleTreeLib)
  • preserve: File tracking for easy location recovery and backup (with integrity and sync functionality)

Acknowledgments

  • Inspired by excellent tree/graph libraries:
    • anytree - Python tree data structures with visualization
    • treelib - Efficient tree structure and operations
    • NetworkX - Extensive graph algorithms
    • pathlib - Modern path handling in Python stdlib
    • graph-tool - C++-based Python graph analysis toolkit
  • Uses aiofiles for async file operations
  • GitRepoKit - Automated version management system
  • Community contributors - Testing, feedback, and improvements

License

DazzleTreeLib Copyright (C) 2025 Dustin Darcy

MIT License - see LICENSE file for details.
