
A professional data cleanup management library


Evolvishub Data Cleanup Adapter


A Python library for managing data cleanup and archiving in Evolvis applications.

About

This project is developed and maintained by Evolvis.ai.

Author

Alban Maxhuni, PhD
Email: a.maxhuni@evolvis.ai

Features

  • Configurable data cleanup based on file age and size thresholds
  • Support for both INI and YAML configuration files
  • File system monitoring with automatic cleanup
  • Configurable retention policies for different file types
  • Automatic backup of cleaned files
  • Cleanup of old backup files
  • Comprehensive logging
  • Thread-safe operations
  • Asynchronous operations for better performance
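This page does not spell out how the age and size thresholds interact. The following is a minimal sketch of one plausible rule, not the library's implementation. It assumes that files older than max_age_days are always selected, and that the oldest remaining files are then selected until the total fits under max_size_gb. FileEntry and select_for_cleanup are hypothetical names.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 1024 ** 3


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical record describing one file on disk."""
    path: str
    size_bytes: int
    mtime: float  # modification time, seconds since the epoch


def select_for_cleanup(files, max_age_days, max_size_gb, now):
    """Return paths to clean up under the assumed age-then-size rule."""
    ordered = sorted(files, key=lambda f: f.mtime)  # oldest first
    cutoff = now - max_age_days * SECONDS_PER_DAY

    # Rule 1 (assumed): anything older than the age threshold goes
    selected = [f for f in ordered if f.mtime < cutoff]
    remaining = [f for f in ordered if f.mtime >= cutoff]

    # Rule 2 (assumed): evict the oldest survivors until under the size limit
    total = sum(f.size_bytes for f in remaining)
    limit = max_size_gb * BYTES_PER_GB
    for f in remaining:
        if total <= limit:
            break
        selected.append(f)
        total -= f.size_bytes

    return [f.path for f in selected]
```

To feed it real files, you could build FileEntry(str(p), p.stat().st_size, p.stat().st_mtime) for each path p from Path(folder).rglob("*") where p.is_file().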

Installation

pip install evolvishub-data-cleanup-adapter

Usage

  1. Create a configuration file (INI or YAML):
# config.ini
[folders]
data_folder1 = /path/to/folder1
data_folder2 = /path/to/folder2

[thresholds]
max_size_gb = 1.0
max_age_days = 30

[backup]
directory = /path/to/backup
max_age_days = 90

[monitoring]
check_interval_seconds = 3600

[retention]
policy_log = 7
policy_temp = 1
  2. Use the library in your code:
import asyncio
from evolvishub_datacleanup import DataCleanupManager

async def main():
    # Initialize the manager with your config file. Creating it before the
    # try block guarantees that manager exists when the finally clause runs.
    manager = DataCleanupManager('config.ini')

    try:
        # Start monitoring
        await manager.start_monitoring()

        # Example of concurrent operations
        cleanup_task = asyncio.create_task(manager.cleanup_old_files())
        backup_task = asyncio.create_task(manager.cleanup_backup_files())
        
        # Wait for both operations to complete
        await asyncio.gather(cleanup_task, backup_task)

        # Get file information
        file_info = await manager.get_file_info()
        total_size = await manager.get_total_size()

        # ... your application code ...

    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Ensure monitoring is stopped
        await manager.stop_monitoring()

if __name__ == "__main__":
    asyncio.run(main())
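This page does not describe how start_monitoring() schedules its checks. If you need a loop of your own, for example one that awaits cleanup_old_files() every check_interval_seconds, a common asyncio pattern is sketched below. run_periodically is a hypothetical helper and is not part of the package. The demo passes in a dummy job rather than a real manager call.

```python
import asyncio


async def run_periodically(job, interval_seconds, stop_event):
    """Await job() every interval_seconds until stop_event is set."""
    while not stop_event.is_set():
        await job()
        try:
            # Sleep for the interval, but wake immediately if asked to stop
            await asyncio.wait_for(stop_event.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            pass  # interval elapsed without a stop request; run again


async def demo():
    runs = []
    stop = asyncio.Event()

    async def job():
        # Dummy job: record the run and request a stop after three runs
        runs.append(len(runs))
        if len(runs) == 3:
            stop.set()

    await run_periodically(job, interval_seconds=0.01, stop_event=stop)
    return runs


print(asyncio.run(demo()))  # [0, 1, 2]
```

Waiting on the event with a timeout, rather than calling asyncio.sleep(), lets a shutdown request end the loop without waiting out the full interval.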

Async Usage Guide

Basic Async Operations

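Every DataCleanupManager method listed in the API reference is a coroutine and must be awaited. The snippets below therefore have to run inside an async def function (or in the asyncio REPL started with python -m asyncio):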

# Single operation
await manager.cleanup_old_files()

# Multiple sequential operations
await manager.cleanup_old_files()
await manager.cleanup_backup_files()
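The following self-contained sketch shows the same sequential calls wrapped in a coroutine and started with asyncio.run(). StubManager is a hypothetical stand-in for DataCleanupManager. It only records which methods were called, so the example runs without a config file.

```python
import asyncio


class StubManager:
    """Hypothetical stand-in exposing two of the documented coroutine names."""

    def __init__(self):
        self.calls = []

    async def cleanup_old_files(self):
        self.calls.append("cleanup_old_files")

    async def cleanup_backup_files(self):
        self.calls.append("cleanup_backup_files")


async def main(manager):
    # Each await completes before the next line starts
    await manager.cleanup_old_files()
    await manager.cleanup_backup_files()
    return manager.calls


calls = asyncio.run(main(StubManager()))
print(calls)  # ['cleanup_old_files', 'cleanup_backup_files']
```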

Concurrent Operations

You can run multiple operations concurrently using asyncio.gather():

# Run multiple operations concurrently
await asyncio.gather(
    manager.cleanup_old_files(),
    manager.cleanup_backup_files(),
    manager.get_file_info()
)
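asyncio.gather() returns results in the order the awaitables were passed, not in the order they finish. The sketch below demonstrates this with a hypothetical fake_operation coroutine that sleeps for different durations in place of the real manager calls.

```python
import asyncio


async def fake_operation(name, delay):
    # Hypothetical placeholder for a manager call; sleeps instead of doing I/O
    await asyncio.sleep(delay)
    return name


async def main():
    # "slow" finishes last, but its result is still listed first
    return await asyncio.gather(
        fake_operation("slow", 0.05),
        fake_operation("fast", 0.01),
    )


results = asyncio.run(main())
print(results)  # ['slow', 'fast']
```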

Error Handling

Exceptions raised inside a coroutine propagate to the code that awaits it, so wrap any call whose failure you want to handle in a try-except block:

try:
    await manager.start_monitoring()
except Exception as e:
    print(f"Failed to start monitoring: {e}")
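In a long-running service, the standard logging module is usually a better fit than print(), because logger.exception() also records the traceback. This is a minimal sketch. flaky_cleanup is a hypothetical coroutine that stands in for a failing manager call. The sketch catches OSError because that is what the stub raises. The exception types the real library raises are not documented here.

```python
import asyncio
import logging

logger = logging.getLogger("cleanup")


async def flaky_cleanup():
    # Hypothetical stand-in for a manager call that fails
    raise OSError("backup directory is not writable")


async def main():
    try:
        await flaky_cleanup()
    except OSError:
        # Logs the message at ERROR level together with the traceback
        logger.exception("Cleanup failed")
        return False
    return True


ok = asyncio.run(main())
```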

Best Practices

  1. Resource Management: Stop monitoring in a finally block so that it is shut down even if your code raises an exception:

    try:
        await manager.start_monitoring()
        # ... your code ...
    finally:
        await manager.stop_monitoring()
    
  2. Concurrent Operations: Use asyncio.gather() for independent operations:

    results = await asyncio.gather(
        manager.get_total_size(),
        manager.get_file_info(),
        return_exceptions=True
    )
    
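  3. Timeouts and cancellation: Bound long-running operations with asyncio.timeout() (Python 3.11+). When the deadline passes, the operation is cancelled and TimeoutError is raised:

    try:
        async with asyncio.timeout(30):  # 30-second timeout
            await manager.cleanup_old_files()
    except TimeoutError:
        print("Operation timed out")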
    
  4. Event Loop: Use asyncio.run() as the main entry point:

    if __name__ == "__main__":
        asyncio.run(main())
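Best practice 1 can be packaged as an async context manager so that stop_monitoring() is never forgotten. This is a sketch and is not part of the package. monitoring() is a hypothetical helper built with contextlib.asynccontextmanager. It works with any object that exposes the documented start_monitoring() and stop_monitoring() coroutines. RecordingManager is a hypothetical stub used in place of DataCleanupManager.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def monitoring(manager):
    # Start monitoring on entry; always stop it on exit, even on error
    await manager.start_monitoring()
    try:
        yield manager
    finally:
        await manager.stop_monitoring()


class RecordingManager:
    """Hypothetical stub that records lifecycle calls."""

    def __init__(self):
        self.events = []

    async def start_monitoring(self):
        self.events.append("start")

    async def stop_monitoring(self):
        self.events.append("stop")


async def main(manager):
    try:
        async with monitoring(manager):
            manager.events.append("work")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    return manager.events


events = asyncio.run(main(RecordingManager()))
print(events)  # ['start', 'work', 'stop']
```

Unlike the try-finally example, this helper calls stop_monitoring() only if start_monitoring() succeeded.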
    

Configuration

INI Format

[folders]
folder1 = /path/to/folder1
folder2 = /path/to/folder2

[thresholds]
max_size_gb = 1.0
max_age_days = 30

[backup]
directory = /path/to/backup
max_age_days = 90

[monitoring]
check_interval_seconds = 3600

[retention]
policy_log = 7
policy_temp = 1
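Before you pass a file to the manager, you may want to check that it parses and that its numeric values are valid. The sketch below uses only the standard-library configparser. load_cleanup_config is a hypothetical helper. It also makes an assumption that is not documented behavior: each policy_<name> key is read as retention days for that file type, based on the .log and .tmp policies in the YAML example.

```python
import configparser


def load_cleanup_config(text):
    """Parse the INI layout shown above into a plain dict (hypothetical)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    config = {
        # Every value in [folders] is treated as a folder path
        "folders": list(parser["folders"].values()),
        "max_size_gb": parser.getfloat("thresholds", "max_size_gb"),
        "max_age_days": parser.getint("thresholds", "max_age_days"),
        "backup_dir": parser.get("backup", "directory"),
        "backup_max_age_days": parser.getint("backup", "max_age_days"),
        "check_interval_seconds": parser.getint(
            "monitoring", "check_interval_seconds"
        ),
        # Assumption: "policy_<name> = <days>" maps a file type to retention days
        "retention": {
            key.removeprefix("policy_"): parser.getint("retention", key)
            for key in parser["retention"]
            if key.startswith("policy_")
        },
    }
    if config["max_size_gb"] <= 0 or config["max_age_days"] <= 0:
        raise ValueError("thresholds must be positive")
    return config
```

Usage: load_cleanup_config(Path("config.ini").read_text()). getfloat() and getint() raise ValueError on malformed numbers. A missing section raises KeyError, and a missing option raises configparser.NoOptionError.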

YAML Format

data_folders:
  - /path/to/folder1
  - /path/to/folder2

cleanup_thresholds:
  max_size_gb: 1.0
  max_age_days: 30

backup_settings:
  directory: /path/to/backup
  max_backup_age_days: 90

monitoring_settings:
  check_interval: 3600

retention_policies:
  .log:
    max_age_days: 7
  .tmp:
    max_age_days: 1
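In the YAML layout, retention_policies maps a file extension to its own max_age_days. The sketch below shows one way such a lookup could work. It assumes the YAML has already been loaded into a dict, for example with PyYAML's yaml.safe_load. It also assumes that files with no matching policy fall back to cleanup_thresholds.max_age_days. retention_days_for is a hypothetical helper.

```python
from pathlib import PurePath

# The YAML example above, as yaml.safe_load would return it (abridged)
config = {
    "cleanup_thresholds": {"max_size_gb": 1.0, "max_age_days": 30},
    "retention_policies": {
        ".log": {"max_age_days": 7},
        ".tmp": {"max_age_days": 1},
    },
}


def retention_days_for(path, config):
    # PurePath.suffix keeps the leading dot, matching keys such as ".log"
    suffix = PurePath(path).suffix
    policy = config.get("retention_policies", {}).get(suffix)
    if policy is not None:
        return policy["max_age_days"]
    # Assumed fallback: the global age threshold
    return config["cleanup_thresholds"]["max_age_days"]


print(retention_days_for("/data/app.log", config))     # 7
print(retention_days_for("/data/report.csv", config))  # 30
```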

API Reference

DataCleanupManager

The main class for managing data cleanup operations.

DataCleanupManager(config_path: Union[str, Path])

Methods

  • async start_monitoring(): Start monitoring data folders
  • async stop_monitoring(): Stop monitoring
  • async cleanup_old_files(): Manually trigger cleanup
  • async cleanup_backup_files(): Clean up old backup files
  • async get_total_size(): Get total size of monitored folders
  • async get_file_info(): Get information about all files
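To type-hint code that depends on the manager, or to substitute a fake in tests, you can capture the documented methods in a typing.Protocol. This is a sketch. CleanupManagerLike and FakeManager are hypothetical names. The return types are left as Any because this page does not document them.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class CleanupManagerLike(Protocol):
    """Structural type matching the coroutine methods listed above."""

    async def start_monitoring(self) -> Any: ...
    async def stop_monitoring(self) -> Any: ...
    async def cleanup_old_files(self) -> Any: ...
    async def cleanup_backup_files(self) -> Any: ...
    async def get_total_size(self) -> Any: ...
    async def get_file_info(self) -> Any: ...


class FakeManager:
    """Hypothetical in-memory fake for unit tests."""

    async def start_monitoring(self):
        pass

    async def stop_monitoring(self):
        pass

    async def cleanup_old_files(self):
        return []

    async def cleanup_backup_files(self):
        return []

    async def get_total_size(self):
        return 0

    async def get_file_info(self):
        return {}


print(isinstance(FakeManager(), CleanupManagerLike))  # True
```

A runtime isinstance() check against a Protocol verifies only that the method names exist, not their signatures or return types. A static type checker such as mypy performs the stricter comparison.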

Development

  1. Clone the repository:
git clone https://github.com/yourusername/evolvishub-datacleanup.git
cd evolvishub-datacleanup
  2. Install development dependencies:
pip install -e ".[dev]"
  3. Run tests:
pytest

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

evolvishub_datacleanup-0.1.0.tar.gz (16.0 kB)

Uploaded Source

Built Distribution

evolvishub_datacleanup-0.1.0-py3-none-any.whl (10.6 kB)

Uploaded Python 3

File details

Details for the file evolvishub_datacleanup-0.1.0.tar.gz.

File metadata

  • Download URL: evolvishub_datacleanup-0.1.0.tar.gz
  • Size: 16.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for evolvishub_datacleanup-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2125cdbb80191f4bc5114a0cd126b3dfa13ec70426fea87054de8b39cb7e8f0f
MD5 faa893b30ae8be9bb758af78bfbeb4ba
BLAKE2b-256 964babde9aa3676445724ffea2dd7b144edb66391ea5291b17c0f587d6dd29da

File details

Details for the file evolvishub_datacleanup-0.1.0-py3-none-any.whl.

File hashes

Hashes for evolvishub_datacleanup-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 71deb6db51aac43fe9169accac7e94bab3c12b03b5ae8e11fc56376203005e3c
MD5 e0e9ed1d5333b7d9414f7415270e5927
BLAKE2b-256 094dc7b38503208eb51d28323a7df4493882744e666fa5cd33be6067049694a1
