
A professional data cleanup management library


Evolvishub Data Cleanup Adapter

A Python library for managing data cleanup and archiving in Evolvis applications.

About

This project is developed and maintained by Evolvis.ai.

Author

Alban Maxhuni, PhD
Email: a.maxhuni@evolvis.ai

Features

  • Configurable data cleanup based on file age and size thresholds
  • Support for both INI and YAML configuration files
  • File system monitoring with automatic cleanup
  • Configurable retention policies for different file types
  • Automatic backup of cleaned files
  • Cleanup of old backup files
  • Comprehensive logging
  • Thread-safe operations
  • Asynchronous operations for better performance
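The age and size thresholds named in the first feature can be pictured with a small standalone sketch. It does not use the library and is not its implementation: `find_expired_files` and `folder_size_gb` are hypothetical helper names, and treating a file's modification time as its age and a gigabyte as 10**9 bytes are both assumptions.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_expired_files(folder: Path, max_age_days: float) -> list[Path]:
    """Return files under `folder` last modified more than `max_age_days` ago."""
    cutoff = time.time() - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in folder.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff
    )


def folder_size_gb(folder: Path) -> float:
    """Return the combined size of all files under `folder`, in units of 10**9 bytes."""
    total_bytes = sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())
    return total_bytes / 1e9
```

A cleanup pass built on these helpers would act when `folder_size_gb(folder)` exceeds `max_size_gb`, or on every path returned by `find_expired_files(folder, max_age_days)`.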

Installation

pip install evolvishub-datacleanup

Usage

  1. Create a configuration file (INI or YAML):
# config.ini
[folders]
data_folder1 = /path/to/folder1
data_folder2 = /path/to/folder2

[thresholds]
max_size_gb = 1.0
max_age_days = 30

[backup]
directory = /path/to/backup
max_age_days = 90

[monitoring]
check_interval_seconds = 3600

[retention]
policy_log = 7
policy_temp = 1
  2. Use the library in your code:
import asyncio
from evolvishub_datacleanup import DataCleanupManager

async def main():
    # Initialize the manager with your config file. This happens before the
    # try block so that `manager` is always defined when `finally` runs.
    manager = DataCleanupManager('config.ini')

    try:
        # Start monitoring
        await manager.start_monitoring()

        # Example of concurrent operations
        cleanup_task = asyncio.create_task(manager.cleanup_old_files())
        backup_task = asyncio.create_task(manager.cleanup_backup_files())

        # Wait for both operations to complete
        await asyncio.gather(cleanup_task, backup_task)

        # Get file information
        file_info = await manager.get_file_info()
        total_size = await manager.get_total_size()

        # ... your application code ...

    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Ensure monitoring is stopped
        await manager.stop_monitoring()

if __name__ == "__main__":
    asyncio.run(main())

Async Usage Guide

Basic Async Operations

All DataCleanupManager methods listed in the API Reference are coroutines. Call them with await from inside an async function:

# Single operation
await manager.cleanup_old_files()

# Multiple sequential operations
await manager.cleanup_old_files()
await manager.cleanup_backup_files()

Concurrent Operations

You can run multiple operations concurrently using asyncio.gather():

# Run multiple operations concurrently
await asyncio.gather(
    manager.cleanup_old_files(),
    manager.cleanup_backup_files(),
    manager.get_file_info()
)
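To make the behaviour of asyncio.gather() concrete, here is a minimal sketch that runs without the library. `FakeManager` is a hypothetical stand-in whose methods only sleep; the real DataCleanupManager methods may return different values.

```python
import asyncio


class FakeManager:
    """Hypothetical stand-in for DataCleanupManager; each method just sleeps."""

    def __init__(self) -> None:
        self.finished: list[str] = []  # records the order in which calls complete

    async def cleanup_old_files(self) -> str:
        await asyncio.sleep(0.05)  # the slower operation
        self.finished.append("old_files")
        return "old_files"

    async def cleanup_backup_files(self) -> str:
        await asyncio.sleep(0.01)  # the faster operation
        self.finished.append("backup_files")
        return "backup_files"


async def run_both(manager: FakeManager) -> list[str]:
    # gather() starts both coroutines together and returns their results
    # in argument order, regardless of which one finishes first.
    return await asyncio.gather(
        manager.cleanup_old_files(),
        manager.cleanup_backup_files(),
    )
```

After `asyncio.run(run_both(manager))`, the returned list follows the argument order, while `manager.finished` shows the faster call completing first, which is what distinguishes concurrent from sequential execution.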

Error Handling

Always wrap async operations in try-except blocks:

try:
    await manager.start_monitoring()
except Exception as e:
    print(f"Failed to start monitoring: {e}")

Best Practices

  1. Resource Management: Always ensure proper cleanup by using try-finally blocks:

    try:
        await manager.start_monitoring()
        # ... your code ...
    finally:
        await manager.stop_monitoring()
    
  2. Concurrent Operations: Use asyncio.gather() for independent operations:

    results = await asyncio.gather(
        manager.get_total_size(),
        manager.get_file_info(),
        return_exceptions=True
    )
    
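  3. Timeouts and cancellation: Bound long-running operations with a timeout; when it expires, the awaited operation is cancelled. asyncio.timeout() requires Python 3.11 or later; on older versions, asyncio.wait_for() serves the same purpose: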

    try:
        async with asyncio.timeout(30):  # 30 second timeout
            await manager.cleanup_old_files()
    except asyncio.TimeoutError:
        print("Operation timed out")
    
  4. Event Loop: Use asyncio.run() as the main entry point:

    if __name__ == "__main__":
        asyncio.run(main())
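Practice 2 above passes return_exceptions=True, which changes how failures surface: they come back as items in the result list instead of being raised. The sketch below shows one way to separate successes from failures. `get_total_size` and `get_file_info` here are hypothetical module-level stand-ins, not the library's methods, and their return values are assumptions.

```python
import asyncio


async def get_total_size() -> int:
    # Hypothetical stand-in that succeeds.
    return 1024


async def get_file_info() -> dict:
    # Hypothetical stand-in that fails, to show how the error is reported.
    raise PermissionError("cannot read folder")


async def collect() -> tuple[list, list]:
    results = await asyncio.gather(
        get_total_size(),
        get_file_info(),
        return_exceptions=True,
    )
    # Exceptions are returned as ordinary list items, so each result
    # has to be checked explicitly before it is used.
    succeeded = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return succeeded, failed
```

Without the isinstance check, a failed call would go unnoticed until the exception object is used as if it were a normal result.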
    

Configuration

INI Format

[folders]
folder1 = /path/to/folder1
folder2 = /path/to/folder2

[thresholds]
max_size_gb = 1.0
max_age_days = 30

[backup]
directory = /path/to/backup
max_age_days = 90

[monitoring]
check_interval_seconds = 3600

[retention]
policy_log = 7
policy_temp = 1
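Before handing an INI file to the manager, it can be useful to check that it parses and that the numeric values have the expected types. The following is a sketch using only the standard library's configparser; how the library itself reads the file is not documented here, and `summarize_config` is a hypothetical helper name.

```python
import configparser

SAMPLE_INI = """
[folders]
folder1 = /path/to/folder1
folder2 = /path/to/folder2

[thresholds]
max_size_gb = 1.0
max_age_days = 30

[backup]
directory = /path/to/backup
max_age_days = 90

[monitoring]
check_interval_seconds = 3600
"""


def summarize_config(text: str) -> dict:
    """Parse INI text and convert the values to the types the sample suggests."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        # Every value in [folders] is taken as a path, whatever its key is called.
        "folders": list(parser["folders"].values()),
        "max_size_gb": parser.getfloat("thresholds", "max_size_gb"),
        "max_age_days": parser.getint("thresholds", "max_age_days"),
        "backup_directory": parser.get("backup", "directory"),
        "check_interval_seconds": parser.getint("monitoring", "check_interval_seconds"),
    }
```

A missing section or option raises a configparser error, and a value such as `max_age_days = thirty` raises ValueError, so typos are caught before any cleanup runs.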

YAML Format

data_folders:
  - /path/to/folder1
  - /path/to/folder2

cleanup_thresholds:
  max_size_gb: 1.0
  max_age_days: 30

backup_settings:
  directory: /path/to/backup
  max_backup_age_days: 90

monitoring_settings:
  check_interval: 3600

retention_policies:
  .log:
    max_age_days: 7
  .tmp:
    max_age_days: 1
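The retention_policies section maps file extensions to their own age limits. One plausible reading, sketched below, is that a per-extension limit takes precedence over the global max_age_days; that precedence is an assumption, not documented behaviour, and `max_age_for` is a hypothetical helper. The configuration is written as the dict a YAML loader such as PyYAML's yaml.safe_load would produce, so the sketch has no third-party dependency.

```python
from pathlib import Path

# Subset of the YAML above, as a loader would return it.
CONFIG = {
    "cleanup_thresholds": {"max_size_gb": 1.0, "max_age_days": 30},
    "retention_policies": {
        ".log": {"max_age_days": 7},
        ".tmp": {"max_age_days": 1},
    },
}


def max_age_for(path: str, config: dict) -> int:
    """Return the per-extension age limit if one exists, else the global one."""
    suffix = Path(path).suffix.lower()
    policy = config["retention_policies"].get(suffix)
    if policy is not None:
        return policy["max_age_days"]
    return config["cleanup_thresholds"]["max_age_days"]
```

Under this reading, `app.log` would be kept for 7 days, `cache.tmp` for 1 day, and any other file for the global 30 days.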

API Reference

DataCleanupManager

The main class for managing data cleanup operations.

DataCleanupManager(config_path: Union[str, Path])

config_path is the path to an INI or YAML configuration file.

Methods

  • async start_monitoring(): Start monitoring data folders
  • async stop_monitoring(): Stop monitoring
  • async cleanup_old_files(): Manually trigger cleanup
  • async cleanup_backup_files(): Clean up old backup files
  • async get_total_size(): Get total size of monitored folders
  • async get_file_info(): Get information about all files
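Because start_monitoring() and stop_monitoring() always come as a pair, they can be wrapped in an async context manager so the stop call cannot be forgotten. This is a sketch built on the standard library's contextlib, not something the library is documented to provide; `monitoring` is a hypothetical helper and `FakeManager` is a stand-in used only to make the example runnable.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def monitoring(manager):
    """Start monitoring on entry and always stop it on exit."""
    await manager.start_monitoring()
    try:
        yield manager
    finally:
        await manager.stop_monitoring()


class FakeManager:
    """Hypothetical stand-in that records calls instead of touching files."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def start_monitoring(self) -> None:
        self.calls.append("start")

    async def stop_monitoring(self) -> None:
        self.calls.append("stop")

    async def cleanup_old_files(self) -> None:
        self.calls.append("cleanup")


async def demo() -> list[str]:
    fake = FakeManager()
    async with monitoring(fake) as manager:
        await manager.cleanup_old_files()
    return fake.calls
```

With a real manager, the same `async with monitoring(manager):` block would replace the explicit try-finally pattern, and monitoring would still be stopped if the body raises.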

Development

  1. Clone the repository:
git clone https://github.com/yourusername/evolvishub-datacleanup.git
cd evolvishub-datacleanup
  2. Install development dependencies:
pip install -e ".[dev]"
  3. Run tests:
pytest

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

