
A comprehensive data loading framework for Excel, CSV, and JSON files with async support and data validation


Evolvis AI - Empowering Innovation Through AI

Evolvishub Data Loader

A robust, asynchronous data loading and processing framework designed for handling various file formats and database integrations.


Company: Evolvis AI

Author: Alban Maxhuni, PhD
Email: a.maxhuni@evolvis.ai


Features

  • Multi-Format Support: Process Excel, CSV, JSON, and custom file formats
  • Asynchronous Processing: Built with Python's asyncio for efficient I/O operations
  • Configurable: YAML and INI configuration support
  • Database Integration: SQLite and PostgreSQL support
  • Error Handling: Comprehensive error handling and logging
  • File Management: Automatic file movement and organization
  • Notification System: Built-in notifications for process updates
  • Extensible: Easy to add new processors and validators

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/evolvishub-dataloader.git
cd evolvishub-dataloader
  2. Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Configuration

The framework supports both YAML and INI configuration formats. Configuration files should be placed in the config directory.

Example YAML Configuration

# Data Types Configuration
data_types:
  types: "inventory,sales,purchases,orders,custom"

# Directory Configuration
directories:
  root: "data"
  processed: "data/processed"
  failed: "data/failed"

# Database Configuration
database:
  path: "data/database.db"
  migrations: "migrations"
  backup: "backups"

# File Processing Configuration
processing:
  move_processed: true
  add_timestamp: true
  retry_attempts: 3
  max_file_size: 10485760  # 10MB
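
The same settings can be expressed in INI form. This is a sketch: the section and key names below mirror the YAML example, but the exact names the framework expects should be checked against your config directory.

```ini
[data_types]
types = inventory,sales,purchases,orders,custom

[directories]
root = data
processed = data/processed
failed = data/failed

[database]
path = data/database.db
migrations = migrations
backup = backups

[processing]
move_processed = true
add_timestamp = true
retry_attempts = 3
max_file_size = 10485760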

Usage

Basic Usage

from src.data_loader.generic_data_loader import GenericDataLoader
from src.data_loader.sqlite_adapter import SQLiteAdapter

async def main():
    # Initialize database adapter
    db = SQLiteAdapter()
    
    # Create data loader instance
    loader = GenericDataLoader(db)
    await loader.initialize()
    
    # Load data from a file
    results = await loader.load_data(
        source="path/to/your/file.xlsx",
        table_name="your_table_name"
    )
    
    # Process results
    for result in results:
        print(f"Status: {result['status']}")
        print(f"Records loaded: {result.get('records_loaded', 0)}")

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
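The processing configuration above includes a retry_attempts setting. If you want retries at the call site as well, a small wrapper can re-invoke load_data on failure. The helper below (load_with_retry is a hypothetical name, not part of the framework's API) is a minimal sketch of that pattern:

```python
import asyncio

async def load_with_retry(loader, source, table_name, attempts=3, delay=1.0):
    """Retry a failed load up to `attempts` times, sleeping `delay` seconds
    between tries. Re-raises the last error if every attempt fails."""
    last_error = None
    for _ in range(attempts):
        try:
            return await loader.load_data(source=source, table_name=table_name)
        except Exception as exc:
            last_error = exc
            await asyncio.sleep(delay)
    raise last_error
```

The same idea works for any awaitable operation; adjust the exception type caught to match the errors your loader actually raises.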

Custom Processors

You can create custom processors for specific file formats:

from src.data_loader.generic_data_loader import DataProcessor

class CustomProcessor(DataProcessor):
    async def process(self, data):
        # Your custom processing logic here
        processed_data = data  # e.g. transform, filter, or enrich the records
        return processed_data

Validation

Add custom validation to your data loading process:

async def custom_validator(data):
    # Your validation logic here
    return True

results = await loader.load_data(
    source="path/to/your/file.xlsx",
    table_name="your_table_name",
    validator=custom_validator
)

Testing

Run the test suite:

PYTHONPATH=./ python -m pytest tests/ -v

Project Structure

evolvishub-dataloader/
├── config/
│   ├── data_loader.yaml
│   └── data_loader.ini
├── src/
│   └── data_loader/
│       ├── generic_data_loader.py
│       ├── sqlite_adapter.py
│       └── processors/
├── tests/
│   ├── test_generic_data_loader.py
│   ├── test_config_manager.py
│   └── test_specific_loaders.py
├── requirements.txt
└── README.md

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For support, please open an issue in the GitHub repository or contact the maintainers.

Acknowledgments

  • Thanks to all contributors who have helped shape this project
  • Built with SQLAlchemy
  • Powered by pandas
