A comprehensive data loading framework for Excel, CSV, and JSON files with async support and data validation
Evolvis AI - Empowering Innovation Through AI
Evolvishub Data Loader
A robust, asynchronous data loading and processing framework designed for handling various file formats and database integrations.
Company: Evolvis AI
Author: Alban Maxhuni, PhD
Email: a.maxhuni@evolvis.ai
Features
- Multi-Format Support: Process Excel, CSV, JSON, and custom file formats
- Asynchronous Processing: Built with Python's asyncio for efficient I/O operations
- Configurable: YAML and INI configuration support
- Database Integration: SQLite and PostgreSQL support
- Error Handling: Comprehensive error handling and logging
- File Management: Automatic file movement and organization
- Notification System: Integrated notification system for process updates
- Extensible: Easy to add new processors and validators
Installation
- Clone the repository:
```bash
git clone https://github.com/yourusername/evolvishub-dataloader.git
cd evolvishub-dataloader
```
- Create and activate a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
Configuration
The framework supports both YAML and INI configuration formats. Configuration files should be placed in the config directory.
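The contents of config/data_loader.ini are not shown in this README. Assuming the INI file mirrors the YAML sections in the example below (one [section] per top-level key), it could be read with the standard library's configparser as sketched here. The INI layout and the load_ini_config helper are hypothetical, not the framework's own config manager.

```python
import configparser

# Hypothetical INI equivalent of the YAML example; the real
# config/data_loader.ini may use different section or option names.
SAMPLE_INI = """
[data_types]
types = inventory,sales,purchases,orders,custom

[directories]
root = data
processed = data/processed
failed = data/failed

[processing]
move_processed = true
add_timestamp = true
retry_attempts = 3
max_file_size = 10485760
"""


def load_ini_config(text: str) -> dict:
    """Parse an INI string into typed settings (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    processing = parser["processing"]
    return {
        # Comma-separated list -> Python list, ignoring stray whitespace
        "types": [t.strip() for t in parser["data_types"]["types"].split(",") if t.strip()],
        "processed_dir": parser["directories"]["processed"],
        "failed_dir": parser["directories"]["failed"],
        # configparser stores every value as a string, so convert explicitly
        "move_processed": processing.getboolean("move_processed"),
        "add_timestamp": processing.getboolean("add_timestamp"),
        "retry_attempts": processing.getint("retry_attempts"),
        "max_file_size": processing.getint("max_file_size"),
    }


config = load_ini_config(SAMPLE_INI)
print(config["types"])
```

Unlike YAML, INI has no native booleans or integers, which is why the sketch converts values with getboolean and getint.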
Example YAML Configuration
```yaml
# Data Types Configuration
data_types:
  types: "inventory,sales,purchases,orders,custom"

# Directory Configuration
directories:
  root: "data"
  processed: "data/processed"
  failed: "data/failed"

# Database Configuration
database:
  path: "data/database.db"
  migrations: "migrations"
  backup: "backups"

# File Processing Configuration
processing:
  move_processed: true
  add_timestamp: true
  retry_attempts: 3
  max_file_size: 10485760  # 10 MiB (10 * 1024 * 1024 bytes)
```
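The README does not spell out how the processing options interact. As a sketch only, assuming max_file_size is a byte limit, failed or oversized files go to the failed directory, and add_timestamp appends a timestamp to the moved file's name, the post-processing decision could look like this. The plan_destination helper and the timestamp format are hypothetical, not the framework's API.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Values copied from the YAML example above
PROCESSING = {"move_processed": True, "add_timestamp": True, "max_file_size": 10485760}
DIRECTORIES = {"processed": "data/processed", "failed": "data/failed"}


def plan_destination(source: Path, size_bytes: int, succeeded: bool, now: datetime) -> Optional[Path]:
    """Hypothetical helper: decide where a handled file should go, or None to leave it in place."""
    too_large = size_bytes > PROCESSING["max_file_size"]
    if succeeded and not too_large:
        if not PROCESSING["move_processed"]:
            return None
        target_dir = Path(DIRECTORIES["processed"])
    else:
        # Assumption: failed or oversized files are moved to the failed directory
        target_dir = Path(DIRECTORIES["failed"])

    name = source.name
    if PROCESSING["add_timestamp"]:
        # Assumed naming scheme: <stem>_<YYYYmmdd_HHMMSS><suffix>
        name = f"{source.stem}_{now:%Y%m%d_%H%M%S}{source.suffix}"
    return target_dir / name


print(plan_destination(Path("data/sales.csv"), 2048, True, datetime(2024, 1, 31, 9, 30)))
```

Timestamping the moved file keeps repeated uploads of a same-named file from overwriting each other in the processed or failed directory.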
Usage
Basic Usage
```python
import asyncio

from src.data_loader.generic_data_loader import GenericDataLoader
from src.data_loader.sqlite_adapter import SQLiteAdapter


async def main():
    # Initialize the database adapter
    db = SQLiteAdapter()

    # Create the data loader instance and prepare it for use
    loader = GenericDataLoader(db)
    await loader.initialize()

    # Load data from a file into the given table
    results = await loader.load_data(
        source="path/to/your/file.xlsx",
        table_name="your_table_name",
    )

    # Report the outcome of each load
    for result in results:
        print(f"Status: {result['status']}")
        print(f"Records loaded: {result.get('records_loaded', 0)}")


if __name__ == "__main__":
    asyncio.run(main())
```
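When several files are loaded, a compact summary of the results can be more useful than per-result printing. This sketch relies only on the status and records_loaded keys used in the loop above; the summarize_results helper is hypothetical, and the status strings in the sample data ("success", "error") are illustrative, not values documented by the framework.

```python
from collections import Counter


def summarize_results(results):
    """Aggregate load results into per-status counts and a record total (hypothetical helper)."""
    status_counts = Counter(result["status"] for result in results)
    # records_loaded may be absent, as the .get(..., 0) in the loop above suggests
    total_records = sum(result.get("records_loaded", 0) for result in results)
    return dict(status_counts), total_records


# Illustrative results; real status values depend on the framework
sample = [
    {"status": "success", "records_loaded": 10},
    {"status": "success", "records_loaded": 5},
    {"status": "error"},
]
print(summarize_results(sample))
```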
Custom Processors
You can create custom processors for specific file formats:
```python
from src.data_loader.generic_data_loader import DataProcessor


class CustomProcessor(DataProcessor):
    async def process(self, data):
        # Your custom processing logic here; this placeholder returns the data unchanged
        processed_data = data
        return processed_data
```
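A more concrete processor might tidy column names and drop empty rows. So that the sketch runs without the package installed, it defines a local stand-in for DataProcessor with only the async process method shown above; the real base class may require more. It also assumes data arrives as a list of row dictionaries, whereas the framework may pass a pandas DataFrame. NormalizeColumnsProcessor is a hypothetical name.

```python
import asyncio
from abc import ABC, abstractmethod


class DataProcessor(ABC):
    """Local stand-in for the framework's DataProcessor so this sketch runs on its own."""

    @abstractmethod
    async def process(self, data):
        ...


class NormalizeColumnsProcessor(DataProcessor):
    """Hypothetical processor: tidy column names and drop fully empty rows."""

    async def process(self, data):
        cleaned = []
        for row in data:
            # e.g. "Unit Price " -> "unit_price"
            normalized = {
                str(key).strip().lower().replace(" ", "_"): value
                for key, value in row.items()
            }
            # Skip rows whose every value is None or a blank string
            if all(value is None or str(value).strip() == "" for value in normalized.values()):
                continue
            cleaned.append(normalized)
        return cleaned


rows = [{" Item Name": "Widget", "Unit Price ": 9.5}, {" Item Name": "", "Unit Price ": None}]
print(asyncio.run(NormalizeColumnsProcessor().process(rows)))
```

With the real package, the stand-in class would be replaced by the DataProcessor import shown above.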
Validation
Add custom validation to your data loading process:
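```python
async def custom_validator(data):
    # Your validation logic here; return True if the data is valid
    return True


async def load_with_validation(loader):
    # Illustrative wrapper, since `await` must run inside an async function;
    # `loader` is an initialized GenericDataLoader (see Basic Usage)
    return await loader.load_data(
        source="path/to/your/file.xlsx",
        table_name="your_table_name",
        validator=custom_validator,
    )
```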
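A concrete validator follows the same contract as custom_validator: an async function that returns a boolean. This sketch assumes data arrives as a list of row dictionaries (the framework may pass a pandas DataFrame instead) and that returning False rejects the data. The column names sku and quantity and the function name inventory_validator are hypothetical.

```python
import asyncio

REQUIRED_FIELDS = ("sku", "quantity")  # hypothetical column names


async def inventory_validator(data):
    """Return True only if every row has the required fields and a non-negative integer quantity."""
    for row in data:
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            return False
        quantity = row["quantity"]
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity < 0:
            return False
    return True


print(asyncio.run(inventory_validator([{"sku": "A-1", "quantity": 4}])))
```

It would be passed to load_data in the same way, as validator=inventory_validator.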
Testing
Run the test suite:
```bash
PYTHONPATH=./ python -m pytest tests/ -v
```
Project Structure
```
evolvishub-dataloader/
├── config/
│   ├── data_loader.yaml
│   └── data_loader.ini
├── src/
│   └── data_loader/
│       ├── generic_data_loader.py
│       ├── sqlite_adapter.py
│       └── processors/
├── tests/
│   ├── test_generic_data_loader.py
│   ├── test_config_manager.py
│   └── test_specific_loaders.py
├── requirements.txt
└── README.md
```
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
For support, please open an issue in the GitHub repository or contact the maintainers.
Acknowledgments
- Thanks to all contributors who have helped shape this project
- Built with SQLAlchemy
- Powered by pandas
Download files
File details
Details for the file evolvishub_dataloader-1.0.0.tar.gz.
File metadata
- Download URL: evolvishub_dataloader-1.0.0.tar.gz
- Size: 35.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a746c701c597c7f26bddfa7c4ed0bb090f03ce94080b847dc2ef8cb962354bf2 |
| MD5 | 0fff924666f9757f78fedf28f92ce5eb |
| BLAKE2b-256 | cd6495ceaf1480b550ede252d16dca705ae13b1c510465e60120b58feffb8ade |
File details
Details for the file evolvishub_dataloader-1.0.0-py3-none-any.whl.
File metadata
- Download URL: evolvishub_dataloader-1.0.0-py3-none-any.whl
- Size: 17.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f69e797fbb2aeb34a2e192b6eb79525ce8d1629f408ca04c4fd314fa3f478678 |
| MD5 | b4dda6cb395f349bd3da874737e17ed4 |
| BLAKE2b-256 | 56fa3409d45771f61935cd8e6c8b9e292f40f62a9707778283e9df00e45406cc |
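The standard library's hashlib is enough to check a downloaded distribution against the SHA256 digests listed above. In this minimal sketch, the helper names sha256_of and verify_download are illustrative. The file is streamed in chunks so it is never read into memory at once, and the expected digests are copied from the hash tables.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests published for the 1.0.0 release files
EXPECTED_SHA256 = {
    "evolvishub_dataloader-1.0.0.tar.gz": "a746c701c597c7f26bddfa7c4ed0bb090f03ce94080b847dc2ef8cb962354bf2",
    "evolvishub_dataloader-1.0.0-py3-none-any.whl": "f69e797fbb2aeb34a2e192b6eb79525ce8d1629f408ca04c4fd314fa3f478678",
}


def verify_download(path):
    """Return True if the file's SHA-256 matches the published digest for its file name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

Calling verify_download("evolvishub_dataloader-1.0.0.tar.gz") in the download directory returns True only if the file is byte-for-byte the published artifact.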