Canonical Loader
A Python module for loading and managing canonical data files with standardized naming conventions. This package provides utilities for extracting metadata from file names, loading data from various file formats, and maintaining data canonicality.
Features
- Standardized file naming convention support
- Automatic date extraction from file names
- Support for CSV and Excel file formats
- Support for both local file system and AWS S3 storage
- Data transformation between DataFrame and dictionary formats
- Metadata extraction and management
- Canonical data saving with consistent formatting
Installation
pip install canonical-loader
Quick Start
Local Files
from canonical_loader import CanonicalLoader
# Initialize loader with regex pattern and folder path
loader = CanonicalLoader(regex=r"dataset-menu.*\.csv", file_folder="./dataset-canon")
# Access loaded data
df = loader.get_df()
data = loader.get_data()
metadata = loader.get_meta_data()
# Save data in canonical format
loader.save_data_as_df()
AWS S3
from canonical_loader import S3CanonicalLoader
# Initialize S3 loader with regex pattern, bucket name and prefix
loader = S3CanonicalLoader(regex=r"dataset-menu.*\.csv", bucket="my-bucket", bucket_prefix="data/")
# Access loaded data
df = loader.get_df()
data = loader.get_data()
metadata = loader.get_meta_data()
# Save data in canonical format
loader.save_data_as_df()
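To see which file names a pattern such as dataset-menu.*\.csv would select, the following minimal sketch applies it to a hypothetical folder listing using only the standard library. It assumes re.search semantics; how CanonicalLoader and S3CanonicalLoader apply the regex internally is not documented here, and the file names and the select_files helper are illustrative only.

```python
import re

# Pattern from the Quick Start, written as a raw string so that "\." reaches
# the regex engine as an escaped (literal) dot.
PATTERN = re.compile(r"dataset-menu.*\.csv")

# Hypothetical folder listing, for illustration only.
file_names = [
    "dataset-menu-at20250101-save20250102.csv",
    "dataset-menu-from20250101-to20250131-save20250201.csv",
    "dataset-price-at20250101-save20250102.csv",
    "dataset-menu-at20250101-save20250102.xlsx",
]


def select_files(names, pattern):
    """Return the names in which the pattern is found.

    Uses re.search semantics, which is an assumption about the matching
    behaviour and not a statement about the library's internals.
    """
    return [name for name in names if pattern.search(name)]


matched = select_files(file_names, PATTERN)
print(matched)
```

Only the two dataset-menu CSV files are selected; the price dataset and the Excel file do not match this pattern.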
File Naming Convention
The package supports the following file naming patterns:
- dataset-{name}-at{date_ref}-save{date_save}.{extension} - Single date reference
- dataset-{name}-from{start_date}-to{end_date}-save{date_save}.{extension} - Date range
- dataset-{name}-between{initial_date}-and{final_date}-save{date_save}.{extension} - Date interval
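The naming patterns above can be parsed with plain regular expressions. The sketch below is not the package's own implementation: it assumes that every date is written as eight digits (YYYYMMDD), which the convention above does not specify, and parse_file_name is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Assumption: dates are eight digits (YYYYMMDD). Adjust if the real files differ.
DATE = r"\d{8}"

# One compiled pattern per naming convention, with named groups that mirror
# the placeholders used in the convention.
PATTERNS = {
    "single": re.compile(
        rf"^dataset-(?P<name>.+)-at(?P<date_ref>{DATE})"
        rf"-save(?P<date_save>{DATE})\.(?P<extension>\w+)$"
    ),
    "range": re.compile(
        rf"^dataset-(?P<name>.+)-from(?P<start_date>{DATE})-to(?P<end_date>{DATE})"
        rf"-save(?P<date_save>{DATE})\.(?P<extension>\w+)$"
    ),
    "interval": re.compile(
        rf"^dataset-(?P<name>.+)-between(?P<initial_date>{DATE})-and(?P<final_date>{DATE})"
        rf"-save(?P<date_save>{DATE})\.(?P<extension>\w+)$"
    ),
}


def parse_file_name(file_name: str) -> Optional[Dict[str, str]]:
    """Return the metadata encoded in a file name, or None if no pattern matches."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(file_name)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None


print(parse_file_name("dataset-menu-at20250101-save20250102.csv"))
print(parse_file_name("dataset-menu-from20250101-to20250131-save20250201.xlsx"))
print(parse_file_name("notes.txt"))
```

Named groups keep the extracted fields aligned with the placeholders in the convention, so the resulting dictionary can be used directly as file metadata.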
Requirements
- Python >= 3.6
- shining_pebbles >= 0.5.3
- string_date_controller >= 0.1.3
- tqdm
- aws_s3_controller >= 0.7.5 (for S3 support)
Version History
0.2.0 (2025-04-18)
- Added AWS S3 support via S3CanonicalLoader class
- Refactored code to use inheritance for better maintainability
- Improved file extension handling for Excel files
- Enhanced metadata handling for different storage backends
0.1.0 (2025-04-18)
- Initial release
- Basic file loading and metadata extraction
- Support for CSV and Excel files
- Canonical data transformation and saving
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
June Young Park
AI Management Development Team Lead & Quant Strategist at LIFE Asset Management
LIFE Asset Management is a hedge fund management firm that integrates value investing and engagement strategies with quantitative approaches and financial technology, headquartered in Seoul, South Korea.
Contact
- Email: juneyoungpaak@gmail.com
- Location: TWO IFC, Yeouido, Seoul
File details
Details for the file canonical_loader-0.2.0.tar.gz.
File metadata
- Download URL: canonical_loader-0.2.0.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ff43b4c97595a89c7a9a0f08145ab9af3eae66550d35eaeda3ede7e92dcf67fb |
| MD5 | d4617144e1da7dafbdc15099f500e7d7 |
| BLAKE2b-256 | a446dc55cd0a0f46555706e0eed07cbbcacaf3f33f8c3974bd0aa44bd6428b8a |
File details
Details for the file canonical_loader-0.2.0-py3-none-any.whl.
File metadata
- Download URL: canonical_loader-0.2.0-py3-none-any.whl
- Upload date:
- Size: 7.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cfa0084c5991a7296d2a0fb99b48cd054343f6e5fd0768503e2471d9bcd66765 |
| MD5 | 5e74a303f349757589247e93221eb398 |
| BLAKE2b-256 | 7925139295bac1744f3de83be58ef4f0d3b50269947f30cd1b87ca1b2d56d928 |