Canonical Loader

A Python module for loading and managing canonical data files with standardized naming conventions. This package provides utilities for extracting metadata from file names, loading data from various file formats, and maintaining data canonicality.


[0.2.3] - 2025-04-21

  • Added simple_loader module for simplified loading scenarios
  • Refactored import paths and internal structure for better maintainability
  • Improved Excel file extension handling (xls, xlsx)
  • Cleaned up and reorganized utility modules
  • Updated version and documentation

Features

  • Standardized file naming convention support
  • Automatic date extraction from file names
  • Support for CSV and Excel file formats
  • Support for both local file system and AWS S3 storage
  • Data transformation between DataFrame and dictionary formats
  • Metadata extraction and management
  • Canonical data saving with consistent formatting

Installation

pip install canonical-loader

Quick Start

Local Files

from canonical_loader import CanonicalLoader

# Initialize loader with regex pattern and folder path
# A raw string (r"...") keeps "\." as a regex escape instead of an invalid string escape
loader = CanonicalLoader(regex=r"dataset-menu.*\.csv", file_folder="./dataset-canon")

# Access loaded data
df = loader.get_df()
data = loader.get_data()
metadata = loader.get_meta_data()

# Save data in canonical format
loader.save_data_as_df()
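
The `regex` argument is an ordinary regular expression, so it can help to check which file names it selects before pointing the loader at a folder. The sketch below uses only the standard `re` module. The file names are made up for illustration, and applying the pattern with `re.search` is an assumption; the loader's own matching rule may differ.

```python
import re

# Raw string so that "\." reaches the regex engine as an escaped dot.
pattern = re.compile(r"dataset-menu.*\.csv")

# Hypothetical folder listing, for illustration only.
file_names = [
    "dataset-menu-at20250101-save20250102.csv",
    "dataset-menu-from20250101-to20250131-save20250201.csv",
    "dataset-menu-at20250101-save20250102.xlsx",
    "dataset-price-at20250101-save20250102.csv",
]

# Assumption: the pattern is applied with re.search; the loader may differ.
matched = [name for name in file_names if pattern.search(name)]

for name in matched:
    print(name)
```

Only the two `dataset-menu` CSV files are printed; the Excel file and the `dataset-price` file do not match the pattern.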

AWS S3

from canonical_loader import S3CanonicalLoader

# Initialize S3 loader with regex pattern, bucket name and prefix
# A raw string (r"...") keeps "\." as a regex escape instead of an invalid string escape
loader = S3CanonicalLoader(regex=r"dataset-menu.*\.csv", bucket="my-bucket", bucket_prefix="data/")

# Access loaded data
df = loader.get_df()
data = loader.get_data()
metadata = loader.get_meta_data()

# Save data in canonical format
loader.save_data_as_df()
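
To see how a prefix and a regex might narrow down a bucket listing together, here is a minimal offline sketch that needs no AWS access. The object keys are hypothetical, and the selection rule (keep keys under the prefix, including nested ones, whose base name matches the regex) is an assumption rather than the documented behavior of `S3CanonicalLoader`.

```python
import posixpath
import re

pattern = re.compile(r"dataset-menu.*\.csv")
bucket_prefix = "data/"

# Hypothetical object keys, standing in for a bucket listing.
keys = [
    "data/dataset-menu-at20250101-save20250102.csv",
    "data/archive/dataset-menu-at20240101-save20240102.csv",
    "logs/dataset-menu-at20250101-save20250102.csv",
    "data/dataset-price-at20250101-save20250102.csv",
]

# Assumption: keep keys under the prefix whose base name matches the regex.
selected = [
    key
    for key in keys
    if key.startswith(bucket_prefix) and pattern.search(posixpath.basename(key))
]

for key in selected:
    print(key)
```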

File Naming Convention

The package supports the following file naming patterns:

  • dataset-{name}-at{date_ref}-save{date_save}.{extension} - Single date reference
  • dataset-{name}-from{start_date}-to{end_date}-save{date_save}.{extension} - Date range
  • dataset-{name}-between{initial_date}-and{final_date}-save{date_save}.{extension} - Date interval
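
The three patterns above can be read back into metadata fields with plain regular expressions. The following is a minimal sketch, not the package's own parser: `parse_canonical_name` and `PATTERNS` are hypothetical names, and treating every date token as a run of digits (for example `20250101`) is an assumption.

```python
import re
from typing import Dict, Optional

# Assumption: every date token is a run of digits (e.g. YYYYMMDD); the
# package may accept other date formats.
DATE = r"\d+"
TAIL = rf"-save(?P<date_save>{DATE})\.(?P<extension>\w+)$"

# One compiled pattern per naming convention listed above.
PATTERNS = [
    # Single date reference
    re.compile(rf"^dataset-(?P<name>.+)-at(?P<date_ref>{DATE}){TAIL}"),
    # Date range
    re.compile(
        rf"^dataset-(?P<name>.+)-from(?P<start_date>{DATE})"
        rf"-to(?P<end_date>{DATE}){TAIL}"
    ),
    # Date interval
    re.compile(
        rf"^dataset-(?P<name>.+)-between(?P<initial_date>{DATE})"
        rf"-and(?P<final_date>{DATE}){TAIL}"
    ),
]


def parse_canonical_name(file_name: str) -> Optional[Dict[str, str]]:
    """Return the fields encoded in file_name, or None if no pattern fits."""
    for pattern in PATTERNS:
        match = pattern.match(file_name)
        if match:
            return match.groupdict()
    return None


print(parse_canonical_name("dataset-menu-at20250101-save20250102.csv"))
print(parse_canonical_name("notes.txt"))
```

Each pattern is anchored at both ends, so a file name that only partly resembles the convention is rejected rather than half-parsed.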

Requirements

  • Python >= 3.6
  • shining_pebbles >= 0.5.3
  • string_date_controller >= 0.1.3
  • tqdm
  • aws_s3_controller >= 0.7.5 (for S3 support)

Version History

0.2.1 (2025-04-18)

  • Fixed import bug due to module name changes
  • Ensured proper imports between base and derived classes

0.2.0 (2025-04-18)

  • Added AWS S3 support via S3CanonicalLoader class
  • Refactored code to use inheritance for better maintainability
  • Improved file extension handling for Excel files
  • Enhanced metadata handling for different storage backends

0.1.0 (2025-04-18)

  • Initial release
  • Basic file loading and metadata extraction
  • Support for CSV and Excel files
  • Canonical data transformation and saving

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

June Young Park
AI Management Development Team Lead & Quant Strategist at LIFE Asset Management

LIFE Asset Management is a hedge fund management firm that integrates value investing and engagement strategies with quantitative approaches and financial technology, headquartered in Seoul, South Korea.
