Canonical Loader

A Python module for loading and managing canonical data files with standardized naming conventions. This package provides utilities for extracting metadata from file names, loading data from various file formats, and maintaining data canonicality.

Features

  • Standardized file naming convention support
  • Automatic date extraction from file names
  • Support for CSV and Excel file formats
  • Support for both local file system and AWS S3 storage
  • Data transformation between DataFrame and dictionary formats
  • Metadata extraction and management
  • Canonical data saving with consistent formatting

Installation

pip install canonical-loader

Quick Start

Local Files

from canonical_loader import CanonicalLoader

# Initialize loader with regex pattern and folder path
loader = CanonicalLoader(regex=r"dataset-menu.*\.csv", file_folder="./dataset-canon")

# Access loaded data
df = loader.get_df()
data = loader.get_data()
metadata = loader.get_meta_data()

# Save data in canonical format
loader.save_data_as_df()
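
Before passing a pattern to the loader, it can help to check which file names it would select. The sketch below uses only the standard library and is not part of canonical_loader; `preview_matches` is a hypothetical helper, and it assumes the pattern is applied with `re.search` against bare file names. The README does not say how the loader applies the regex or how it chooses among several matching files.

```python
import os
import re


def preview_matches(regex, file_folder):
    """Return the sorted file names in file_folder that the regex matches.

    Hypothetical helper for illustration only; it assumes re.search
    semantics, which may differ from what CanonicalLoader does internally.
    """
    pattern = re.compile(regex)
    return sorted(
        name for name in os.listdir(file_folder) if pattern.search(name)
    )
```

Calling `preview_matches(r"dataset-menu.*\.csv", "./dataset-canon")` lists the candidate files in the folder used above. The raw-string prefix keeps `\.` as a literal-dot escape for the regex engine rather than a Python string escape.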

AWS S3

from canonical_loader import S3CanonicalLoader

# Initialize S3 loader with regex pattern, bucket name and prefix
loader = S3CanonicalLoader(regex=r"dataset-menu.*\.csv", bucket="my-bucket", bucket_prefix="data/")

# Access loaded data
df = loader.get_df()
data = loader.get_data()
metadata = loader.get_meta_data()

# Save data in canonical format
loader.save_data_as_df()

File Naming Convention

The package supports the following file naming patterns:

  • dataset-{name}-at{date_ref}-save{date_save}.{extension} - Single date reference
  • dataset-{name}-from{start_date}-to{end_date}-save{date_save}.{extension} - Date range
  • dataset-{name}-between{initial_date}-and{final_date}-save{date_save}.{extension} - Date interval
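
The three patterns above can be illustrated with a small standalone parser. This is a minimal sketch, not the package's own implementation: `parse_canonical_name` and the pattern labels are hypothetical, and it assumes dates are 8-digit strings such as `20250418`, which the README does not specify.

```python
import re

# Assumption: dates are 8-digit strings (e.g. YYYYMMDD). The README does not
# state the date format, so adjust DATE if your files use another one.
DATE = r"\d{8}"
TAIL = rf"-save(?P<date_save>{DATE})\.(?P<extension>\w+)$"

# One compiled pattern per naming convention; the keys are illustrative labels.
PATTERNS = {
    "at": re.compile(
        rf"^dataset-(?P<name>.+)-at(?P<date_ref>{DATE})" + TAIL
    ),
    "from_to": re.compile(
        rf"^dataset-(?P<name>.+)-from(?P<start_date>{DATE})"
        rf"-to(?P<end_date>{DATE})" + TAIL
    ),
    "between_and": re.compile(
        rf"^dataset-(?P<name>.+)-between(?P<initial_date>{DATE})"
        rf"-and(?P<final_date>{DATE})" + TAIL
    ),
}


def parse_canonical_name(file_name):
    """Return the fields encoded in a canonical file name, or None."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(file_name)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None
```

For example, `parse_canonical_name("dataset-menu-from20250101-to20250331-save20250418.csv")` yields the name `menu`, the start and end dates, the save date, and the extension `csv`. A name that follows none of the three patterns returns `None`.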

Requirements

  • Python >= 3.6
  • shining_pebbles >= 0.5.3
  • string_date_controller >= 0.1.3
  • tqdm
  • aws_s3_controller >= 0.7.5 (for S3 support)

Version History

0.2.0 (2025-04-18)

  • Added AWS S3 support via S3CanonicalLoader class
  • Refactored code to use inheritance for better maintainability
  • Improved file extension handling for Excel files
  • Enhanced metadata handling for different storage backends

0.1.0 (2025-04-18)

  • Initial release
  • Basic file loading and metadata extraction
  • Support for CSV and Excel files
  • Canonical data transformation and saving

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

June Young Park
AI Management Development Team Lead & Quant Strategist at LIFE Asset Management

LIFE Asset Management is a hedge fund management firm that integrates value investing and engagement strategies with quantitative approaches and financial technology, headquartered in Seoul, South Korea.
