A Python module for loading and managing canonical data files with standardized naming conventions

Project description

Canonical Loader

Canonical Loader is a Python module for loading and managing canonical data files that follow a standardized naming convention. It provides utilities for extracting metadata from file names, loading data from CSV and Excel files, and saving data back in a consistent canonical format.


[0.2.4] - 2025-04-23

  • Bugfix release: fixed import path and compatibility issues after major refactoring
  • Ensured stable operation with latest dependencies

[0.2.3] - 2025-04-21

  • Added simple_loader module for simplified loading scenarios
  • Refactored import paths and internal structure for better maintainability
  • Improved Excel file extension handling (xls, xlsx)
  • Cleaned up and reorganized utility modules
  • Updated version and documentation

Features

  • Standardized file naming convention support
  • Automatic date extraction from file names
  • Support for CSV and Excel file formats
  • Support for both local file system and AWS S3 storage
  • Data transformation between DataFrame and dictionary formats
  • Metadata extraction and management
  • Canonical data saving with consistent formatting

Installation

pip install canonical-loader

Quick Start

Local Files

from canonical_loader import CanonicalLoader

# Initialize loader with regex pattern and folder path
loader = CanonicalLoader(regex=r"dataset-menu.*\.csv", file_folder="./dataset-canon")

# Access loaded data
df = loader.get_df()
data = loader.get_data()
metadata = loader.get_meta_data()

# Save data in canonical format
loader.save_data_as_df()
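
The `regex` argument selects files by name. As a rough illustration of what the pattern above would match, the sketch below applies it to a list of file names with the standard `re` module. How the loader applies the pattern internally is not stated here, so the use of `re.search` is an assumption, and the sample file names are invented for this example.

```python
import re

# Same pattern as in the Quick Start; the raw string keeps "\." as a literal dot.
pattern = re.compile(r"dataset-menu.*\.csv")

# Hypothetical file names, made up for this illustration.
file_names = [
    "dataset-menu-at20250101-save20250102.csv",
    "dataset-menu-from20250101-to20250131-save20250201.csv",
    "dataset-menu-at20250101-save20250102.xlsx",
    "dataset-price-at20250101-save20250102.csv",
]

# Assumption: a file is selected if the pattern is found anywhere in its name.
matched = [name for name in file_names if pattern.search(name)]

for name in matched:
    print(name)
```

Only the two `dataset-menu` CSV files are selected; the Excel file and the `dataset-price` file do not match the pattern.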

AWS S3

from canonical_loader import S3CanonicalLoader

# Initialize S3 loader with regex pattern, bucket name and prefix
loader = S3CanonicalLoader(regex=r"dataset-menu.*\.csv", bucket="my-bucket", bucket_prefix="data/")

# Access loaded data
df = loader.get_df()
data = loader.get_data()
metadata = loader.get_meta_data()

# Save data in canonical format
loader.save_data_as_df()

File Naming Convention

The package supports the following file naming patterns:

  • dataset-{name}-at{date_ref}-save{date_save}.{extension} - Single date reference
  • dataset-{name}-from{start_date}-to{end_date}-save{date_save}.{extension} - Date range
  • dataset-{name}-between{initial_date}-and{final_date}-save{date_save}.{extension} - Date interval
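
The three patterns above can be parsed with ordinary regular expressions. The following is a minimal sketch, not the package's own implementation: `parse_canonical_name` is a hypothetical helper, and it assumes that every date field is a plain run of digits (for example `20250423`), which the convention above does not specify.

```python
import re
from typing import Dict, Optional

# Assumption: dates are runs of digits such as 20250423; the real format may differ.
DATE = r"\d+"

# One regular expression per documented naming pattern, tried in order.
PATTERNS = [
    # dataset-{name}-at{date_ref}-save{date_save}.{extension}
    re.compile(
        rf"^dataset-(?P<name>.+?)-at(?P<date_ref>{DATE})"
        rf"-save(?P<date_save>{DATE})\.(?P<extension>\w+)$"
    ),
    # dataset-{name}-from{start_date}-to{end_date}-save{date_save}.{extension}
    re.compile(
        rf"^dataset-(?P<name>.+?)-from(?P<start_date>{DATE})"
        rf"-to(?P<end_date>{DATE})-save(?P<date_save>{DATE})\.(?P<extension>\w+)$"
    ),
    # dataset-{name}-between{initial_date}-and{final_date}-save{date_save}.{extension}
    re.compile(
        rf"^dataset-(?P<name>.+?)-between(?P<initial_date>{DATE})"
        rf"-and(?P<final_date>{DATE})-save(?P<date_save>{DATE})\.(?P<extension>\w+)$"
    ),
]


def parse_canonical_name(file_name: str) -> Optional[Dict[str, str]]:
    """Return the named fields of a canonical file name, or None if no pattern fits."""
    for pattern in PATTERNS:
        match = pattern.match(file_name)
        if match:
            return match.groupdict()
    return None


# Hypothetical file name used only to exercise the helper.
meta = parse_canonical_name("dataset-menu-from20250101-to20250131-save20250201.csv")
print(meta)
```

The returned dictionary uses the same field names as the placeholders in the convention, so a caller can tell which of the three patterns matched by checking which keys are present.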

Requirements

  • Python >= 3.6
  • shining_pebbles >= 0.5.3
  • string_date_controller >= 0.1.3
  • tqdm
  • aws_s3_controller >= 0.7.5 (for S3 support)

Version History

0.2.1 (2025-04-18)

  • Fixed import bug due to module name changes
  • Ensured proper imports between base and derived classes

0.2.0 (2025-04-18)

  • Added AWS S3 support via S3CanonicalLoader class
  • Refactored code to use inheritance for better maintainability
  • Improved file extension handling for Excel files
  • Enhanced metadata handling for different storage backends

0.1.0 (2025-04-18)

  • Initial release
  • Basic file loading and metadata extraction
  • Support for CSV and Excel files
  • Canonical data transformation and saving

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

June Young Park
AI Management Development Team Lead & Quant Strategist at LIFE Asset Management

LIFE Asset Management is a hedge fund management firm that integrates value investing and engagement strategies with quantitative approaches and financial technology, headquartered in Seoul, South Korea.
