Canonical Loader

A Python module for loading and managing canonical data files that follow a standardized naming convention. The package provides utilities for extracting metadata from file names, loading data from CSV and Excel files stored locally or on AWS S3, and saving data back in a consistent canonical format.

Features

  • Standardized file naming convention support
  • Automatic date extraction from file names
  • Support for CSV and Excel file formats
  • Support for both local file system and AWS S3 storage
  • Data transformation between DataFrame and dictionary formats
  • Metadata extraction and management
  • Canonical data saving with consistent formatting

Installation

pip install canonical-loader

Quick Start

Local Files

from canonical_loader import CanonicalLoader

# Initialize loader with regex pattern and folder path
# (raw string, so the regex escape \. reaches the regex engine unchanged)
loader = CanonicalLoader(regex=r"dataset-menu.*\.csv", file_folder="./dataset-canon")

# Access loaded data
df = loader.get_df()
data = loader.get_data()
metadata = loader.get_meta_data()

# Save data in canonical format
loader.save_data_as_df()
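
To make the role of the regex argument more concrete, here is a minimal standard-library sketch of how a pattern can select files in a folder. It is not the package's implementation: the helper name `find_matching_files` is hypothetical, and the use of `re.search` plus alphabetical sorting is an assumption, since this README does not describe how `CanonicalLoader` resolves multiple matches.

```python
import os
import re
from typing import List


def find_matching_files(regex: str, file_folder: str) -> List[str]:
    """Return the file names in file_folder that match regex, sorted ascending.

    Hypothetical helper for illustration only; CanonicalLoader may select
    files differently.
    """
    pattern = re.compile(regex)
    # re.search matches anywhere in the name; anchor the pattern with ^ and $
    # if a full-name match is required.
    return sorted(name for name in os.listdir(file_folder) if pattern.search(name))
```

If the dates embedded in the file names are fixed-width digit strings, the last element of the sorted list is the most recent file for a given dataset name.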

AWS S3

from canonical_loader import S3CanonicalLoader

# Initialize S3 loader with regex pattern, bucket name and prefix
# (raw string, so the regex escape \. reaches the regex engine unchanged)
loader = S3CanonicalLoader(regex=r"dataset-menu.*\.csv", bucket="my-bucket", bucket_prefix="data/")

# Access loaded data
df = loader.get_df()
data = loader.get_data()
metadata = loader.get_meta_data()

# Save data in canonical format
loader.save_data_as_df()

File Naming Convention

The package supports the following file naming patterns:

  • dataset-{name}-at{date_ref}-save{date_save}.{extension} - Single date reference
  • dataset-{name}-from{start_date}-to{end_date}-save{date_save}.{extension} - Date range
  • dataset-{name}-between{initial_date}-and{final_date}-save{date_save}.{extension} - Date interval
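
The three patterns above can be read as regular expressions with named groups. The following is a rough sketch of that idea, not the package's own parser: the function `parse_canonical_name` is hypothetical, and the assumption that every date is an 8-digit string such as `20250418` is ours, since the README does not state the date format.

```python
import re
from typing import Dict, Optional

# Assumption: dates are 8-digit strings (YYYYMMDD). Adjust if the real
# files use another format.
_DATE = r"\d{8}"
_TAIL = rf"-save(?P<date_save>{_DATE})\.(?P<extension>\w+)$"

_PATTERNS = [
    # Single date reference
    re.compile(rf"^dataset-(?P<name>.+?)-at(?P<date_ref>{_DATE})" + _TAIL),
    # Date range
    re.compile(
        rf"^dataset-(?P<name>.+?)-from(?P<start_date>{_DATE})"
        rf"-to(?P<end_date>{_DATE})" + _TAIL
    ),
    # Date interval
    re.compile(
        rf"^dataset-(?P<name>.+?)-between(?P<initial_date>{_DATE})"
        rf"-and(?P<final_date>{_DATE})" + _TAIL
    ),
]


def parse_canonical_name(file_name: str) -> Optional[Dict[str, str]]:
    """Return the named fields of a canonical file name, or None if no pattern fits."""
    for pattern in _PATTERNS:
        match = pattern.match(file_name)
        if match:
            return match.groupdict()
    return None
```

For example, `parse_canonical_name("dataset-menu-at20250101-save20250418.csv")` yields a dictionary with the keys `name`, `date_ref`, `date_save`, and `extension`, while a name that follows none of the three patterns yields `None`.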

Requirements

  • Python >= 3.6
  • shining_pebbles >= 0.5.3
  • string_date_controller >= 0.1.3
  • tqdm
  • aws_s3_controller >= 0.7.5 (for S3 support)
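
Because aws_s3_controller is only needed for S3 support, a script may want to check for it before using S3CanonicalLoader. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes the importable module name is `aws_s3_controller`, matching the requirement as listed above.

```python
import importlib.util
import sys


def python_supported() -> bool:
    """True if the running interpreter meets the Python >= 3.6 requirement."""
    return sys.version_info >= (3, 6)


def s3_support_available() -> bool:
    """True if the optional S3 dependency can be imported.

    Assumption: the module is importable as 'aws_s3_controller'.
    """
    return importlib.util.find_spec("aws_s3_controller") is not None
```

A caller could fall back to the local CanonicalLoader, or print an installation hint, when `s3_support_available()` returns `False`.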

Version History

0.2.1 (2025-04-18)

  • Fixed import bug due to module name changes
  • Ensured proper imports between base and derived classes

0.2.0 (2025-04-18)

  • Added AWS S3 support via S3CanonicalLoader class
  • Refactored code to use inheritance for better maintainability
  • Improved file extension handling for Excel files
  • Enhanced metadata handling for different storage backends

0.1.0 (2025-04-18)

  • Initial release
  • Basic file loading and metadata extraction
  • Support for CSV and Excel files
  • Canonical data transformation and saving

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

June Young Park
AI Management Development Team Lead & Quant Strategist at LIFE Asset Management

LIFE Asset Management is a hedge fund management firm that integrates value investing and engagement strategies with quantitative approaches and financial technology, headquartered in Seoul, South Korea.
