
A configurable replacement for `kedro catalog create`.

Project description

Kedro Auto Catalog

A configurable version of the built-in `kedro catalog create` CLI. Default dataset types can be configured in the project's `settings.py`, so that generated entries use these types rather than `MemoryDataSet`s.




Installation

pip install kedro-auto-catalog

Configuration

Configure the project defaults in `src/<project_name>/settings.py` with this dict.

AUTO_CATALOG = {
    "directory": "data",  # root directory used for generated filepaths
    "subdirs": ["raw", "intermediate", "primary"],  # leading name prefixes routed into subdirectories (see below)
    "layers": ["raw", "intermediate", "primary"],  # leading name prefixes recorded as the entry's layer (see below)
    "default_extension": "parquet",  # file extension for generated filepaths
    "default_type": "pandas.ParquetDataSet",  # dataset type for generated entries
}

Usage

To auto-create catalog entries for the `__default__` pipeline, run this from the command line:

kedro auto-catalog -p __default__

If you want a reminder of what to do, use `--help`:

❯ kedro auto-catalog --help
Usage: kedro auto-catalog [OPTIONS]

  Create Data Catalog YAML configuration with missing datasets.

  Add configurable datasets to Data Catalog YAML configuration file for each
  dataset in a registered pipeline if it is missing from the `DataCatalog`.

  The catalog configuration will be saved to
  `<conf_source>/<env>/catalog/<pipeline_name>.yml` file.

  Configure the project defaults in `src/<project_name>/settings.py` with this
  dict.

Options:
  -e, --env TEXT       Environment to create Data Catalog YAML file in.
                       Defaults to `base`.
  -p, --pipeline TEXT  Name of a pipeline.  [required]
  -h, --help           Show this message and exit.
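
The `-e/--env` option selects which environment the catalog file is written to. For example, assuming your project has a `local` environment under `conf/`, `kedro auto-catalog -p __default__ -e local` should write the generated entries to `conf/local/catalog/__default__.yml` instead of `conf/base/catalog/__default__.yml`.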

Example

Using the kedro-spaceflights example, running `kedro auto-catalog -p __default__` yields the following catalog in `conf/base/catalog/__default__.yml`:

X_test:
  filepath: data/X_test.pq
  type: pandas.ParquetDataSet
X_train:
  filepath: data/X_train.pq
  type: pandas.ParquetDataSet
y_test:
  filepath: data/y_test.parquet
  type: pandas.ParquetDataSet
y_train:
  filepath: data/y_train.parquet
  type: pandas.ParquetDataSet
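
The generated entries behave like any other catalog entries. As a minimal sketch (assuming a standard Kedro project layout, run from the project root; the exact session API can differ slightly between Kedro versions), you could load one of them programmatically:

from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

# Locate the project and open a session from the project root.
bootstrap_project(Path.cwd())
with KedroSession.create(project_path=Path.cwd()) as session:
    catalog = session.load_context().catalog
    # Reads data/X_test.pq via the generated pandas.ParquetDataSet entry.
    X_test = catalog.load("X_test")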

subdirs and layers

If we use the example configuration, with "subdirs": ["raw", "intermediate", "primary"] and "layers": ["raw", "intermediate", "primary"], the plugin converts any leading subdir/layer prefix in a dataset name into a directory and a layer. If we rename y_test to raw_y_test, y_test.parquet is placed in the raw directory and assigned to the raw layer:

raw_y_test:
  filepath: data/raw/y_test.parquet
  layer: raw
  type: pandas.ParquetDataSet
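
As a rough illustration of this prefix handling (a sketch only; the function below and its name are assumptions, not the plugin's actual code), the mapping from dataset name to catalog entry can be thought of like this:

# The AUTO_CATALOG dict from settings.py, repeated here so the sketch is self-contained.
AUTO_CATALOG = {
    "directory": "data",
    "subdirs": ["raw", "intermediate", "primary"],
    "layers": ["raw", "intermediate", "primary"],
    "default_extension": "parquet",
    "default_type": "pandas.ParquetDataSet",
}


def catalog_entry(dataset_name: str, cfg: dict = AUTO_CATALOG) -> dict:
    """Build a catalog entry, routing a leading subdir/layer prefix into the path."""
    name, subdir = dataset_name, ""
    entry = {"type": cfg["default_type"]}
    for prefix in cfg["subdirs"]:
        if name.startswith(prefix + "_"):
            name = name[len(prefix) + 1:]  # strip e.g. "raw_" from the filename
            subdir = prefix
            break
    if subdir in cfg["layers"]:
        entry["layer"] = subdir
    path = "/".join(part for part in (cfg["directory"], subdir) if part)
    entry["filepath"] = f"{path}/{name}.{cfg['default_extension']}"
    return entry


print(catalog_entry("raw_y_test"))
# {'type': 'pandas.ParquetDataSet', 'layer': 'raw', 'filepath': 'data/raw/y_test.parquet'}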

License

kedro-auto-catalog is distributed under the terms of the MIT license.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kedro_auto_catalog-0.2.0.tar.gz (12.1 kB, Source)

Built Distribution

kedro_auto_catalog-0.2.0-py3-none-any.whl (6.0 kB, Python 3)

File details

Details for the file kedro_auto_catalog-0.2.0.tar.gz.

File metadata

  • Download URL: kedro_auto_catalog-0.2.0.tar.gz
  • Upload date:
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-httpx/0.24.1

File hashes

Hashes for kedro_auto_catalog-0.2.0.tar.gz

  • SHA256: c4213ddf92671eeed07e54f1aa9408a80517419ae280321965b03d6fae04b591
  • MD5: aa64dd34da9ff47eb31354cf7500204e
  • BLAKE2b-256: 5960c56f1d21fa5332515c5cd86da1a6581cc6b95e87955421420becb4747e86


File details

Details for the file kedro_auto_catalog-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for kedro_auto_catalog-0.2.0-py3-none-any.whl

  • SHA256: 64d0045f1102d4b048025cb7dd8d6d9ce9f767d35ebdaf429ef6823f8af1c086
  • MD5: a3250760de63e2a9acafbb68497240d2
  • BLAKE2b-256: cc073cb887ba62e7855e0a2a5621406763d9333b5df610ec074dd94e924ba2ff

