A configurable replacement for `kedro catalog create`.

Kedro Auto Catalog

A configurable version of the built-in kedro catalog create CLI. Default dataset types can be configured in the project's settings.py, so that generated catalog entries use these types rather than MemoryDataSets.

Table of Contents

Installation
License

Installation

pip install kedro-auto-catalog

Configuration

Configure the project defaults in src/<project_name>/settings.py with this dict.

AUTO_CATALOG = {
    "directory": "data",
    "subdirs": ["raw", "intermediate", "primary"],
    "layers": ["raw", "intermediate", "primary"],
    "default_extension": "parquet",
    "default_type": "pandas.ParquetDataSet",
}
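
Because this dict is plain Python, a typo in a key name is easy to miss. The following is a minimal sketch of a sanity check, not part of the plugin: the helper name check_auto_catalog is hypothetical, and the expected keys and types are simply read off the example above. Whether the plugin itself rejects unknown keys or requires every key is not stated here, so the check only flags keys that differ from the example and values of an unexpected type.

```python
from typing import Any

# Keys and value types as they appear in the example dict above.
EXPECTED_TYPES: dict[str, type] = {
    "directory": str,
    "subdirs": list,
    "layers": list,
    "default_extension": str,
    "default_type": str,
}


def check_auto_catalog(config: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems, empty if none were found."""
    problems = []
    for key, value in config.items():
        if key not in EXPECTED_TYPES:
            # Probably a typo such as "layer" instead of "layers".
            problems.append(f"unexpected key: {key}")
        elif not isinstance(value, EXPECTED_TYPES[key]):
            problems.append(f"{key} should be a {EXPECTED_TYPES[key].__name__}")
    return problems


AUTO_CATALOG = {
    "directory": "data",
    "subdirs": ["raw", "intermediate", "primary"],
    "layers": ["raw", "intermediate", "primary"],
    "default_extension": "parquet",
    "default_type": "pandas.ParquetDataSet",
}

print(check_auto_catalog(AUTO_CATALOG))  # prints []
```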

Usage

To automatically create catalog entries for the __default__ pipeline, run this from the command line.

kedro auto-catalog -p __default__

If you want a reminder of the available options, use --help.

❯ kedro auto-catalog --help
Usage: kedro auto-catalog [OPTIONS]

  Create Data Catalog YAML configuration with missing datasets.

  Add configurable datasets to Data Catalog YAML configuration file for each
  dataset in a registered pipeline if it is missing from the `DataCatalog`.

  The catalog configuration will be saved to
  `<conf_source>/<env>/catalog/<pipeline_name>.yml` file.

  Configure the project defaults in `src/<project_name>/settings.py` with this
  dict.

Options:
  -e, --env TEXT       Environment to create Data Catalog YAML file in.
                       Defaults to `base`.
  -p, --pipeline TEXT  Name of a pipeline.  [required]
  -h, --help           Show this message and exit.
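
The help text gives the output location as a path template. As a minimal sketch, the template can be filled in as follows. The function name catalog_path is made up for illustration and is not part of the plugin; the values conf and base are assumptions matching Kedro's usual configuration folder and the default environment named in the help text.

```python
from pathlib import Path


def catalog_path(conf_source: str, env: str, pipeline_name: str) -> Path:
    """Fill in <conf_source>/<env>/catalog/<pipeline_name>.yml from the help text."""
    return Path(conf_source) / env / "catalog" / f"{pipeline_name}.yml"


# as_posix() keeps forward slashes regardless of the operating system.
print(catalog_path("conf", "base", "__default__").as_posix())
# prints conf/base/catalog/__default__.yml
```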

Example

Using the kedro-spaceflights example project, running kedro auto-catalog -p __default__ yields the following catalog in conf/base/catalog/__default__.yml:

X_test:
  filepath: data/X_test.pq
  type: pandas.ParquetDataSet
X_train:
  filepath: data/X_train.pq
  type: pandas.ParquetDataSet
y_test:
  filepath: data/y_test.parquet
  type: pandas.ParquetDataSet
y_train:
  filepath: data/y_train.parquet
  type: pandas.ParquetDataSet

subdirs and layers

With the example configuration, "subdirs": ["raw", "intermediate", "primary"] and "layers": ["raw", "intermediate", "primary"], a leading subdir or layer name in a dataset name is converted into a directory and a layer. For example, renaming y_test to raw_y_test places y_test.parquet in the raw directory and assigns it to the raw layer.

raw_y_test:
  filepath: data/raw/y_test.parquet
  layer: raw
  type: pandas.ParquetDataSet
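
The naming rule can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not the plugin's actual implementation: build_entry is a hypothetical helper, and splitting the name at the first underscore is an assumption. How the plugin treats a prefix that appears in layers but not in subdirs is not documented here; the sketch keeps the full name as the file name in that case.

```python
from pathlib import PurePosixPath
from typing import Any

AUTO_CATALOG = {
    "directory": "data",
    "subdirs": ["raw", "intermediate", "primary"],
    "layers": ["raw", "intermediate", "primary"],
    "default_extension": "parquet",
    "default_type": "pandas.ParquetDataSet",
}


def build_entry(name: str, config: dict[str, Any]) -> dict[str, str]:
    """Hypothetical helper: map a dataset name to a catalog entry."""
    # Assumption: the prefix is everything before the first underscore.
    prefix, sep, rest = name.partition("_")
    path = PurePosixPath(config["directory"])
    stem = name
    entry: dict[str, str] = {}
    # Only treat the prefix as special when something follows the underscore.
    if sep and rest:
        if prefix in config["subdirs"]:
            path = path / prefix  # the file goes into the matching subdirectory
            stem = rest           # and the prefix is dropped from the file name
        if prefix in config["layers"]:
            entry["layer"] = prefix
    entry["filepath"] = str(path / f"{stem}.{config['default_extension']}")
    entry["type"] = config["default_type"]
    return entry


print(build_entry("raw_y_test", AUTO_CATALOG))
print(build_entry("y_test", AUTO_CATALOG))
```

With the example configuration, raw_y_test maps to data/raw/y_test.parquet in the raw layer, while y_test has no recognised prefix and stays directly under data with no layer.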

License

kedro-auto-catalog is distributed under the terms of the MIT license.

Release history

0.2.0 (Jul 14, 2023), this version
0.2.0.dev0 (Jul 14, 2023), pre-release
0.1.1 (Feb 21, 2023)
0.1.1.dev0 (Feb 21, 2023), pre-release
0.1.0 (Feb 21, 2023)
0.1.0.dev0 (Feb 21, 2023), pre-release
0.0.0 (Feb 15, 2023)
