A configurable replacement for `kedro catalog create`.
Kedro Auto Catalog
A configurable version of the built-in `kedro catalog create` CLI. Default types can be configured in the project's `settings.py`, so that generated entries use these types rather than MemoryDataSets.
Installation
pip install kedro-auto-catalog
Configuration
Configure the project defaults in `src/<project_name>/settings.py` with the following dict:
AUTO_CATALOG = {
    "directory": "data",
    "subdirs": ["raw", "intermediate", "primary"],
    "layers": ["raw", "intermediate", "primary"],
    "default_extension": "parquet",
    "default_type": "pandas.ParquetDataSet",
}
Usage
To automatically create catalog entries for the `__default__` pipeline, run the following from the command line:
kedro auto-catalog -p __default__
For a reminder of the available options, use `--help`:
❯ kedro auto-catalog --help
Usage: kedro auto-catalog [OPTIONS]

  Create Data Catalog YAML configuration with missing datasets.

  Add configurable datasets to Data Catalog YAML configuration file for each
  dataset in a registered pipeline if it is missing from the `DataCatalog`.

  The catalog configuration will be saved to
  `<conf_source>/<env>/catalog/<pipeline_name>.yml` file.

  Configure the project defaults in `src/<project_name>/settings.py` with this
  dict.

Options:
  -e, --env TEXT       Environment to create Data Catalog YAML file in.
                       Defaults to `base`.
  -p, --pipeline TEXT  Name of a pipeline.  [required]
  -h, --help           Show this message and exit.
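The help text describes two steps: finding the pipeline datasets that are missing from the `DataCatalog`, and writing entries for them to `<conf_source>/<env>/catalog/<pipeline_name>.yml`. Below is a minimal sketch of both, assuming the dataset names have already been collected as plain sets of strings (in a real project they would come from the Kedro pipeline and catalog objects). The helper names `missing_datasets` and `catalog_path` are hypothetical, and skipping parameter entries is an assumption borrowed from the built-in `kedro catalog create`.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def catalog_path(conf_source: str, env: str, pipeline_name: str) -> PurePosixPath:
    """Hypothetical helper: the output file named in the --help text."""
    return PurePosixPath(conf_source) / env / "catalog" / f"{pipeline_name}.yml"


def missing_datasets(pipeline_datasets: set[str], catalog_datasets: set[str]) -> list[str]:
    """Hypothetical helper: datasets used by the pipeline but absent from the catalog.

    Assumption: parameter inputs ("parameters" and "params:..."), which Kedro
    supplies itself, are skipped, as the built-in `kedro catalog create` does.
    """
    candidates = {
        name
        for name in pipeline_datasets
        if name != "parameters" and not name.startswith("params:")
    }
    # Sorted so the generated file has a stable order.
    return sorted(candidates - catalog_datasets)


print(catalog_path("conf", "base", "__default__"))
# conf/base/catalog/__default__.yml
print(missing_datasets({"companies", "params:model_options", "X_test"}, {"companies"}))
# ['X_test']
```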
Example
Using the kedro-spaceflights example, running `kedro auto-catalog -p __default__` yields the following catalog in `conf/base/catalog/__default__.yml`:
X_test:
  filepath: data/X_test.pq
  type: pandas.ParquetDataSet
X_train:
  filepath: data/X_train.pq
  type: pandas.ParquetDataSet
y_test:
  filepath: data/y_test.parquet
  type: pandas.ParquetDataSet
y_train:
  filepath: data/y_train.parquet
  type: pandas.ParquetDataSet
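In this output the dataset names and their keys appear in sorted order with two-space indentation, which is consistent with the default behavior of `yaml.safe_dump` (`sort_keys=True`). The sketch below reproduces that layout using only the standard library. It is an illustration, not the plugin's writer, and `render_catalog` is a hypothetical name. It performs no YAML quoting, so it is only safe for plain values such as these paths and type names.

```python
from __future__ import annotations


def render_catalog(entries: dict[str, dict[str, str]]) -> str:
    """Hypothetical helper: emit flat catalog entries as simple YAML text.

    Names and keys are sorted (uppercase before lowercase, as in Python's
    default string ordering); values are written unquoted.
    """
    lines: list[str] = []
    for name in sorted(entries):
        lines.append(f"{name}:")
        for key in sorted(entries[name]):
            lines.append(f"  {key}: {entries[name][key]}")
    return "\n".join(lines) + "\n"


entries = {
    "y_test": {"type": "pandas.ParquetDataSet", "filepath": "data/y_test.parquet"},
    "X_test": {"type": "pandas.ParquetDataSet", "filepath": "data/X_test.pq"},
}
print(render_catalog(entries), end="")
# X_test:
#   filepath: data/X_test.pq
#   type: pandas.ParquetDataSet
# y_test:
#   filepath: data/y_test.parquet
#   type: pandas.ParquetDataSet
```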
subdirs and layers
With the example configuration above, where `"subdirs": ["raw", "intermediate", "primary"]` and `"layers": ["raw", "intermediate", "primary"]`, a leading subdir/layer prefix in a dataset name is converted into a directory. For example, renaming `y_test` to `raw_y_test` puts `y_test.parquet` in the `raw` directory and assigns the dataset to the `raw` layer:
raw_y_test:
  filepath: data/raw/y_test.parquet
  layer: raw
  type: pandas.ParquetDataSet
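The subdir/layer rule can be sketched as follows. This is a minimal sketch under stated assumptions, not the plugin's code: `build_entry` is a hypothetical name, the prefix is taken to be everything before the first underscore, a prefix listed in `subdirs` selects a subdirectory while one listed in `layers` sets the `layer` key, and the prefix is dropped from the file name whenever either list matches. With the example configuration it reproduces both the `raw_y_test` entry and the plain `y_test` entry shown earlier.

```python
from __future__ import annotations

# Hypothetical stand-in for the AUTO_CATALOG dict from settings.py.
AUTO_CATALOG = {
    "directory": "data",
    "subdirs": ["raw", "intermediate", "primary"],
    "layers": ["raw", "intermediate", "primary"],
    "default_extension": "parquet",
    "default_type": "pandas.ParquetDataSet",
}


def build_entry(name: str, config: dict) -> dict[str, str]:
    """Hypothetical helper: build one catalog entry from a dataset name."""
    directory = config["directory"]
    stem = name
    entry: dict[str, str] = {"type": config["default_type"]}

    # Assumption: the prefix is the text before the first underscore.
    prefix, sep, rest = name.partition("_")
    if sep and rest:
        in_subdirs = prefix in config.get("subdirs", [])
        in_layers = prefix in config.get("layers", [])
        if in_subdirs:
            directory = f"{directory}/{prefix}"
        if in_layers:
            entry["layer"] = prefix
        if in_subdirs or in_layers:
            # Assumption: a matched prefix is dropped from the file name.
            stem = rest

    entry["filepath"] = f"{directory}/{stem}.{config['default_extension']}"
    return entry


print(build_entry("raw_y_test", AUTO_CATALOG))
# {'type': 'pandas.ParquetDataSet', 'layer': 'raw', 'filepath': 'data/raw/y_test.parquet'}
print(build_entry("y_test", AUTO_CATALOG))
# {'type': 'pandas.ParquetDataSet', 'filepath': 'data/y_test.parquet'}
```

Keeping `subdirs` and `layers` as separate lists means a prefix can drive the directory, the layer, or both; in the example configuration the two lists are identical, so both always apply together.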
License
`kedro-auto-catalog` is distributed under the terms of the MIT license.