
Kedro-Datasets is where you can find all of Kedro's data connectors.


Kedro-Datasets


Welcome to kedro_datasets, the home of Kedro's data connectors. Here you will find the AbstractDataset implementations that power Kedro's DataCatalog, created by QuantumBlack and external contributors.

Installation

kedro-datasets is a Python plugin. To install it:

pip install kedro-datasets

Install dependencies at a group level

Datasets are organised into groups, e.g. pandas, spark and pickle. Each group contains a collection of datasets, e.g. pandas.CSVDataset, pandas.ParquetDataset and more. You can install the dependencies for an entire group of datasets as follows:

pip install "kedro-datasets[<group>]"

This installs Kedro-Datasets and the dependencies related to the dataset group. For example, if your workflow depends on the datasets in the pandas group, run pip install "kedro-datasets[pandas]" to install Kedro-Datasets and the dependencies for every dataset in that group.

Install dependencies at a type level

To limit installation to dependencies specific to a dataset:

pip install "kedro-datasets[<group>-<dataset>]"

For example, your workflow might require the pandas.ExcelDataset, so to install its dependencies, run pip install "kedro-datasets[pandas-exceldataset]".

From `kedro-datasets` version 3.0.0 onwards, the names of the optional dataset-level dependencies have been normalised to follow [PEP 685](https://peps.python.org/pep-0685/). The '.' character has been replaced with a '-' character and the names are in lowercase. For example, if you had `kedro-datasets[pandas.ExcelDataset]` in your requirements file, it would have to be changed to `kedro-datasets[pandas-exceldataset]`.
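The renaming rule above can be sketched in a few lines of Python. The helper names `normalise_extra` and `requirement_for` are hypothetical and are not part of kedro-datasets; the regular expression is the normalisation rule given in PEP 685, which also collapses runs of `-`, `_` and `.` into a single `-`.

```python
import re


def normalise_extra(name: str) -> str:
    """Normalise an extra name following the rule in PEP 685."""
    # Runs of '-', '_' and '.' collapse to a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_for(extra: str, package: str = "kedro-datasets") -> str:
    """Build a requirement string for one optional dependency set."""
    return f"{package}[{normalise_extra(extra)}]"


# Old-style name from a pre-3.0.0 requirements file.
print(requirement_for("pandas.ExcelDataset"))
# prints: kedro-datasets[pandas-exceldataset]
```

A helper like this could be used to migrate the entries of an existing requirements file in bulk, although editing a handful of entries by hand is usually simpler.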

What AbstractDataset implementations are supported?

We support a range of data connectors, including CSV, Excel, Parquet, Feather, HDF5, JSON, Pickle, SQL Tables, SQL Queries, Spark DataFrames and more. We also support working with images.

These data connectors are supported with the APIs of pandas, spark, networkx, matplotlib, yaml and more.

The Data Catalog allows you to work with a range of file formats on local file systems, network file systems, cloud object stores, and Hadoop.
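As a rough illustration, the same dataset type can point at different storage locations by changing only its file path. The sketch below assumes that the storage backend is chosen from the prefix of `filepath`; the entry names, bucket names and the `storage_protocol` helper are hypothetical and only mimic that behaviour with the standard library.

```python
from urllib.parse import urlsplit

# Hypothetical catalog entries; bucket and file names are made up.
catalog_config = {
    "cars_local": {"type": "pandas.CSVDataset", "filepath": "data/01_raw/cars.csv"},
    "cars_s3": {"type": "pandas.CSVDataset", "filepath": "s3://my-bucket/cars.csv"},
    "cars_gcs": {"type": "pandas.ParquetDataset", "filepath": "gcs://my-bucket/cars.parquet"},
}


def storage_protocol(filepath: str) -> str:
    """Return the URL scheme of a path, or 'file' for a plain local path."""
    scheme = urlsplit(filepath).scheme
    return scheme or "file"


for name, entry in catalog_config.items():
    print(name, entry["type"], storage_protocol(entry["filepath"]))
```

The dictionary has the same shape as a catalog entry with a `type` and a `filepath`, so only the path changes when the data moves from local disk to an object store.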

Here is a full list of supported data connectors and APIs.

How can I create my own AbstractDataset implementation?

Take a look at our instructions on how to create your own AbstractDataset implementation.
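As a minimal sketch of what such an implementation might look like, the hypothetical `TextFileDataset` below reads and writes a plain-text file. It assumes a recent Kedro release in which a dataset subclasses `AbstractDataset` from `kedro.io` and provides `load`, `save` and `_describe`; older releases used `_load` and `_save` instead, so check the linked instructions for the version you run. The `object` fallback exists only so that the sketch still runs where Kedro is not installed.

```python
import tempfile
from pathlib import Path

try:
    from kedro.io import AbstractDataset
except ImportError:
    # Fallback so the sketch still runs where Kedro is not installed.
    AbstractDataset = object


class TextFileDataset(AbstractDataset):
    """Hypothetical dataset that loads and saves a plain-text file."""

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def load(self) -> str:
        # Read the whole file as a single string.
        return self._filepath.read_text(encoding="utf-8")

    def save(self, data: str) -> None:
        # Overwrite the file with the given string.
        self._filepath.write_text(data, encoding="utf-8")

    def _describe(self) -> dict:
        # Describes the dataset instance, e.g. for logging.
        return {"filepath": str(self._filepath)}


with tempfile.TemporaryDirectory() as tmp:
    dataset = TextFileDataset(str(Path(tmp) / "note.txt"))
    dataset.save("hello")
    print(dataset.load())
```

A production dataset would typically also handle remote file systems and versioning, which this sketch leaves out.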

Can I contribute?

Yes! Want to help build Kedro-Datasets? Check out our guide to contributing.

What licence do you use?

Kedro-Datasets is licensed under the Apache 2.0 License.

Python version support policy

