
Kartothek

Kartothek is a Python library for managing (creating, reading, updating and deleting) large amounts of tabular data in a blob store. It stores data as datasets, which it presents to the user as pandas DataFrames. A dataset is a collection of files with the same schema that reside in a blob store. Kartothek uses a metadata definition to handle these datasets efficiently. For distributed access to and manipulation of datasets, Kartothek offers a Dask interface.
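The idea of "files with the same schema plus metadata" can be illustrated with a minimal sketch. It does not use Kartothek's API. The dict-based blob store, the JSON partition files, the metadata key layout and the names write_dataset and read_dataset are all hypothetical. Rows are plain dicts rather than pandas DataFrames, and the real library uses its own serialization and metadata format.

```python
import json
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for a blob store (S3, ABS, GCS, ...):
# it maps object keys to raw bytes and knows nothing about datasets.
BlobStore = Dict[str, bytes]


def write_dataset(store: BlobStore, dataset: str, partitions: List[List[dict]]) -> None:
    """Write each partition as its own file, then one metadata object listing them."""
    schema: Optional[List[str]] = None
    keys = []
    for i, rows in enumerate(partitions):
        # Every row in every partition must have the same column names.
        for row in rows:
            columns = sorted(row)
            if schema is None:
                schema = columns
            elif columns != schema:
                raise ValueError(f"partition {i} does not match schema {schema}")
        key = f"{dataset}/part-{i}.json"
        store[key] = json.dumps(rows).encode()
        keys.append(key)
    # The metadata object is what turns a pile of files into a dataset.
    meta = {"schema": schema, "partitions": keys}
    store[f"{dataset}.metadata.json"] = json.dumps(meta).encode()


def read_dataset(store: BlobStore, dataset: str) -> List[dict]:
    """Look up the partitions in the metadata and concatenate their rows."""
    meta = json.loads(store[f"{dataset}.metadata.json"])
    rows: List[dict] = []
    for key in meta["partitions"]:
        rows.extend(json.loads(store[key]))
    return rows


store: BlobStore = {}
write_dataset(store, "sales", [[{"day": 1, "amount": 10}], [{"day": 2, "amount": 7}]])
print(read_dataset(store, "sales"))
# [{'day': 1, 'amount': 10}, {'day': 2, 'amount': 7}]
```

In this sketch, the reader never lists the store. It discovers the files only through the metadata object, which is also where the shared schema is recorded.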

Storing data distributed over multiple files in a blob store (S3, ABS, GCS, etc.) allows for a fast, cost-efficient and highly scalable data infrastructure. The downside of storing data solely in an object store is that the stores themselves give few or no guarantees beyond the consistency of a single file. In particular, they cannot guarantee the consistency of a dataset as a whole. If we require our dataset to be in a consistent state at all times, we need to track that state ourselves. Kartothek frees us from having to do this manually.
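One common way to build dataset-level consistency on top of single-file guarantees is to upload data files first and then "commit" them by rewriting a single metadata object. The sketch below shows that pattern under the assumption that a put of one key is atomic. It is not a description of Kartothek's internals. The names append_partition, read_dataset and SimulatedCrash are hypothetical, and the sketch deliberately ignores concurrent writers, which would additionally need some form of conflict detection.

```python
import json
from typing import Dict, List
from uuid import uuid4

# Hypothetical blob store; we assume only that a put of a single key is
# atomic, i.e. the per-file guarantee described above.
BlobStore = Dict[str, bytes]


class SimulatedCrash(Exception):
    """Raised to mimic a writer dying halfway through an update."""


def append_partition(store: BlobStore, dataset: str, rows: List[dict],
                     crash_before_commit: bool = False) -> None:
    meta_key = f"{dataset}.metadata.json"
    meta = json.loads(store[meta_key]) if meta_key in store else {"partitions": []}
    # Step 1: upload the data file under a fresh name. Readers cannot see it
    # yet because no metadata object references it.
    part_key = f"{dataset}/{uuid4().hex}.json"
    store[part_key] = json.dumps(rows).encode()
    if crash_before_commit:
        raise SimulatedCrash(part_key)
    # Step 2: commit by replacing the single metadata object in one put.
    meta["partitions"].append(part_key)
    store[meta_key] = json.dumps(meta).encode()


def read_dataset(store: BlobStore, dataset: str) -> List[dict]:
    # Readers trust only the files listed in the metadata, never a raw key listing.
    meta = json.loads(store[f"{dataset}.metadata.json"])
    return [row for key in meta["partitions"] for row in json.loads(store[key])]


store: BlobStore = {}
append_partition(store, "events", [{"id": 1}])
try:
    append_partition(store, "events", [{"id": 2}], crash_before_commit=True)
except SimulatedCrash:
    pass
print(read_dataset(store, "events"))  # [{'id': 1}]
print(len(store))  # 3: metadata, committed file, orphaned file
```

The crashed write leaves an orphaned file behind, but readers still see the last committed state. Cleaning up such orphans is a separate housekeeping step.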

The kartothek.io module provides building blocks to create and modify these datasets in data pipelines. Kartothek handles I/O, tracks dataset partitions and selects subsets of data transparently.
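Kartothek performs partition selection for you. The sketch below only illustrates the underlying idea of pruning partitions by their keys before any data is read. It assumes Hive-style column=value path segments and simple string equality filters. The functions partition_values and select_partitions are hypothetical, and Kartothek's actual filtering interface is not shown here.

```python
from typing import Dict, List


def partition_values(key: str) -> Dict[str, str]:
    """Extract {column: value} from path segments of the form column=value."""
    return dict(segment.split("=", 1) for segment in key.split("/") if "=" in segment)


def select_partitions(keys: List[str], **equals: str) -> List[str]:
    """Keep only the files whose partition values satisfy every equality filter."""
    return [
        key
        for key in keys
        if all(partition_values(key).get(col) == val for col, val in equals.items())
    ]


# In a real dataset this list would come from the tracked metadata.
keys = [
    "sales/country=DE/day=1/part-0.json",
    "sales/country=DE/day=2/part-0.json",
    "sales/country=FR/day=1/part-0.json",
]
print(select_partitions(keys, country="DE"))
# ['sales/country=DE/day=1/part-0.json', 'sales/country=DE/day=2/part-0.json']
print(select_partitions(keys, country="FR", day="1"))
# ['sales/country=FR/day=1/part-0.json']
```

Because the partition values are encoded in the keys, files that cannot match a filter are skipped without being downloaded. Note that values compare as strings in this sketch.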

Installation

Installers for the latest released version are available on the Python Package Index (PyPI) and on conda-forge.

# Install with pip
pip install kartothek
# Install with conda
conda install -c conda-forge kartothek

What is a (real) Kartothek?

A Kartothek (in more modern terms, a Zettelkasten or Katalogkasten) is a tool for organizing (high-level) information extracted from a source of information.
