A library for chunking different types of data files.

chunkr

Getting started

pip install chunkr

Usage

Suppose you want to split a CSV file of 1 million records into 10 chunks of 100,000 records each. You can do it like this:

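import pandas as pd

from chunkr import create_chunks_dir

with create_chunks_dir(
    'csv',
    'csv_test',
    'path/to/file',
    'temp/output',
    100_000,  # 1_000_000 records / 100_000 per chunk = 10 chunks
    None,
    None,
    quote_char='"',
    delimiter=',',
    escape_char='\\',
) as chunks_dir:
    # The chunks are read back as Parquet files; their row counts
    # must add up to the number of records in the original file.
    assert 1_000_000 == sum(
        len(pd.read_parquet(file)) for file in chunks_dir.iterdir()
    )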
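
To make the idea of fixed-size chunking concrete, here is a minimal standard-library sketch of splitting a CSV file into pieces of at most N data rows. It is not chunkr's implementation: the names split_csv and count_rows are hypothetical, and the sketch writes CSV chunks rather than the Parquet files read back in the example above. It uses a small 1,000-row file with a chunk size of 100, so the same arithmetic (total rows / chunk size = 10 chunks) applies.

```python
import csv
import itertools
import tempfile
from pathlib import Path


def split_csv(src, out_dir, chunk_size):
    """Hypothetical helper: write src as CSV files of at most chunk_size data rows."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return paths
        for index in itertools.count():
            # Pull the next chunk_size rows without loading the whole file.
            rows = list(itertools.islice(reader, chunk_size))
            if not rows:
                break
            path = out_dir / f"chunk_{index:04d}.csv"
            with open(path, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)  # repeat the header in every chunk
                writer.writerows(rows)
            paths.append(path)
    return paths


def count_rows(path):
    """Hypothetical helper: count data rows (excluding the header) in one CSV file."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1


with tempfile.TemporaryDirectory() as tmp:
    # Build a small sample file: 1,000 data rows plus a header.
    src = Path(tmp) / "input.csv"
    with open(src, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        writer.writerows((i, i * 2) for i in range(1_000))

    chunks = split_csv(src, Path(tmp) / "chunks", 100)
    total_rows = sum(count_rows(p) for p in chunks)
    assert len(chunks) == 10
    assert total_rows == 1_000
```

As in the chunkr example, the final check confirms that no records were lost: the row counts of the chunks add up to the row count of the original file.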
