Python module for processing csv files in chunks.

chunksv

Python wrapper for csv.reader that can process files in predefined chunks.

purpose

This library lets a user split a file stream into partitions of a predefined size. It was initially motivated by the need to process large CSV files from AWS S3 while keeping application code clean.

package installation and usage

The package is available on PyPI:

python -m pip install chunksv

The library can be imported and used as follows:

import chunksv

with open("file.csv", "r") as f:
    rows = chunksv.reader(
        f, 
        max_bytes=<size of each partition>, 
        header=[<optional columns list>]
    )

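When the reader object has consumed enough rows to reach the max_bytes limit, it raises StopIteration. To consume more rows from the input stream, call the reader's resume() method (rows.resume() in this example). Rows are read from the input stream as you iterate, so do this while the file is still open: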

while not rows.empty:
    # Iterating stops once the current partition reaches max_bytes.
    current_partition = list(rows)
    # Process current_partition here.
    rows.resume()
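
To make the stop-and-resume pattern concrete, here is a minimal, self-contained sketch of the same idea built directly on csv.reader. It is not the chunksv implementation: the class name ChunkedReader and the helper collect_partitions are hypothetical, and the sketch assumes each CSV record fits on one line (no quoted fields containing newlines) and that sizes are measured in UTF-8 bytes.

```python
import csv
import io


class ChunkedReader:
    """Hypothetical stand-in that mimics the stop/resume behaviour described above."""

    def __init__(self, stream, max_bytes):
        self._stream = stream
        self._max_bytes = max_bytes
        self._used = 0       # bytes consumed in the current partition
        self.empty = False   # set to True once the stream is exhausted

    def __iter__(self):
        return self

    def __next__(self):
        if self._used >= self._max_bytes:
            # Partition is full: stop until the caller calls resume().
            raise StopIteration
        line = self._stream.readline()
        if not line:
            # Nothing left to read from the underlying stream.
            self.empty = True
            raise StopIteration
        self._used += len(line.encode("utf-8"))
        # Parse the single line with the standard csv module.
        return next(csv.reader([line]))

    def resume(self):
        # Reset the byte budget so iteration can continue.
        self._used = 0


def collect_partitions(stream, max_bytes):
    """Drain the stream and return its non-empty partitions as lists of rows."""
    rows = ChunkedReader(stream, max_bytes)
    partitions = []
    while not rows.empty:
        partition = list(rows)
        if partition:
            partitions.append(partition)
        rows.resume()
    return partitions


# Each line below is 4 bytes, so max_bytes=8 yields two rows per partition.
sample = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
parts = collect_partitions(sample, max_bytes=8)
# parts == [[['a', 'b'], ['1', '2']], [['3', '4'], ['5', '6']]]
```

In this sketch a partition may overshoot max_bytes by up to one line, because the limit is checked before each read rather than after; whether chunksv behaves the same way is not stated here.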
