chunksv

A Python wrapper for csv.reader that can process files in predefined chunks.
Purpose
This library lets a user partition a file stream into chunks of a predefined size. It was initially motivated by the need to process large CSV files from AWS S3 while keeping application code clean.
Installation and usage
The package is available on PyPI:
python -m pip install chunksv
The library can be imported and used as follows:
    import chunksv

    with open("file.csv", "r") as f:
        rows = chunksv.reader(
            f,
            max_bytes=<size of each partition>,
            header=[<optional columns list>]
        )
When the reader object has consumed enough rows to reach the max_bytes limit, it raises StopIteration. To consume more rows from the input stream, call reader.resume():
    while not rows.empty:
        current_partition = [r for r in rows]
        # < process partition here >
        rows.resume()
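The partitioning behavior described above can be sketched using only the standard library. The ChunkedReader class below is a hypothetical stand-in that mimics the documented interface (max_bytes, empty, resume()); it is an illustration of the chunking idea, not chunksv's actual implementation, and its byte-counting heuristic is an assumption.

```python
import csv
import io


class ChunkedReader:
    """Minimal sketch of a byte-budgeted CSV reader, mimicking the
    documented chunksv interface (max_bytes, empty, resume)."""

    def __init__(self, f, max_bytes, header=None):
        self._reader = csv.reader(f)
        self.max_bytes = max_bytes
        # If no header is supplied, take the first row as the header.
        self.header = header if header is not None else next(self._reader)
        self._consumed = 0   # bytes consumed in the current partition
        self.empty = False   # True once the underlying stream is exhausted

    def __iter__(self):
        return self

    def __next__(self):
        if self._consumed >= self.max_bytes:
            raise StopIteration  # partition budget reached; resume() continues
        try:
            row = next(self._reader)
        except StopIteration:
            self.empty = True
            raise
        # Approximate the row's size as its comma-joined length plus newline.
        self._consumed += len(",".join(row)) + 1
        return row

    def resume(self):
        # Reset the byte budget so iteration can continue with the next chunk.
        self._consumed = 0


# Usage mirrors the loop shown above.
data = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
rows = ChunkedReader(data, max_bytes=8)

partitions = []
while not rows.empty:
    partitions.append([r for r in rows])
    rows.resume()

print(partitions)  # [[['1', '2'], ['3', '4']], [['5', '6']]]
```

Each pass over the reader stops once the byte budget is spent, and resume() resets the budget so the next partition can be consumed from the same stream.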