iterabledata

Iterable data processing Python library


Work in progress. The documentation is still being written.

Iterable data is a Python library for reading data files row by row and for writing data files. Its iterable classes work much like file objects, csv.DictReader, or row-by-row Parquet readers.
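Each iterable yields records one at a time as dictionaries, the way csv.DictReader does for CSV. The sketch below illustrates that idea with the standard library only. It is not the library's API: `iter_records` and `READERS` are hypothetical names, and only JSON, NDJSON and CSV are covered because the other formats need third-party packages.

```python
import csv
import io
import json


def iter_ndjson(fobj):
    """NDJSON: one JSON document per non-empty line, parsed lazily."""
    for line in fobj:
        if line.strip():
            yield json.loads(line)


def iter_json(fobj):
    """JSON: loads the whole top-level array, then yields its items (not streamed)."""
    yield from json.load(fobj)


def iter_csv(fobj):
    """CSV: each row becomes a dict keyed by the header line."""
    yield from csv.DictReader(fobj)


# Hypothetical dispatch table: file extension -> record generator
READERS = {
    "json": iter_json,
    "jsonl": iter_ndjson,
    "ndjson": iter_ndjson,
    "csv": iter_csv,
}


def iter_records(fobj, ext):
    """Return a generator that yields one dict per record."""
    reader = READERS.get(ext.lower())
    if reader is None:
        raise ValueError(f"unsupported format: {ext}")
    return reader(fobj)


# The consuming loop looks the same whatever the source format is
for record in iter_records(io.StringIO('{"name": "Alice"}\n{"name": "Bob"}\n'), "jsonl"):
    print(record["name"])
```

Note that csv.DictReader returns every value as a string, whereas the JSON readers preserve numbers, lists and nested objects.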

This library was written to simplify data processing and conversion between formats.

Supported file types:

* BSON
* JSON
* NDJSON (JSON lines)
* XML
* XLS
* XLSX
* Parquet
* ORC
* Avro
* Pickle

Supported file compression: GZip, BZip2, LZMA (.xz), LZ4, ZIP, Brotli, ZStandard
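Transparent decompression usually comes down to choosing an opener by file suffix. The sketch below shows that general technique, not the library's codec classes: `open_maybe_compressed` is a hypothetical helper, and it covers only the codecs in the standard library (GZip, BZip2, LZMA). LZ4, Brotli and ZStandard need third-party packages, and ZIP is an archive format rather than a stream codec, so they are left out.

```python
import bz2
import gzip
import lzma
import os
import tempfile
from pathlib import PurePath

# Hypothetical mapping: final file suffix -> stdlib opener
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_maybe_compressed(path, mode="rt", encoding="utf-8"):
    """Open path in text mode, (de)compressing according to its last suffix.

    Files without a known compression suffix fall back to the built-in open().
    """
    if "b" in mode:
        raise ValueError("this sketch only supports text modes")
    opener = OPENERS.get(PurePath(path).suffix.lower(), open)
    return opener(path, mode, encoding=encoding)


# Usage: write and read back an xz-compressed CSV in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv.xz")
    with open_maybe_compressed(path, "wt") as f:
        f.write("name,city\nAlice,Paris\n")
    with open_maybe_compressed(path) as f:
        roundtrip = f.read()
    with open(path, "rb") as f:
        magic = f.read(6)  # xz streams start with b"\xfd7zXZ\x00"
```

Dispatching on the last suffix only is what lets names like data.csv.xz work: the codec is peeled off first, and the remaining .csv still tells the reader which format to parse.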

Why write this library?

Python has many high-quality data processing tools and libraries, especially pandas and other data frame libraries. Their main limitation is that they work with flat data: data frames don't support complex (nested) data types, so you have to flatten the data every time.

pyiterable lets you read data as Python dictionaries instead of flattening it. This makes it much easier to work with data sources such as JSON, NDJSON, or BSON files.
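To make the difference concrete, here is a stdlib-only sketch (no iterabledata calls): NDJSON records parsed as dictionaries keep their nesting, while a flat table needs something like the hypothetical `flatten` helper below, which joins nested keys with dots.

```python
import json

# Two NDJSON lines with nested objects and lists
NDJSON = (
    '{"id": 1, "user": {"name": "Alice", "tags": ["a", "b"]}}\n'
    '{"id": 2, "user": {"name": "Bob", "tags": []}}\n'
)


def flatten(record, prefix=""):
    """Join nested dict keys with dots, as a flat table requires (lists kept as-is)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


records = [json.loads(line) for line in NDJSON.splitlines()]

# As dicts, nested fields are reachable directly
print(records[0]["user"]["tags"])  # ['a', 'b']

# Flattened, the structure is encoded in the column names instead
print(flatten(records[0]))  # {'id': 1, 'user.name': 'Alice', 'user.tags': ['a', 'b']}
```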

This code is used in several tools written by its author, including the command-line tool undatum and the data processing ETL engine datacrafter.

Requirements

Python 3.8+

Installation

Install from PyPI with `pip install iterabledata`, or install from this repository.

Documentation

In progress. Please see usage and examples.

Usage and examples

Read compressed CSV file

Reads an xz-compressed CSV file (data.csv.xz) row by row.

```python
from iterable.helpers.detect import open_iterable

# open_iterable is given only the file name; format and codec are detected from it
source = open_iterable('data.csv.xz')
n = 0
for row in source:
    n += 1
    # Add data processing code here
    if n % 1000 == 0:
        print('Processing %d' % n)
```
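For the xz case specifically, a rough standard-library equivalent of the loop above looks like this. It is a comparison sketch, not how open_iterable is implemented: `count_rows` is a hypothetical name, and the demo writes a small temporary data.csv.xz first so the example runs on its own.

```python
import csv
import lzma
import os
import tempfile


def count_rows(path, report_every=1000):
    """Stream an xz-compressed CSV row by row, printing progress."""
    n = 0
    with lzma.open(path, "rt", encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            n += 1
            # Add data processing code here (row is a dict)
            if n % report_every == 0:
                print("Processing %d" % n)
    return n


# Demo: build a 2500-row data.csv.xz in a temporary directory, then stream it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv.xz")
    with lzma.open(path, "wt", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        writer.writerows([i, i * 2] for i in range(2500))
    total = count_rows(path)  # prints "Processing 1000" and "Processing 2000"
```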

Detect encoding and file delimiter

Detects the encoding and delimiter of a CSV file and uses them to open the file as an iterable.

```python
from iterable.helpers.detect import open_iterable
from iterable.helpers.utils import detect_encoding, detect_delimiter

delimiter = detect_delimiter('data.csv')
encoding = detect_encoding('data.csv')

# Pass the detected settings through to the CSV iterable
source = open_iterable('data.csv', iterableargs={'encoding': encoding['encoding'], 'delimiter': delimiter})
n = 0
for row in source:
    n += 1
    # Add data processing code here
    if n % 1000 == 0:
        print('Processing %d' % n)
```
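How detect_delimiter and detect_encoding work internally is not described here. As a stdlib-only comparison, csv.Sniffer can guess a delimiter from a text sample. The standard library has no encoding detector, so `guess_encoding` below simply tries candidate encodings in order; both helper names and the candidate list are assumptions for illustration.

```python
import csv


def guess_delimiter(sample, candidates=",;\t|"):
    """Guess the CSV delimiter from a text sample using csv.Sniffer."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


def guess_encoding(raw, candidates=("utf-8", "cp1251", "latin-1")):
    """Return the first candidate encoding that decodes the bytes without error."""
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")


delimiter = guess_delimiter("id;name;city\n1;Alice;Paris\n2;Bob;Berlin\n")
encoding = guess_encoding("привет".encode("cp1251"))
print(delimiter, encoding)
```

Put strict encodings such as utf-8 first: latin-1 accepts any byte sequence, so it only works as a last-resort fallback.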

Convert Parquet file to JSON lines compressed with LZMA using pipeline

Uses the pipeline function to iterate through a Parquet file and write its selected fields to an LZMA-compressed JSON lines (NDJSON) file.

```python
from iterable.helpers.detect import open_iterable
from iterable.pipeline import pipeline

source = open_iterable('data/data.parquet')
destination = open_iterable('data/data.jsonl.xz', mode='w')

def extract_fields(record, state):
    # Keep only the selected fields of each record
    out = {}
    record = dict(record)
    for k in ['name']:
        out[k] = record[k]
    return out

def print_process(stats, state):
    # Print the statistics passed in by the pipeline
    print(stats)

# print_process serves both as the periodic trigger and as the final callback
pipeline(source, destination=destination, process_func=extract_fields,
         trigger_on=2, trigger_func=print_process, final_func=print_process,
         start_state={})
```
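The callback pattern above can be pictured with a small pure-Python loop. This is a hypothetical sketch of the idea, not the library's pipeline implementation: `run_pipeline`, the stats keys and the rule that a `None` result skips a record are all invented here, and a plain list stands in for the destination writer.

```python
def run_pipeline(source, destination, process_func, trigger_on=1000,
                 trigger_func=None, final_func=None, start_state=None):
    """Hypothetical sketch: process records, call back every trigger_on records."""
    state = dict(start_state or {})
    stats = {"rec_count": 0, "written": 0}
    for record in source:
        stats["rec_count"] += 1
        out = process_func(record, state)
        if out is not None:  # assumption: None means "skip this record"
            destination.append(out)
            stats["written"] += 1
        if trigger_func and stats["rec_count"] % trigger_on == 0:
            trigger_func(stats, state)
    if final_func:
        final_func(stats, state)
    return stats


# Usage: keep only the "name" field; record a stats snapshot every 2 records
source = [{"name": "a", "size": 1}, {"name": "b", "size": 2}, {"name": "c", "size": 3}]
result, snapshots = [], []
stats = run_pipeline(source, result, lambda r, s: {"name": r["name"]},
                     trigger_on=2,
                     trigger_func=lambda st, s: snapshots.append(dict(st)))
```

Passing a mutable `state` dict to every callback lets process and trigger functions share running totals without globals.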

Convert gzipped JSON lines (NDJSON) file to BSON compressed with LZMA

Reads each row from a gzip-compressed JSON lines file and writes it as BSON using the LZMA codec.

```python
from iterable.datatypes import JSONLinesIterable, BSONIterable
from iterable.codecs import GZIPCodec, LZMACodec

read_codec = GZIPCodec('data.jsonl.gz', mode='r', open_it=True)
iterable = JSONLinesIterable(codec=read_codec)
write_codec = LZMACodec('data.bson.xz', mode='wb', open_it=False)
write_iterable = BSONIterable(codec=write_codec, mode='w')
n = 0
for row in iterable:
    n += 1
    if n % 10000 == 0:
        print('Processing %d' % n)
    write_iterable.write(row)

# Close both iterables so the LZMA output stream is flushed and finalized
iterable.close()
write_iterable.close()
```
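BSON encoding needs a third-party package (for example the bson module shipped with pymongo), so this stdlib-only sketch swaps the output format for NDJSON while keeping the same streaming shape: read gzip, process one record at a time, write LZMA. `convert_ndjson` is a hypothetical name. The `with` blocks guarantee both streams are closed, which is what finalizes the .xz output.

```python
import gzip
import json
import lzma
import os
import tempfile


def convert_ndjson(src_path, dst_path, report_every=10000):
    """Stream gzip-compressed NDJSON into xz-compressed NDJSON, one record at a time."""
    n = 0
    with gzip.open(src_path, "rt", encoding="utf-8") as src:
        with lzma.open(dst_path, "wt", encoding="utf-8") as dst:
            for line in src:
                if not line.strip():
                    continue
                record = json.loads(line)
                # Transform `record` here if needed
                dst.write(json.dumps(record, ensure_ascii=False) + "\n")
                n += 1
                if n % report_every == 0:
                    print("Processing %d" % n)
    return n


# Demo: create a tiny data.jsonl.gz, convert it, and read the result back
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.jsonl.gz")
    dst = os.path.join(tmp, "data.jsonl.xz")
    with gzip.open(src, "wt", encoding="utf-8") as f:
        f.write('{"id": 1}\n{"id": 2, "tags": ["x"]}\n')
    converted = convert_ndjson(src, dst)
    with lzma.open(dst, "rt", encoding="utf-8") as f:
        roundtrip = [json.loads(line) for line in f]
```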

More examples and tests

See the tests for more usage examples.


Release history

* 1.0.11 (Jan 25, 2026)
* 1.0.10 (Jan 13, 2026)
* 1.0.9 (Jan 12, 2026)
* 1.0.8 (Dec 20, 2025)
* 1.0.7 (Dec 15, 2025)
* 1.0.6 (Nov 1, 2025), this version
* 1.0.5 (Jun 14, 2024)
* 1.0.4 (Jun 14, 2024)
* 1.0.2 (Dec 24, 2022)

Download files

Source distribution: iterabledata-1.0.6.tar.gz (32.5 kB), uploaded Nov 1, 2025

File details

Details for the file iterabledata-1.0.6.tar.gz:

* Download URL: iterabledata-1.0.6.tar.gz
* Upload date: Nov 1, 2025
* Size: 32.5 kB
* Tags: Source
* Uploaded using Trusted Publishing? No
* Uploaded via: twine/6.2.0 CPython/3.13.7

Hashes for iterabledata-1.0.6.tar.gz:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `02c9bcfe9e3341c22cc30a35c89f3ce85838d53e516fe09eb4510dd14985f7c4` |
| MD5 | `841fdbb0009e85a0c0ed585f6981221d` |
| BLAKE2b-256 | `817065caeeae418f5279e1c80d90db08779b053f238ff218b973e3f97cee2a8b` |
