Simplified reading & writing files with support for compression

rwkit

rwkit is a Python package that simplifies reading and writing common file formats, including plain text, JSON, JSONL, and YAML. It handles compression transparently and can process large files in chunks.

Features

Easy-to-use functions for reading and writing text, JSON, JSONL, and YAML files.
Transparent compression support: bz2, gzip, tar, tar.bz2, tar.gz, tar.xz, xz, zip, zstd.
Generator functions for processing large files in chunks.

Installation

Install rwkit using pip:

pip install rwkit

Optional Dependencies

rwkit comes with optional features that you can install based on your needs:

# Quoting the requirement keeps shells such as zsh from expanding the brackets
pip install "rwkit[zstd]"  # For Zstandard compression support
pip install "rwkit[yaml]"  # For YAML file handling
pip install "rwkit[all]"   # For all optional features

Quick Start

Here are some examples to get you started:

Reading and Writing Text Files

Using a single string:

import rwkit as rw


# Sample text
text = "Hello, rwkit!"

# Write a string
rw.write_text("file.txt", text)

# Append another string
rw.write_text("file.txt", "\nNice to meet you.", mode="a")

# Read file
loaded_text = rw.read_text("file.txt")

print(loaded_text)
# Output: 'Hello, rwkit!\nNice to meet you.'

Using lines (a list of strings):

import rwkit as rw


# Sample
lines = ["Hello, rwkit!", "Nice to meet you."]

# Write lines, each element on its own line (separated by '\n')
rw.write_lines("file.txt", lines)

# Append a line
rw.write_lines("file.txt", "What a beautiful day.", mode="a")

# Read file (transparently splits on '\n')
loaded_lines = rw.read_lines("file.txt")

print(loaded_lines)
# Output: ['Hello, rwkit!', 'Nice to meet you.', 'What a beautiful day.']

Reading and Writing JSON Files

Using a single object:

import rwkit as rw


# Sample data
data = {"name": "Alice", "age": 25}

# Write data to a JSON file
rw.write_json("file.json", data)

# Read data
loaded_data = rw.read_json("file.json")

print(loaded_data)
# Output: {'name': 'Alice', 'age': 25}

Reading and Writing JSONL (= JSON Lines) Files

Using multiple objects, each on its own line. This format is especially useful for large files that are processed in chunks (see "Reading Large Files in Chunks" below).

import rwkit as rw


# Sample data
data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
]

# Write data to a JSONL file
rw.write_jsonl("file.jsonl", data)

# Read data
loaded_data = rw.read_jsonl("file.jsonl")

print(loaded_data)
# Output: [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]
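
To make the format concrete, here is a minimal sketch of what a JSONL file contains, written with only the standard library json module. It illustrates the file format and is not rwkit's implementation; the helper names dumps_jsonl and loads_jsonl are hypothetical.

```python
import json


def dumps_jsonl(records):
    """Serialize an iterable of objects to a JSONL string (hypothetical helper)."""
    # One JSON document per line, each terminated by a newline
    return "".join(json.dumps(record) + "\n" for record in records)


def loads_jsonl(text):
    """Parse a JSONL string back into a list of objects (hypothetical helper)."""
    # Skip blank lines so a trailing newline does not cause a parse error
    return [json.loads(line) for line in text.splitlines() if line.strip()]


data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
]

encoded = dumps_jsonl(data)
print(encoded, end="")
# {"name": "Alice", "age": 25}
# {"name": "Bob", "age": 30}

# Round trip: parsing the lines gives back the original objects
assert loads_jsonl(encoded) == data
```

Because every line is a complete JSON document, a reader can stop after any line, which is what makes the format suitable for chunked processing.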

Reading and Writing YAML Files

Note: Requires the pyyaml package (see Optional Dependencies above).

import rwkit as rw


# Sample data
data = {"name": "Alice", "age": 25}

# Write to a YAML file
rw.write_yaml("file.yaml", data)

# Read a YAML file
loaded_data = rw.read_yaml("file.yaml")

print(loaded_data)
# Output: {'name': 'Alice', 'age': 25}

Compression

rwkit supports various compression formats via the compression argument. The default is compression='infer', which infers the format from the filename extension:

import rwkit as rw


# Sample text
text = "Hello, rwkit!"

# Write to a gzip compressed text file, inferred from the filename extension
rw.write_text("file.txt.gz", text)

# Read a gzip compressed text file
loaded_text = rw.read_text("file.txt.gz")

print(loaded_text)
# Output: 'Hello, rwkit!'

Alternatively, specify the compression explicitly (the table below lists all available options):

import rwkit as rw


# Sample text
text = "Hello, rwkit!"

# Write to a gzip compressed text file, explicitly specified
rw.write_text("file.txt.gz", text, compression="gzip")

# Read a gzip compressed text file, explicitly specified
loaded_text = rw.read_text("file.txt.gz", compression="gzip")

print(loaded_text)
# Output: 'Hello, rwkit!'

When compression='infer', the following rules apply:

File extension	Inferred compression
`.tar`	`tar`
`.tar.bz2`	`tar.bz2`
`.tar.gz`	`tar.gz`
`.tar.xz`	`tar.xz`
`.bz2`	`bz2`
`.gz`	`gzip`
`.xz`	`xz`
`.zip`	`zip`
`.zst`	`zstd`
[everything else]	None
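
The rules in the table can be sketched as a suffix lookup. This is a hypothetical illustration, not rwkit's actual code: the function name infer_compression and the list EXTENSION_TO_COMPRESSION are assumptions made for this example. The point it shows is that longer suffixes have to be checked first, so that a name ending in .tar.gz is not mistaken for plain gzip.

```python
# Hypothetical sketch of extension-based inference; not rwkit's actual code.
# Longer suffixes come first so ".tar.gz" is matched before ".gz".
EXTENSION_TO_COMPRESSION = [
    (".tar.bz2", "tar.bz2"),
    (".tar.gz", "tar.gz"),
    (".tar.xz", "tar.xz"),
    (".tar", "tar"),
    (".bz2", "bz2"),
    (".gz", "gzip"),
    (".xz", "xz"),
    (".zip", "zip"),
    (".zst", "zstd"),
]


def infer_compression(filename):
    """Return the compression name implied by the filename, or None."""
    name = str(filename)
    for suffix, compression in EXTENSION_TO_COMPRESSION:
        if name.endswith(suffix):
            return compression
    # No known suffix: treat the file as uncompressed
    return None


print(infer_compression("file.txt.gz"))     # gzip
print(infer_compression("archive.tar.gz"))  # tar.gz
print(infer_compression("file.txt"))        # None
```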

Reading Large Files in Chunks

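Both text and JSONL files can be read in chunks using the chunksize argument. With chunksize set, the read functions are iterated over and yield one list of items per chunk instead of returning the whole file at once. This also works in combination with compression.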

import rwkit as rw


# Assume a large text file, optionally compressed
for chunk in rw.read_lines("file.txt", chunksize=3):
    print(chunk)
    # Output: ['Hello, rwkit!', 'Nice to meet you.', 'What a beautiful day.']
    # ...

# The same works for jsonl files
for chunk in rw.read_jsonl("file.jsonl", chunksize=3):
    print(chunk)
    # Output: [{'name': 'Alice'}, {'name': 'Bob'}, {'name': 'Charlie'}]
    # ...
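
For readers curious how chunked reading of a compressed file can work, the following is a minimal standard-library sketch using gzip and itertools.islice. It is an assumption-laden illustration and not rwkit's implementation: the function name iter_line_chunks is hypothetical, and the shorter final chunk is a property of this sketch rather than documented rwkit behavior.

```python
import gzip
import os
import tempfile
from itertools import islice


def iter_line_chunks(path, chunksize):
    """Yield lists of up to `chunksize` lines from a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        # Strip the trailing newline from each line lazily
        lines = (line.rstrip("\n") for line in handle)
        while True:
            # Pull at most `chunksize` lines; only one chunk is in memory at a time
            chunk = list(islice(lines, chunksize))
            if not chunk:
                break
            yield chunk


# Build a small compressed file to demonstrate
path = os.path.join(tempfile.mkdtemp(), "file.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as handle:
    handle.write("\n".join(f"line {i}" for i in range(1, 6)) + "\n")

for chunk in iter_line_chunks(path, chunksize=2):
    print(chunk)
# ['line 1', 'line 2']
# ['line 3', 'line 4']
# ['line 5']
```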

License

rwkit is released under the Apache License, Version 2.0. See the LICENSE file for details.

Release history

2.0.0 (Aug 31, 2024)
1.0.2 (Aug 25, 2024)
1.0.1 (Aug 24, 2024)
1.0.0 (Aug 24, 2024)

Download files

Source Distribution

rwkit-2.0.0.tar.gz (12.9 kB), uploaded Aug 31, 2024

Built Distribution

rwkit-2.0.0-py3-none-any.whl (15.1 kB), uploaded Aug 31, 2024, Python 3

Hashes for rwkit-2.0.0.tar.gz

Algorithm	Hash digest
SHA256	`0c56550f18a4158ed2d4d84702264954f47476ac69100d2d99dd38a980d80bba`
MD5	`309369d5c5470bc5ed88d03a99300a9a`
BLAKE2b-256	`e9a463ce23029cbb938f51aed3ec7b67871849497e79a7f6098eeb26b5545581`

Hashes for rwkit-2.0.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`79ca7053ba906a75b034894b70647057832ab478410ec58602af3d61ffa478b9`
MD5	`d830ccb07c18a37a5c48cd67785daa39`
BLAKE2b-256	`062ace0a79b2d16aa01c3a687c8f703a9ae4388d738fd6cf82f008dad0fd3241`