Simplified reading & writing files with support for compression

rwkit

rwkit is a Python package that simplifies reading and writing various file formats, including text, json, jsonl and yaml. It handles compression transparently and can process large files in chunks.

Features

  • Easy-to-use functions for reading and writing text, json, jsonl and yaml files.
  • Transparent compression support: bz2, gzip, tar, tar.bz2, tar.gz, tar.xz, xz, zip, zstd.
  • Generator functions for processing large files in chunks.

Installation

Install rwkit using pip:

pip install rwkit

Optional Dependencies

rwkit comes with optional features that you can install based on your needs:

pip install "rwkit[zstd]"  # For Zstandard compression support
pip install "rwkit[yaml]"  # For YAML file handling
pip install "rwkit[all]"   # For all optional features (quotes keep shells such as zsh from expanding the brackets)

Quick Start

Here are some examples to get you started:

Reading and Writing Text Files

Using a single string:

import rwkit as rw


# Sample text
text = "Hello, rwkit!"

# Write a string
rw.write_text("file.txt", text)

# Append another string
rw.write_text("file.txt", "\nNice to meet you.", mode="a")

# Read file
loaded_text = rw.read_text("file.txt")

print(loaded_text)
# Output: 'Hello, rwkit!\nNice to meet you.'

Using lines (a list of strings):

import rwkit as rw


# Sample
lines = ["Hello, rwkit!", "Nice to meet you."]

# Write lines, each element on its own line (separated by '\n')
rw.write_lines("file.txt", lines)

# Append a line
rw.write_lines("file.txt", "What a beautiful day.", mode="a")

# Read file (transparently splits on '\n')
loaded_lines = rw.read_lines("file.txt")

print(loaded_lines)
# Output: ['Hello, rwkit!', 'Nice to meet you.', 'What a beautiful day.']

Reading and Writing JSON Files

Using a single object:

import rwkit as rw


# Sample data
data = {"name": "Alice", "age": 25}

# Write data to a JSON file
rw.write_json("file.json", data)

# Read data
loaded_data = rw.read_json("file.json")

print(loaded_data)
# Output: {'name': 'Alice', 'age': 25}

Reading and Writing JSONL (= JSON Lines) Files

Using multiple objects, each on its own line. This format is especially useful for large files that are processed in chunks (see "Reading Large Files in Chunks" below).

import rwkit as rw


# Sample data
data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
]

# Write data to a JSONL file
rw.write_jsonl("file.jsonl", data)

# Read data
loaded_data = rw.read_jsonl("file.jsonl")

print(loaded_data)
# Output: [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]
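
To make the on-disk layout concrete, here is a minimal sketch of the JSONL format using only the standard library. It does not show how rwkit is implemented; the helper names dump_jsonl and load_jsonl are hypothetical and exist only for this illustration.

```python
import json
import os
import tempfile

records = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
]


def dump_jsonl(path, objects):
    """Hypothetical helper: write one JSON document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for obj in objects:
            f.write(json.dumps(obj) + "\n")


def load_jsonl(path):
    """Hypothetical helper: parse every non-empty line as its own JSON document."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.jsonl")
    dump_jsonl(path, records)
    with open(path, encoding="utf-8") as f:
        raw = f.read()  # the raw file content, one object per line
    loaded = load_jsonl(path)

print(raw)
print(loaded)
```

Because every line is a complete JSON document, a reader can stop after any line, which is what makes the format convenient for chunked processing.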

Reading and Writing YAML Files

Note: Requires the pyyaml package (see Optional Dependencies).

import rwkit as rw


# Sample data
data = {"name": "Alice", "age": 25}

# Write to a YAML file
rw.write_yaml("file.yaml", data)

# Read a YAML file
loaded_data = rw.read_yaml("file.yaml")

print(loaded_data)
# Output: {'name': 'Alice', 'age': 25}

Compression

rwkit supports various compression formats via the compression argument. The default is compression='infer', which infers the format from the filename extension:

import rwkit as rw


# Sample text
text = "Hello, rwkit!"

# Write to a gzip compressed text file, inferred from the filename extension
rw.write_text("file.txt.gz", text)

# Read a gzip compressed text file
loaded_text = rw.read_text("file.txt.gz")

print(loaded_text)
# Output: 'Hello, rwkit!'

Alternatively, specify the compression explicitly (the table below lists the available options):

import rwkit as rw


# Sample text
text = "Hello, rwkit!"

# Write to a gzip compressed text file, explicitly specified
rw.write_text("file.txt.gz", text, compression="gzip")

# Read a gzip compressed text file, explicitly specified
loaded_text = rw.read_text("file.txt.gz", compression="gzip")

print(loaded_text)
# Output: 'Hello, rwkit!'
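
For readers curious what an explicit compression argument might translate to, the following is a rough standard-library sketch, not rwkit's actual code. The OPENERS table and the open_text helper are hypothetical names, and only a few of the formats listed in this README are covered.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Hypothetical dispatch table; the keys mirror compression names used in this README.
OPENERS = {
    None: open,
    "gzip": gzip.open,
    "bz2": bz2.open,
    "xz": lzma.open,
}


def open_text(path, mode="r", compression=None):
    """Hypothetical helper: open a file in text mode with the chosen compression."""
    opener = OPENERS[compression]
    # All four openers accept a text mode such as "rt" or "wt" plus an encoding.
    return opener(path, mode + "t", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt.gz")
    with open_text(path, "w", compression="gzip") as f:
        f.write("Hello, rwkit!")
    with open_text(path, "r", compression="gzip") as f:
        roundtrip = f.read()
    with open(path, "rb") as f:
        magic = f.read(2)  # gzip files start with the bytes 1f 8b

print(roundtrip)
```

The point of the sketch is that the caller's code stays the same for every format; only the value passed as compression changes.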

When compression='infer', the following rules apply:

File extension       Inferred compression
.tar                 tar
.tar.bz2             tar.bz2
.tar.gz              tar.gz
.tar.xz              tar.xz
.bz2                 bz2
.gz                  gzip
.xz                  xz
.zip                 zip
.zst                 zstd
[everything else]    None
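
The rules above can be sketched as a simple suffix lookup. This is an illustration under assumptions, not rwkit's implementation: the names SUFFIX_TO_COMPRESSION and infer_compression are hypothetical, and the sketch assumes that compound extensions such as .tar.gz take precedence over .gz, as the table suggests.

```python
# Hypothetical mapping copied from the table above.
# Longer suffixes come first so that ".tar.gz" is not mistaken for ".gz".
SUFFIX_TO_COMPRESSION = [
    (".tar.bz2", "tar.bz2"),
    (".tar.gz", "tar.gz"),
    (".tar.xz", "tar.xz"),
    (".tar", "tar"),
    (".bz2", "bz2"),
    (".gz", "gzip"),
    (".xz", "xz"),
    (".zip", "zip"),
    (".zst", "zstd"),
]


def infer_compression(filename):
    """Hypothetical helper: return the compression name for a filename, or None."""
    name = str(filename)
    for suffix, compression in SUFFIX_TO_COMPRESSION:
        if name.endswith(suffix):
            return compression
    return None


print(infer_compression("file.txt.gz"))
print(infer_compression("archive.tar.gz"))
print(infer_compression("file.txt"))
```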

Reading Large Files in Chunks

Both text and jsonl files can be read in chunks using the chunksize argument. This also works in combination with compression.

import rwkit as rw


# Assume a large text file, optionally compressed
for chunk in rw.read_lines("file.txt", chunksize=3):
    print(chunk)
    # Output: ['Hello, rwkit!', 'Nice to meet you.', 'What a beautiful day.']
    # ...

# The same works for jsonl files
for chunk in rw.read_jsonl("file.jsonl", chunksize=3):
    print(chunk)
    # Output: [{'name': 'Alice'}, {'name': 'Bob'}, {'name': 'Charlie'}]
    # ...
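
As a rough picture of what chunked reading means, the sketch below groups any iterable into lists of at most chunksize items using the standard library. The iter_chunks helper is hypothetical and is not part of rwkit; in particular, yielding a shorter final chunk is an assumption of this sketch, not a documented rwkit behavior.

```python
from itertools import islice


def iter_chunks(iterable, chunksize):
    """Hypothetical helper: lazily yield lists of up to `chunksize` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return  # the input is exhausted
        yield chunk


lines = ["a", "b", "c", "d", "e", "f", "g"]
chunks = list(iter_chunks(lines, 3))
print(chunks)
```

Since only one chunk is held in memory at a time, the same pattern applies to an open file handle, where iterating yields one line per step.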

License

rwkit is released under the Apache License, Version 2.0. See the LICENSE file for details.

Download files

Source Distribution

rwkit-2.0.0.tar.gz (12.9 kB)

Built Distribution

rwkit-2.0.0-py3-none-any.whl (15.1 kB)
