Simplified reading & writing files with support for compression
rwkit
rwkit is a Python package that simplifies reading and writing common file formats, including text, JSON, JSONL and YAML. It handles compression transparently and can process large files in chunks.
Features
- Easy-to-use functions for reading and writing text, json, jsonl and yaml files.
- Transparent compression support: bz2, gzip, tar, tar.bz2, tar.gz, tar.xz, xz, zip, zstd.
- Generator functions for processing large files in chunks.
Installation
Install rwkit using pip:
pip install rwkit
Optional Dependencies
rwkit comes with optional features that you can install based on your needs:
pip install rwkit[zstd] # For Zstandard compression support
pip install rwkit[yaml] # For YAML file handling
pip install rwkit[all] # For all optional features
Quick Start
Here are some examples to get you started:
Reading and Writing Text Files
Using a single string:
import rwkit as rw
# Sample text
text = "Hello, rwkit!"
# Write a string
rw.write_text("file.txt", text)
# Append another string
rw.write_text("file.txt", "\nNice to meet you.", mode="a")
# Read file
loaded_text = rw.read_text("file.txt")
print(loaded_text)
# Output: 'Hello, rwkit!\nNice to meet you.'
Using lines (a list of strings):
import rwkit as rw
# Sample
lines = ["Hello, rwkit!", "Nice to meet you."]
# Write lines, each element on its own line (separated by '\n')
rw.write_lines("file.txt", lines)
# Append one or more lines
rw.write_lines("file.txt", "What a beautiful day.", mode="a")
# Read file (transparently splits on '\n')
loaded_lines = rw.read_lines("file.txt")
print(loaded_lines)
# Output: ['Hello, rwkit!', 'Nice to meet you.', 'What a beautiful day.']
Reading and Writing JSON Files
Using a single object:
import rwkit as rw
# Sample data
data = {"name": "Alice", "age": 25}
# Write data to a JSON file
rw.write_json("file.json", data)
# Read data
loaded_data = rw.read_json("file.json")
print(loaded_data)
# Output: {'name': 'Alice', 'age': 25}
Reading and Writing JSONL (= JSON Lines) Files
Using multiple objects, each on its own line. This format is especially useful for large files that are processed in chunks (see Reading Large Files in Chunks below).
import rwkit as rw
# Sample data
data = [
{"name": "Alice", "age": 25},
{"name": "Bob", "age": 30},
]
# Write data to a JSONL file
rw.write_jsonl("file.jsonl", data)
# Read data
loaded_data = rw.read_jsonl("file.jsonl")
print(loaded_data)
# Output: [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]
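For reference, the JSONL layout itself can be sketched with the standard library alone. This is not rwkit's implementation; it only illustrates the one-object-per-line format, and the helper names dump_jsonl and load_jsonl are hypothetical.

```python
import json


def dump_jsonl(records):
    """Serialize an iterable of objects to JSONL: one JSON document per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def load_jsonl(text):
    """Parse JSONL text back into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
]
encoded = dump_jsonl(data)
print(encoded, end="")
# {"name": "Alice", "age": 25}
# {"name": "Bob", "age": 30}
```

Because every line is a complete JSON document, a reader can stop after any line without parsing the rest of the file, which is what makes the format convenient for chunked processing.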
Reading and Writing YAML Files
Note: Requires the pyyaml package (see Optional Dependencies).
import rwkit as rw
# Sample data
data = {"name": "Alice", "age": 25}
# Write to a YAML file
rw.write_yaml("file.yaml", data)
# Read a YAML file
loaded_data = rw.read_yaml("file.yaml")
print(loaded_data)
# Output: {'name': 'Alice', 'age': 25}
Compression
rwkit supports various compression formats via the compression argument. The default is compression='infer', which tries to infer the format from the filename extension:
import rwkit as rw
# Sample text
text = "Hello, rwkit!"
# Write to a gzip compressed text file, inferred from the filename extension
rw.write_text("file.txt.gz", text)
# Read a gzip compressed text file
loaded_text = rw.read_text("file.txt.gz")
print(loaded_text)
# Output: 'Hello, rwkit!'
Alternatively, specify compression explicitly (all available options are listed in the table below):
import rwkit as rw
# Sample text
text = "Hello, rwkit!"
# Write to a gzip compressed text file, explicitly specified
rw.write_text("file.txt.gz", text, compression="gzip")
# Read a gzip compressed text file, explicitly specified
loaded_text = rw.read_text("file.txt.gz", compression="gzip")
print(loaded_text)
# Output: 'Hello, rwkit!'
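For comparison, here is roughly what the same gzip round trip looks like with only the standard library. It is a sketch of the underlying idea, not a description of how rwkit is implemented; the temporary directory is used only to keep the example self-contained.

```python
import gzip
import os
import tempfile

text = "Hello, rwkit!"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "file.txt.gz")

    # Write text through a gzip stream ("wt" = write, text mode).
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text)

    # The raw bytes on disk start with the gzip magic number 0x1f 0x8b.
    with open(path, "rb") as f:
        magic = f.read(2)

    # Read the text back through a gzip stream.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        loaded_text = f.read()

print(magic == b"\x1f\x8b")  # True
print(loaded_text)           # Hello, rwkit!
```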
When compression='infer', the following rules apply:
File extension | Inferred compression |
---|---|
.tar | tar |
.tar.bz2 | tar.bz2 |
.tar.gz | tar.gz |
.tar.xz | tar.xz |
.bz2 | bz2 |
.gz | gzip |
.xz | xz |
.zip | zip |
.zst | zstd |
[everything else] | None |
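The rules in the table can be sketched as a suffix lookup. The mapping and the function name infer_compression below are hypothetical and not taken from rwkit's source; in particular, whether rwkit matches extensions case-sensitively is an assumption here.

```python
# Hypothetical mapping that mirrors the table above; not taken from rwkit's source.
SUFFIX_TO_COMPRESSION = {
    ".tar.bz2": "tar.bz2",
    ".tar.gz": "tar.gz",
    ".tar.xz": "tar.xz",
    ".tar": "tar",
    ".bz2": "bz2",
    ".gz": "gzip",
    ".xz": "xz",
    ".zip": "zip",
    ".zst": "zstd",
}


def infer_compression(filename):
    """Return the compression name for a filename, or None if no suffix matches."""
    # Check longer suffixes first so that 'backup.tar.gz' maps to 'tar.gz', not 'gzip'.
    for suffix in sorted(SUFFIX_TO_COMPRESSION, key=len, reverse=True):
        if filename.endswith(suffix):
            return SUFFIX_TO_COMPRESSION[suffix]
    return None


print(infer_compression("file.txt.gz"))    # gzip
print(infer_compression("backup.tar.gz"))  # tar.gz
print(infer_compression("file.txt"))       # None
```

The ordering matters: a plain .gz check placed before .tar.gz would misclassify tar archives, which is why the sketch tests the longest suffixes first.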
Reading Large Files in Chunks
Both text and jsonl files can be read in chunks using the chunksize argument. This also works in combination with compression.
import rwkit as rw
# Assume a large text file, optionally compressed
for chunk in rw.read_lines("file.txt", chunksize=3):
    print(chunk)
# Output: ['Hello, rwkit!', 'Nice to meet you.', 'What a beautiful day.']
# ...
# The same works for jsonl files
for chunk in rw.read_jsonl("file.jsonl", chunksize=3):
    print(chunk)
# Output: [{'name': 'Alice'}, {'name': 'Bob'}, {'name': 'Charlie'}]
# ...
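The general idea behind chunked reading can be sketched with a small generator built on itertools.islice. The helper iter_chunks is hypothetical and is not rwkit's implementation; whether rwkit also yields a shorter final chunk, as this sketch does, is an assumption.

```python
from itertools import islice


def iter_chunks(items, chunksize):
    """Yield successive lists of up to `chunksize` items from any iterable."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    iterator = iter(items)
    while True:
        # Pull at most `chunksize` items; an empty list means the input is exhausted.
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


lines = ["a", "b", "c", "d", "e"]
for chunk in iter_chunks(lines, chunksize=2):
    print(chunk)
# ['a', 'b']
# ['c', 'd']
# ['e']
```

Passing a lazily read source, such as an open file handle, means only one chunk is held in memory at a time.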
License
rwkit is released under the Apache License, Version 2.0. See the LICENSE file for details.