A lightweight, high-performance Python library for parsing jsonl files.
orjsonl
orjsonl is a lightweight, high-performance Python library for parsing jsonl files. It supports a wide variety of compression formats, including gzip, bzip2, xz and Zstandard. It is powered by orjson, the fastest and most accurate json serializer for Python.
Installation
orjsonl may be installed with pip:
pip install orjsonl
To read or write Zstandard files, install either the zstd command-line tool or the zstandard Python package.
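If you are unsure whether either Zstandard option is already present, a quick check like the one below may help. `zstd_backends` is a hypothetical helper, not part of orjsonl or xopen: it only reports whether the `zstandard` package is importable and whether a `zstd` executable is on `PATH`, not which backend xopen will actually choose.

```python
import importlib.util
import shutil


def zstd_backends() -> dict[str, bool]:
    """Report which Zstandard backends appear to be installed (hypothetical helper)."""
    return {
        # True if the `zstandard` package is importable in this interpreter.
        "zstandard": importlib.util.find_spec("zstandard") is not None,
        # True if an executable named `zstd` is found on PATH.
        "zstd": shutil.which("zstd") is not None,
    }


if __name__ == "__main__":
    available = [name for name, found in zstd_backends().items() if found]
    if available:
        print("Zstandard backend(s) found:", ", ".join(available))
    else:
        print("No Zstandard backend found; try: pip install zstandard")
```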
Usage
The code snippet below demonstrates how jsonl files can be saved, loaded, streamed, appended and extended with orjsonl:
>>> import orjsonl
>>> # Create an iterable of Python objects.
>>> data = [
...     'hello world',
...     ['fizz', 'buzz'],
... ]
>>> # Save the iterable to a jsonl file.
>>> orjsonl.save('test.jsonl', data)
>>> # Append a Python object to the jsonl file.
>>> # JSON object keys must be strings; orjson rejects non-str keys by default.
>>> orjsonl.append('test.jsonl', {'42': 3.14})
>>> # Extend the jsonl file with an iterable of Python objects.
>>> orjsonl.extend('test.jsonl', [True, False])
>>> # Load the jsonl file.
>>> orjsonl.load('test.jsonl')
['hello world', ['fizz', 'buzz'], {'42': 3.14}, True, False]
>>> # Stream the jsonl file.
>>> list(orjsonl.stream('test.jsonl'))
['hello world', ['fizz', 'buzz'], {'42': 3.14}, True, False]
orjsonl can also be used to process jsonl files compressed with gzip, bzip2, xz and Zstandard; the compression format is detected from the file's extension or content:
>>> orjsonl.save('test.jsonl.gz', data)
>>> orjsonl.append('test.jsonl.gz', {'42': 3.14})
>>> orjsonl.extend('test.jsonl.gz', [True, False])
>>> orjsonl.load('test.jsonl.gz')
['hello world', ['fizz', 'buzz'], {'42': 3.14}, True, False]
>>> [obj for obj in orjsonl.stream('test.jsonl.gz')]
['hello world', ['fizz', 'buzz'], {'42': 3.14}, True, False]
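orjsonl hands compression handling to xopen. As a rough, stdlib-only illustration of extension-based selection (not xopen's actual logic, which can also inspect file content), the hypothetical `open_text` helper below maps `.gz`, `.bz2` and `.xz` to the `gzip`, `bz2` and `lzma` modules and falls back to plain `open()` for anything else. Zstandard is left out because most Python versions ship no stdlib module for it.

```python
import bz2
import gzip
import json
import lzma
import os
import tempfile

# Hypothetical extension-to-opener table; xopen's real detection is more thorough.
# '.zst' is left out because most Python versions have no stdlib Zstandard module.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_text(path: str, mode: str = "rt"):
    """Open `path` as UTF-8 text, picking a (de)compressor from its last extension."""
    ext = os.path.splitext(path)[1]
    opener = OPENERS.get(ext, open)  # fall back to a plain, uncompressed file
    return opener(path, mode, encoding="utf-8")


if __name__ == "__main__":
    records = ["hello world", ["fizz", "buzz"], {"42": 3.14}]
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("test.jsonl", "test.jsonl.gz", "test.jsonl.bz2", "test.jsonl.xz"):
            path = os.path.join(tmp, name)
            with open_text(path, "wt") as f:
                for obj in records:
                    f.write(json.dumps(obj) + "\n")
            with open_text(path) as f:
                print(name, [json.loads(line) for line in f])
```

Because the opener is chosen from the name alone, renaming a gzip file to `.jsonl` would make this sketch read it as plain text; content sniffing is what lets a real implementation avoid that.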
Load
def load(
path: str | bytes | int | os.PathLike,
decompression_threads: Optional[int] = None,
compression_format: Optional[str] = None
) -> list[dict | list | int | float | str | bool | None]
load() deserializes a compressed or uncompressed UTF-8-encoded jsonl file to a list of Python objects.
path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be deserialized.
decompression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for decompression.
compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file's compression format based on its extension or content. Possible values are 'gz', 'xz', 'bz2' and 'zst'.
This function returns a list of deserialized dict, list, int, float, str, bool or None objects.
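Conceptually, loading a jsonl file means parsing each line as one JSON value and collecting the results in order. The sketch below shows that idea for an uncompressed file using the stdlib json module instead of orjson; `load_jsonl` is a hypothetical name, and skipping blank lines is an assumption of the sketch rather than documented orjsonl behavior.

```python
import json
import os
import tempfile
from typing import Any


def load_jsonl(path: str) -> list[Any]:
    """Deserialize an uncompressed UTF-8 jsonl file into a list (hypothetical helper).

    Each non-blank line is parsed as one JSON value; skipping blank lines is an
    assumption of this sketch, not documented orjsonl behavior.
    """
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            f.write('"hello world"\n["fizz","buzz"]\n{"42":3.14}\ntrue\nfalse\n')
        print(load_jsonl(path))
```

Since the whole file ends up in one list, memory use grows with file size; that is the trade-off stream() avoids.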
Stream
def stream(
path: str | bytes | int | os.PathLike,
decompression_threads: Optional[int] = None,
compression_format: Optional[str] = None
) -> Generator[dict | list | int | float | str | bool | None, None, None]
stream() creates a generator that deserializes a compressed or uncompressed UTF-8-encoded jsonl file to Python objects.
path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be deserialized by the generator.
decompression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for decompression.
compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file's compression format based on its extension or content. Possible values are 'gz', 'xz', 'bz2' and 'zst'.
This function returns a generator that deserializes the file to dict, list, int, float, str, bool or None objects.
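The practical difference from load() is laziness: a generator only does work when the caller asks for the next item. The hypothetical `stream_jsonl` sketch below illustrates this for an uncompressed file with the stdlib json module (not orjson). Taking the first two items with `itertools.islice` never parses the malformed third line.

```python
import itertools
import json
import os
import tempfile
from typing import Any, Iterator


def stream_jsonl(path: str) -> Iterator[Any]:
    """Lazily yield one value per non-blank line of an uncompressed UTF-8 jsonl file.

    Hypothetical helper: each line is parsed only when the next item is requested.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            f.write("1\n2\nthis line is not valid JSON\n")
        gen = stream_jsonl(path)
        # Only the first two lines are parsed, so the malformed third line is never touched.
        print(list(itertools.islice(gen, 2)))  # [1, 2]
        gen.close()  # runs the generator's `with` exit, closing the file early
```

Closing a partially consumed generator explicitly, as above, releases the underlying file handle without waiting for garbage collection.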
Save
def save(
path: str | bytes | int | os.PathLike,
data: Iterable,
default: Optional[Callable] = None,
option: int = 0,
compression_level: Optional[int] = None,
compression_threads: Optional[int] = None,
compression_format: Optional[str] = None
) -> None
save() serializes an iterable of Python objects to a compressed or uncompressed UTF-8-encoded jsonl file.
path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be saved.
data is an iterable of Python objects to be serialized to the file.
default is an optional callable passed to orjson.dumps() as the default argument that serializes subclasses or arbitrary types to supported types.
option is an optional integer passed to orjson.dumps() as the option argument that modifies how data is serialized.
compression_level is an optional integer passed to xopen.xopen() as the compresslevel argument that determines the compression level for writing to gzip, xz and Zstandard files.
compression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for compression.
compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file's compression format based on its extension. Possible values are 'gz', 'xz', 'bz2' and 'zst'.
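The default hook is the piece most readers need an example for: it is called for any object the serializer cannot handle and must return a supported type or raise TypeError. The sketch below shows that contract with the stdlib json module, which accepts a `default` callable in the same way; `save_jsonl` and `decimal_to_str` are hypothetical names, and the compact separators only approximate orjson's output. With orjsonl itself, the same hook would be passed as default=, and orjson flags such as orjson.OPT_SORT_KEYS can be combined with | and passed as option=.

```python
import json
import os
import tempfile
from decimal import Decimal
from typing import Any, Callable, Iterable, Optional


def save_jsonl(
    path: str,
    data: Iterable[Any],
    default: Optional[Callable[[Any], Any]] = None,
) -> None:
    """Serialize each object in `data` to one line of an uncompressed UTF-8 jsonl file.

    Hypothetical helper built on the stdlib json module, not orjson.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for obj in data:
            # Compact separators and raw UTF-8 approximate orjson's output style.
            f.write(json.dumps(obj, default=default, separators=(",", ":"), ensure_ascii=False))
            f.write("\n")


def decimal_to_str(obj: Any) -> Any:
    """Example `default` hook: turn Decimal into str and reject every other type."""
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"Type is not JSON serializable: {type(obj).__name__}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prices.jsonl")
        save_jsonl(path, [{"item": "tea", "price": Decimal("2.50")}], default=decimal_to_str)
        with open(path, encoding="utf-8") as f:
            print(f.read(), end="")  # {"item":"tea","price":"2.50"}
```

Converting Decimal to a string rather than a float keeps the exact digits, at the cost of the reader having to convert it back.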
Append
def append(
path: str | bytes | int | os.PathLike,
data: Any,
newline: bool = True,
default: Optional[Callable] = None,
option: int = 0,
compression_level: Optional[int] = None,
compression_threads: Optional[int] = None,
compression_format: Optional[str] = None
) -> None
append() serializes and appends a Python object to a compressed or uncompressed UTF-8-encoded jsonl file.
path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be appended to.
data is a Python object to be serialized and appended to the file.
newline is an optional Boolean flag that, if set to False, indicates that the file does not end with a newline and should, therefore, have one added before data is appended.
default is an optional callable passed to orjson.dumps() as the default argument that serializes subclasses or arbitrary types to supported types.
option is an optional integer passed to orjson.dumps() as the option argument that modifies how data is serialized.
compression_level is an optional integer passed to xopen.xopen() as the compresslevel argument that determines the compression level for writing to gzip, xz and Zstandard files.
compression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for compression.
compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file's compression format based on its extension or content. Possible values are 'gz', 'xz', 'bz2' and 'zst'.
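The newline flag is easiest to see with a plain, uncompressed file. The hypothetical `append_jsonl` helper below uses the stdlib json module rather than orjson and is only an illustration of the behavior described above, not orjsonl's implementation: when the file lacks a trailing newline, writing one first keeps the previous record and the new one on separate lines.

```python
import json
import os
import tempfile
from typing import Any


def append_jsonl(path: str, obj: Any, newline: bool = True) -> None:
    """Append `obj` as one JSON line to an uncompressed UTF-8 file (hypothetical helper).

    `newline=False` means the file is assumed NOT to end with a newline, so one is
    written first; otherwise the new record would be glued onto the last line.
    """
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        if not newline:
            f.write("\n")
        f.write(json.dumps(obj, separators=(",", ":"), ensure_ascii=False) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            f.write('"first"')  # note: no trailing newline
        append_jsonl(path, {"second": 2}, newline=False)
        with open(path, encoding="utf-8") as f:
            print(f.read().splitlines())  # ['"first"', '{"second":2}']
```

Calling the same helper with the default newline=True on that file would instead produce the single invalid line "first"{"second":2}.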
Extend
def extend(
path: str | bytes | int | os.PathLike,
data: Iterable,
newline: bool = True,
default: Optional[Callable] = None,
option: int = 0,
compression_level: Optional[int] = None,
compression_threads: Optional[int] = None,
compression_format: Optional[str] = None
) -> None
extend() serializes and appends an iterable of Python objects to a compressed or uncompressed UTF-8-encoded jsonl file.
path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be extended.
data is an iterable of Python objects to be serialized and appended to the file.
newline is an optional Boolean flag that, if set to False, indicates that the file does not end with a newline and should, therefore, have one added before data is appended.
default is an optional callable passed to orjson.dumps() as the default argument that serializes subclasses or arbitrary types to supported types.
option is an optional integer passed to orjson.dumps() as the option argument that modifies how data is serialized.
compression_level is an optional integer passed to xopen.xopen() as the compresslevel argument that determines the compression level for writing to gzip, xz and Zstandard files.
compression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for compression.
compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file's compression format based on its extension or content. Possible values are 'gz', 'xz', 'bz2' and 'zst'.
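extend() is the batch form of appending: every item of the iterable becomes its own line. The hypothetical `extend_jsonl` sketch below illustrates that for an uncompressed file with the stdlib json module (not orjson); in this sketch the file is opened once for the whole batch, and any iterable works, including a generator that is never materialized as a list.

```python
import json
import os
import tempfile
from typing import Any, Iterable


def extend_jsonl(path: str, data: Iterable[Any], newline: bool = True) -> None:
    """Append every object in `data` as its own JSON line (hypothetical helper).

    The file is opened once for the whole batch; `newline=False` means the file is
    assumed not to end with a newline, so one is written before the first record.
    """
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        if not newline:
            f.write("\n")
        for obj in data:
            f.write(json.dumps(obj, separators=(",", ":"), ensure_ascii=False) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "squares.jsonl")
        # A generator expression is consumed one item at a time while writing.
        extend_jsonl(path, ({"n": n, "square": n * n} for n in range(3)))
        with open(path, encoding="utf-8") as f:
            print(f.read(), end="")
```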
License
This library is licensed under the MIT License.
File details
Details for the file orjsonl-1.0.0.tar.gz.
File metadata
- Download URL: orjsonl-1.0.0.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5097e7d099a0700a173dbabd90285245442b7940e52386b1470ca1678125e763
MD5 | 753d0c0616707ece922be2cc1856cca8
BLAKE2b-256 | a86eaba71f429cdfc141789afa9ea64d86b4277200a7efd4a3e8a025394178bc
File details
Details for the file orjsonl-1.0.0-py3-none-any.whl.
File metadata
- Download URL: orjsonl-1.0.0-py3-none-any.whl
- Upload date:
- Size: 5.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 21f5688517a34ae77cd919dac63e11eb103b75da9be60ea910ac4f6862d92a47
MD5 | 8743049a77cdae0ae8436b69b9d3af01
BLAKE2b-256 | 029f4c2899436bcc758b7c0cfa4012ea2a4983601f7580f9926ca10f79f8dbc8