orjsonl

orjsonl is a simple, fast and lightweight Python library for loading, saving, streaming and appending both compressed and uncompressed jsonl (also known as ‘json lines’, ‘newline-delimited json’ and ‘ndjson’) files. It is powered by orjson, which, according to its own documentation, benchmarks as the fastest json library for Python and is more correct than the standard json library.

Installation

orjsonl may be installed with pip:

pip install orjsonl

To read or write Zstandard files, either zstd or the zstandard Python package must be installed.

Usage

This code snippet demonstrates how jsonl files can be loaded, saved, appended and streamed with the load(), save(), append() and stream() functions, respectively:

>>> import orjsonl
>>> data = [
...     {'hello': 'world'},
...     [1.1, 2.2, 3.3],
...     42,
...     True,
...     None
... ]
>>> orjsonl.save('test.jsonl', data)
>>> orjsonl.load('test.jsonl')
[{'hello': 'world'}, [1.1, 2.2, 3.3], 42, True, None]
>>> orjsonl.append('test.jsonl', [('a', 'b', 'c')])
>>> [object_ for object_ in orjsonl.stream('test.jsonl')]
[{'hello': 'world'}, [1.1, 2.2, 3.3], 42, True, None, ['a', 'b', 'c']]

The same functions can also be used to process jsonl files compressed with gzip, bzip2, xz and Zstandard:

>>> import orjsonl
>>> data = [
...     {'hello': 'world'},
...     [1.1, 2.2, 3.3],
...     42,
...     True,
...     None
... ]
>>> orjsonl.save('test.jsonl.gz', data)
>>> orjsonl.load('test.jsonl.gz')
[{'hello': 'world'}, [1.1, 2.2, 3.3], 42, True, None]
>>> orjsonl.append('test.jsonl.gz', [('a', 'b', 'c')])
>>> [object_ for object_ in orjsonl.stream('test.jsonl.gz')]
[{'hello': 'world'}, [1.1, 2.2, 3.3], 42, True, None, ['a', 'b', 'c']]

Load

def load(
    path: str | bytes | int | os.PathLike,
    decompression_threads: Optional[int] = None,
    compression_format: Optional[str] = None
) -> list[dict | list | int | float | str | bool | None]: ...

load() deserializes a compressed or uncompressed UTF-8-encoded jsonl file to a list of Python objects.

path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be deserialized.

decompression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for decompression.

compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.

This function returns a list of deserialized dict, list, int, float, str, bool or None objects.

Stream

def stream(
    path: str | bytes | int | os.PathLike,
    decompression_threads: Optional[int] = None,
    compression_format: Optional[str] = None
) -> Generator[dict | list | int | float | str | bool | None, None, None]: ...

stream() creates a generator that deserializes a compressed or uncompressed UTF-8-encoded jsonl file to Python objects.

path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be deserialized by the generator.

decompression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for decompression.

compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.

This function returns a generator that deserializes the file to dict, list, int, float, str, bool or None objects.

Save

def save(
    path: str | bytes | int | os.PathLike,
    data: Iterable,
    default: Optional[Callable] = None,
    option: int = 0,
    compression_level: Optional[int] = None,
    compression_threads: Optional[int] = None,
    compression_format: Optional[str] = None
) -> None: ...

save() serializes an iterable of Python objects to a compressed or uncompressed UTF-8-encoded jsonl file.

path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be saved.

data is an iterable of Python objects to be serialized to the file.

default is an optional callable passed to orjson.dumps() as the default argument that serializes subclasses or arbitrary types to a supported type.

option is an optional integer passed to orjson.dumps() as the option argument that modifies how data is serialized.

compression_level is an optional integer passed to xopen.xopen() as the compresslevel argument that determines the compression level for writing to gzip, xz and zstandard files.

compression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for compression.

compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file’s compression format based on its extension. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.

Append

def append(
    path: str | bytes | int | os.PathLike,
    data: Iterable,
    newline: bool = True,
    default: Optional[Callable] = None,
    option: int = 0,
    compression_level: Optional[int] = None,
    compression_threads: Optional[int] = None,
    compression_format: Optional[str] = None
) -> None: ...

append() serializes and appends an iterable of Python objects to a compressed or uncompressed UTF-8-encoded jsonl file.

path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be appended.

data is an iterable of Python objects to be serialized and appended to the file.

newline is an optional Boolean flag that, if set to False, indicates that the file does not end with a newline and should, therefore, have one added before data is appended.

default is an optional callable passed to orjson.dumps() as the default argument that serializes subclasses or arbitrary types to a supported type.

option is an optional integer passed to orjson.dumps() as the option argument that modifies how data is serialized.

compression_level is an optional integer passed to xopen.xopen() as the compresslevel argument that determines the compression level for writing to gzip, xz and zstandard files.

compression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for compression.

compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.

License

This library is licensed under the MIT License.

Download files

Source Distribution

orjsonl-0.2.0.tar.gz (7.4 kB view details)

Uploaded Source

Built Distribution

orjsonl-0.2.0-py3-none-any.whl (5.8 kB view details)

Uploaded Python 3

File details

Details for the file orjsonl-0.2.0.tar.gz.

File metadata

  • Download URL: orjsonl-0.2.0.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.0

File hashes

Hashes for orjsonl-0.2.0.tar.gz
Algorithm Hash digest
SHA256 bcc6b895f6fe79bd820202f7ad75d5bc090b47abddd39331aa508bd58817591d
MD5 67ed3d26c7ad517af378b4673a750647
BLAKE2b-256 2d1128b5cc5e0d453eefa4e919684a532af23c51f8e061fe02a0a55d25e25b65

File details

Details for the file orjsonl-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: orjsonl-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.0

File hashes

Hashes for orjsonl-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e7a4a9a536b6e87776893af1f11c8700f338169e766900c5f7027982943afd9c
MD5 de61e47fad75c1175269e14c03cfb670
BLAKE2b-256 e292ee6c6ae9b382d655a7cfd92e868e8b8c597bf9315f4f0d17d1c0ae572875
