orjsonl

orjsonl is a lightweight, high-performance Python library for parsing jsonl files. It supports the gzip, bzip2, xz and Zstandard compression formats. It is powered by orjson, a fast and correct JSON library for Python.

Installation

orjsonl may be installed with pip:

pip install orjsonl

To read or write Zstandard files, install either the zstd command-line tool or the zstandard Python package.

Usage

The code snippet below demonstrates how jsonl files can be saved, loaded, streamed, appended and extended with orjsonl:

>>> import orjsonl
>>> # Create an iterable of Python objects.
>>> data = [
...     'hello world',
...     ['fizz', 'buzz'],
... ]
>>> # Save the iterable to a jsonl file.
>>> orjsonl.save('test.jsonl', data)
>>> # Append a Python object to the jsonl file (JSON object keys are strings).
>>> orjsonl.append('test.jsonl', {'42': 3.14})
>>> # Extend the jsonl file with an iterable of Python objects.
>>> orjsonl.extend('test.jsonl', [True, False])
>>> # Load the jsonl file.
>>> orjsonl.load('test.jsonl')
['hello world', ['fizz', 'buzz'], {'42': 3.14}, True, False]
>>> # Stream the jsonl file.
>>> list(orjsonl.stream('test.jsonl'))
['hello world', ['fizz', 'buzz'], {'42': 3.14}, True, False]

orjsonl can also be used to process jsonl files compressed with gzip, bzip2, xz and Zstandard:

>>> orjsonl.save('test.jsonl.gz', data)
>>> orjsonl.append('test.jsonl.gz', {'42': 3.14})
>>> orjsonl.extend('test.jsonl.gz', [True, False])
>>> orjsonl.load('test.jsonl.gz')
['hello world', ['fizz', 'buzz'], {'42': 3.14}, True, False]
>>> [obj for obj in orjsonl.stream('test.jsonl.gz')]
['hello world', ['fizz', 'buzz'], {'42': 3.14}, True, False]

Load

def load(
    path: str | bytes | int | os.PathLike,
    decompression_threads: Optional[int] = None,
    compression_format: Optional[str] = None
) -> list[dict | list | int | float | str | bool | None]

load() deserializes a compressed or uncompressed UTF-8-encoded jsonl file to a list of Python objects.

path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be deserialized.

decompression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for decompression.

compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.

This function returns a list of deserialized dict, list, int, float, str, bool or None objects.

Stream

def stream(
    path: str | bytes | int | os.PathLike,
    decompression_threads: Optional[int] = None,
    compression_format: Optional[str] = None
) -> Generator[dict | list | int | float | str | bool | None, None, None]

stream() creates a generator that deserializes a compressed or uncompressed UTF-8-encoded jsonl file to Python objects.

path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be deserialized by the generator.

decompression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for decompression.

compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.

This function returns a generator that deserializes the file to dict, list, int, float, str, bool or None objects.
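
To make the contrast with load() concrete, here is a rough standard-library sketch of the lazy, one-line-at-a-time pattern that a streaming reader follows. stream_jsonl is a hypothetical stand-in written with json and gzip. It is not orjsonl's implementation, which is built on orjson and xopen, and it only recognises the .gz extension.

```python
import gzip
import json
import os
import tempfile
from typing import Any, Iterator


def stream_jsonl(path: str) -> Iterator[Any]:
    """Hypothetical stand-in for a streaming reader: yield one object per line."""
    # Pick an opener from the extension; only gzip is handled in this sketch.
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt', encoding='utf-8') as lines:
        for line in lines:
            if line.strip():  # ignore blank lines
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'events.jsonl.gz')
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        f.write('{"event": "start"}\n["fizz", "buzz"]\n3.14\n')

    objects = stream_jsonl(path)  # nothing has been read yet
    first = next(objects)         # parses only the first line
    rest = list(objects)          # parses the remaining lines and closes the file

print(first, rest)
```

Only one line is parsed per step, so memory use stays roughly constant however large the file is.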

Save

def save(
    path: str | bytes | int | os.PathLike,
    data: Iterable,
    default: Optional[Callable] = None,
    option: int = 0,
    compression_level: Optional[int] = None,
    compression_threads: Optional[int] = None,
    compression_format: Optional[str] = None
) -> None

save() serializes an iterable of Python objects to a compressed or uncompressed UTF-8-encoded jsonl file.

path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be saved.

data is an iterable of Python objects to be serialized to the file.

default is an optional callable passed to orjson.dumps() as the default argument that serializes subclasses or arbitrary types to supported types.

option is an optional integer passed to orjson.dumps() as the option argument that modifies how data is serialized.

compression_level is an optional integer passed to xopen.xopen() as the compresslevel argument that determines the compression level for writing to gzip, xz and Zstandard files.

compression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for compression.

compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file’s compression format based on its extension. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.

Append

def append(
    path: str | bytes | int | os.PathLike,
    data: Any,
    newline: bool = True,
    default: Optional[Callable] = None,
    option: int = 0,
    compression_level: Optional[int] = None,
    compression_threads: Optional[int] = None,
    compression_format: Optional[str] = None
) -> None

append() serializes and appends a Python object to a compressed or uncompressed UTF-8-encoded jsonl file.

path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be appended.

data is a Python object to be serialized and appended to the file.

newline is an optional Boolean flag that, if set to False, indicates that the file does not end with a newline and should, therefore, have one added before data is appended.

default is an optional callable passed to orjson.dumps() as the default argument that serializes subclasses or arbitrary types to supported types.

option is an optional integer passed to orjson.dumps() as the option argument that modifies how data is serialized.

compression_level is an optional integer passed to xopen.xopen() as the compresslevel argument that determines the compression level for writing to gzip, xz and Zstandard files.

compression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for compression.

compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.
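
The effect of the newline flag is easiest to see on a file that was written without a trailing newline. The sketch below mimics the documented behaviour with the standard library only. append_line is a hypothetical helper, not orjsonl's code, and it uses json rather than orjson.

```python
import json
import os
import tempfile


def append_line(path: str, obj, newline: bool = True) -> None:
    """Hypothetical helper mirroring the documented meaning of `newline`."""
    with open(path, 'ab') as f:
        if not newline:
            # The file does not end with a newline, so terminate its last record first.
            f.write(b'\n')
        f.write(json.dumps(obj).encode('utf-8') + b'\n')


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'log.jsonl')

    # Simulate a file produced by another tool that omitted the final newline.
    with open(path, 'wb') as f:
        f.write(b'{"a": 1}')

    append_line(path, {'b': 2}, newline=False)

    with open(path, 'rb') as f:
        contents = f.read()

print(contents)
```

Had newline been left at True here, the two records would have been fused onto one line and the file would no longer be valid jsonl.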

Extend

def extend(
    path: str | bytes | int | os.PathLike,
    data: Iterable,
    newline: bool = True,
    default: Optional[Callable] = None,
    option: int = 0,
    compression_level: Optional[int] = None,
    compression_threads: Optional[int] = None,
    compression_format: Optional[str] = None
) -> None

extend() serializes and appends an iterable of Python objects to a compressed or uncompressed UTF-8-encoded jsonl file.

path is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be extended.

data is an iterable of Python objects to be serialized and appended to the file.

newline is an optional Boolean flag that, if set to False, indicates that the file does not end with a newline and should, therefore, have one added before data is appended.

default is an optional callable passed to orjson.dumps() as the default argument that serializes subclasses or arbitrary types to supported types.

option is an optional integer passed to orjson.dumps() as the option argument that modifies how data is serialized.

compression_level is an optional integer passed to xopen.xopen() as the compresslevel argument that determines the compression level for writing to gzip, xz and Zstandard files.

compression_threads is an optional integer passed to xopen.xopen() as the threads argument that specifies the number of threads that should be used for compression.

compression_format is an optional string passed to xopen.xopen() as the format argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.

License

This library is licensed under the MIT License.
