
A simple Python library for handling jsonlines files.

jsonl

About

Useful functions for working with jsonlines data, as described at https://jsonlines.org/

Features:

  • Exposes an API similar to the json module from the standard library.
  • Supports orjson, ujson, or the standard json module for serialization and deserialization, preferring orjson, then ujson, and falling back to the standard json module if neither is installed.
  • Supports gzip and bzip2 compression formats.

Installation (via pip)

pip install py-jsonl

Usage

dumps

Serialize an iterable into a jsonlines formatted string.

dumps(iterable, **kwargs)

:param Iterable[Any] iterable: Iterable of objects
:param kwargs: `json.dumps` kwargs
:rtype: str

Examples:

import jsonl

data = ({'foo': 1}, {'bar': 2})
result = jsonl.dumps(data)
print(repr(result))  # >> '{"foo": 1}\n{"bar": 2}\n'
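
Conceptually, `dumps` produces one JSON document per object, each terminated by a newline, as the example output shows. A stdlib-only sketch of that behaviour follows; `dumps_sketch` is a hypothetical name, and it always uses the standard `json` module rather than the orjson/ujson fallback.

```python
import json
from typing import Any, Iterable


def dumps_sketch(iterable: Iterable[Any], **kwargs: Any) -> str:
    """Hypothetical stdlib-only equivalent of jsonl.dumps (not the library's code).

    Every object becomes one line of JSON, and every line, including the
    last one, is terminated by a newline character.
    """
    return "".join(json.dumps(obj, **kwargs) + "\n" for obj in iterable)


result = dumps_sketch(({"foo": 1}, {"bar": 2}))
print(repr(result))  # '{"foo": 1}\n{"bar": 2}\n'
```

Extra keyword arguments are forwarded to `json.dumps`, so options such as `sort_keys=True` apply to every line.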

dump

Dump an iterable to a jsonlines file.

  • When dumping to a path, use a .gz, .gzip, or .bz2 extension to write a compressed file.
  • Serialization uses the first available of orjson.dumps, ujson.dumps, and json.dumps, in that order.

dump(iterable, file, **kwargs)

:param Iterable[Any] iterable: Iterable of objects
:param Union[str, bytes, os.PathLike, io.IOBase] file: File path or open file object to dump to
:param kwargs: `json.dumps` kwargs

Examples:

import gzip

import jsonl

data = ({'foo': 1}, {'bar': 2})

# Dump the data into an uncompressed file at the given path.
jsonl.dump(data, "file1.jsonl")

# Dump the data into a gzipped file at the given path.
jsonl.dump(data, "file2.jsonl.gz")

# Dump the data into the already opened gzipped file.
with gzip.open("file3.jsonl.gz", mode="wb") as fp:
    jsonl.dump(data, fp)

# Append the data to the end of the existing gzipped file.
with gzip.open("file3.jsonl.gz", mode="ab") as fp:
    jsonl.dump(data, fp)

dump_fork

Incrementally dumps multiple iterables to their specified jsonlines file paths, reducing memory consumption.

  • Use a .gz, .gzip, or .bz2 extension in a path to write a compressed file.
  • Serialization uses the first available of orjson.dumps, ujson.dumps, and json.dumps, in that order.

dump_fork(path_iterables, dump_if_empty=True, **kwargs)

:param Iterable[Tuple[str, Iterable[Any]]] path_iterables: Iterable of (file path, iterable) pairs
:param bool dump_if_empty: If False, do not create empty jsonlines files.
:param kwargs: `json.dumps` kwargs

Examples:

import jsonl

path_iterables = (
    ("num.jsonl", ({"value": 1}, {"value": 2})),
    ("foo.jsonl", ({"a": "1"}, {"b": 2})),
    ("num.jsonl", ({"value": 3},)),
    ("foo.jsonl", ()),
)
jsonl.dump_fork(path_iterables)
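
The memory saving comes from writing each object as soon as it is produced instead of collecting whole iterables first. Below is a minimal sketch of that idea, assuming (as the example's repeated "num.jsonl" suggests) that pairs sharing a path are written in order into the same file; `dump_fork_sketch` is a hypothetical name, it handles uncompressed files only, and it is not the library's implementation.

```python
import json
import os
import tempfile
from contextlib import ExitStack
from typing import IO, Any, Dict, Iterable, Tuple


def dump_fork_sketch(
    path_iterables: Iterable[Tuple[str, Iterable[Any]]],
    dump_if_empty: bool = True,
    **kwargs: Any,
) -> None:
    """Hypothetical re-creation of the dump_fork idea (not jsonl's source).

    Objects are written as they are consumed, so no iterable is materialized
    in memory; one file handle per distinct path stays open until the end.
    """
    with ExitStack() as stack:  # closes every opened file on exit
        handles: Dict[str, IO[str]] = {}
        for path, iterable in path_iterables:
            for obj in iterable:
                if path not in handles:
                    handles[path] = stack.enter_context(open(path, "w", encoding="utf-8"))
                handles[path].write(json.dumps(obj, **kwargs) + "\n")
            if dump_if_empty and path not in handles:
                # Create the file even though no object was written to it.
                handles[path] = stack.enter_context(open(path, "w", encoding="utf-8"))


tmp_dir = tempfile.mkdtemp()
num_path = os.path.join(tmp_dir, "num.jsonl")
foo_path = os.path.join(tmp_dir, "foo.jsonl")
empty_path = os.path.join(tmp_dir, "empty.jsonl")
dump_fork_sketch(
    (
        (num_path, ({"value": 1}, {"value": 2})),
        (foo_path, ({"a": "1"}, {"b": 2})),
        (num_path, ({"value": 3},)),
        (empty_path, ()),
    ),
    dump_if_empty=False,
)
with open(num_path, encoding="utf-8") as fp:
    num_text = fp.read()
print(num_text)
```

Keeping the handles open trades a few file descriptors for never re-reading or buffering data; with many distinct paths, a real implementation might need to cap the number of open files.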

load

Deserialize a UTF-8-encoded jsonlines file into an iterable of Python objects.

  • Recognizes the .gz, .gzip, and .bz2 extensions to load compressed files.
  • Deserialization uses the first available of orjson.loads, ujson.loads, and json.loads, in that order.

load(file, **kwargs)

:param Union[str, bytes, os.PathLike, io.IOBase] file: File path or open file object to load
:param kwargs: `json.loads` kwargs
:rtype: Iterable[Any]

Examples:

import gzip

import jsonl

# Load the uncompressed file from the given path.
iterable1 = jsonl.load("file1.jsonl")
print(tuple(iterable1))

# Load the gzipped file from the given path.
iterable2 = jsonl.load("file2.jsonl.gz")
print(tuple(iterable2))

# Load from an already opened gzipped file.
with gzip.open("file3.jsonl.gz", mode="rb") as fp:
    iterable3 = jsonl.load(fp)
    print(tuple(iterable3))
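
Since `load` returns an iterable, one natural way to build such a loader is a generator that parses one line at a time. The sketch below assumes that design; `load_sketch` is a hypothetical name, it always uses the standard `json` module, and skipping blank lines is a choice made for this sketch rather than documented library behaviour.

```python
import bz2
import gzip
import json
import os
import tempfile
from typing import Any, Iterator

# Hypothetical extension -> opener table, mirroring the documented extensions.
_OPENERS = {".gz": gzip.open, ".gzip": gzip.open, ".bz2": bz2.open}


def load_sketch(path: str, **kwargs: Any) -> Iterator[Any]:
    """Hypothetical lazy loader (not jsonl's source): yields one object per line.

    As a generator, it reads the file line by line, only while the caller
    iterates; blank lines are skipped.
    """
    opener = _OPENERS.get(os.path.splitext(path)[1].lower(), open)
    with opener(path, "rt", encoding="utf-8") as fp:
        for line in fp:
            if line.strip():
                yield json.loads(line, **kwargs)


tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "file2.jsonl.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as fp:
    fp.write('{"foo": 1}\n\n{"bar": 2}\n')

objects = tuple(load_sketch(gz_path))
print(objects)  # ({'foo': 1}, {'bar': 2})
```

In this sketch, wrapping the result in `tuple(...)` materializes every object at once; iterating the generator directly keeps only one line in memory at a time.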

Unit tests

(env)$ pip install -r requirements.txt   # Skip if the requirements are already installed
(env)$ pytest tests/
(env)$ pytest --cov jsonl   # Run the tests with coverage

Download files

Source distribution: py_jsonl-1.1.2.tar.gz (6.3 kB)

Built distribution: py_jsonl-1.1.2-py3-none-any.whl (5.7 kB, Python 3)
