A simple Python library for handling jsonlines files.

jsonl

About

Useful functions for working with jsonlines data, as described at https://jsonlines.org/

Features:

  • 🌎 Offers an API similar to Python's built-in json module.
  • 🚀 Supports serialization/deserialization using the most common json libraries, prioritizing orjson, then ujson, and defaulting to the standard json if the others are unavailable.
  • 🗜️ Enables compression using gzip, bzip2, and xz formats.
  • 🔧 Loads files containing broken lines, skipping any malformed lines.
  • 📦 Provides a simple API for incremental writing to multiple files.
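
The backend priority mentioned in the feature list can be pictured as a small import-fallback loop. This is a minimal sketch of the general technique, not the library's actual code; the helper name `pick_json_backend` is hypothetical.

```python
import importlib


def pick_json_backend(candidates=("orjson", "ujson", "json")):
    """Return the first importable module from `candidates` (hypothetical helper)."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
    raise ImportError(f"none of {candidates!r} could be imported")


backend = pick_json_backend()
print(backend.__name__)  # whichever backend is available in this environment
```

Since the standard `json` module is always present, the loop only reaches the final `raise` if a custom candidate list without it is passed in.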

Installation (via pip)

pip install py-jsonl

Usage

Serialize an iterable into a JSON Lines formatted string. (dumps)

Examples:

import jsonl

data = ({'foo': 1}, {'bar': 2})
result = jsonl.dumps(data)
print(result)
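
For reference, the JSON Lines format puts one JSON value on each line. The sketch below builds such a string with only the standard `json` module. `to_jsonl` is a hypothetical helper for illustration, and the exact whitespace or trailing newline that `jsonl.dumps` emits may differ.

```python
import json


def to_jsonl(iterable):
    """Serialize each item to one line of JSON (hypothetical helper, stdlib only)."""
    return "".join(json.dumps(item, ensure_ascii=False) + "\n" for item in iterable)


text = to_jsonl(({"foo": 1}, {"bar": 2}))
print(text, end="")
```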

Dump an iterable to a JSON Lines file. (dump)

Examples:

Write the data to an uncompressed file at the specified path.

import jsonl

data = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

jsonl.dump(data, "file.jsonl")  # as list
jsonl.dump(iter(data), "file.jsonl")  # as iterable

Write the data to a compressed file at the specified path.

import jsonl

data = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

jsonl.dump(data, "file.jsonl.gz")  # gzip compression
jsonl.dump(data, "file.jsonl.bz2")  # bzip2 compression
jsonl.dump(data, "file.jsonl.xz")  # xz compression
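
One way to picture extension-based compression is a lookup from file suffix to a standard-library opener. This is an illustrative sketch only: `open_by_extension` and the `OPENERS` table are hypothetical names, and the library may detect compression differently.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Hypothetical mapping from file suffix to a stdlib opener.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_by_extension(path, mode="rt"):
    """Open `path` in text mode, choosing a (de)compressor from its suffix."""
    suffix = os.path.splitext(path)[1].lower()
    opener = OPENERS.get(suffix, open)  # plain open() for unknown suffixes
    return opener(path, mode=mode, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.jsonl.xz")
    with open_by_extension(path, "wt") as fp:
        fp.write('{"name": "May", "wins": []}\n')
    with open_by_extension(path, "rt") as fp:
        roundtrip = fp.read()
```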

Write the data to the already opened gzipped file.

import gzip
import jsonl

data = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

with gzip.open("file.jsonl.gz", mode="wb") as fp:
    jsonl.dump(data, fp, text_mode=False)

Append the data to the end of the existing gzipped file.

import gzip
import jsonl

data = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

with gzip.open("file.jsonl.gz", mode="ab") as fp:
    jsonl.dump(data, fp, text_mode=False)

Write the data to a custom file object.

import jsonl

class MyCustomFile1:

    def write(self, line):
        print(line)

class MyCustomFile2:

    def writelines(self, lines):
        print("".join(lines))

data = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

jsonl.dump(data, MyCustomFile1(), text_mode=True)
jsonl.dump(data, MyCustomFile2(), text_mode=True)

Dump fork (Incremental dump)

Incrementally dump multiple iterables into the specified jsonlines file paths. Writing the data as it is produced reduces memory consumption.

Examples:

import jsonl


def worker():
    yield ("num.jsonl", ({"value": 1}, {"value": 2}))  # as tuple
    yield ("foo.jsonl", iter(({"a": "1"}, {"b": 2})))  # as iterator
    yield ("num.jsonl", ({"value": 3},))
    yield ("foo.jsonl", ())


jsonl.dump_fork(worker())
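
The idea behind incremental dumping can be sketched with the standard library: consume the generator one `(path, items)` pair at a time and append each batch to its file, so only the current batch is held in memory. `fork_to_files` is a hypothetical helper, and appending to files is an assumption of this sketch; `jsonl.dump_fork` may manage file handles and existing files differently.

```python
import json
import os
import tempfile


def fork_to_files(pairs, base_dir):
    """Append each batch of items to its target file (hypothetical helper)."""
    for name, items in pairs:
        path = os.path.join(base_dir, name)
        # Assumption: append mode, so repeated paths accumulate lines.
        with open(path, mode="at", encoding="utf-8") as fp:
            for item in items:
                fp.write(json.dumps(item) + "\n")


def worker():
    yield ("num.jsonl", ({"value": 1}, {"value": 2}))
    yield ("foo.jsonl", iter(({"a": "1"}, {"b": 2})))
    yield ("num.jsonl", ({"value": 3},))


with tempfile.TemporaryDirectory() as tmp:
    fork_to_files(worker(), tmp)
    with open(os.path.join(tmp, "num.jsonl"), encoding="utf-8") as fp:
        num_lines = fp.read().splitlines()
```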

Load a JSON Lines file. (load)

Deserialize a UTF-8 encoded jsonlines file into an iterable of Python objects.

Examples:

Load an uncompressed file from the specified path.

import jsonl

path = "file.jsonl"
data = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

jsonl.dump(data, path)
iterable = jsonl.load(path)
print(tuple(iterable))

Load a compressed file from the specified path.

import jsonl

path = "file.jsonl.gz"
data = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

jsonl.dump(data, path)
iterable = jsonl.load(path)
print(tuple(iterable))

Load a compressed file from the specified open file object.

import gzip
import jsonl

path = "file.jsonl.gz"
data = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

jsonl.dump(data, path)
with gzip.open(path, mode="rb") as fp:
    iterable = jsonl.load(fp)
    print(tuple(iterable))

Load a file containing broken lines, skipping any malformed lines.

import jsonl

with open("file.jsonl", mode="wt", encoding="utf-8") as fp:
    fp.write('{"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]}\n')
    fp.write('{"name": "May", "wins": []\n')  # missing closing brace
    fp.write('{"name": "Richard", "wins": []}\n')

iterable = jsonl.load("file.jsonl", broken=True)
print(tuple(iterable))
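
Skipping malformed lines can be sketched with the standard `json` module: try to decode each line and move on when decoding fails. `iter_valid_lines` is a hypothetical helper; how `jsonl.load(..., broken=True)` reports or logs skipped lines is not covered here.

```python
import json


def iter_valid_lines(lines):
    """Yield decoded objects, skipping lines that are not valid JSON (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: skip it


raw = [
    '{"name": "Gilbert", "wins": []}\n',
    '{"name": "May", "wins": []\n',  # missing closing brace
    '{"name": "Richard", "wins": []}\n',
]
names = [obj["name"] for obj in iter_valid_lines(raw)]
print(names)
```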

Unit tests

(env)$ pip install -r requirements.txt   # Ignore this command if it has already been executed
(env)$ pytest tests/
(env)$ pytest --cov jsonl  # Tests with coverage

Download files

Source Distribution: py_jsonl-1.3.4.tar.gz (7.2 kB)
Built Distribution: py_jsonl-1.3.4-py3-none-any.whl (6.1 kB, Python 3)
