Jsonline is intended for exploring and working with JSON Lines files without keeping the entire dataset in memory or repeatedly reading the whole file.

Jsonline

Jsonline is a Python library for efficiently working with JSON Lines files. It allows you to access and append data without loading the entire file into memory, making it ideal for large datasets.

This library treats a JSON Lines file as a read-only list that also has an append method. It builds an index of the start and end positions of each JSON object in the file. When you access an element, Jsonline uses this index to read only the relevant line. For efficiency, the index is stored in a gzip-compressed file with a .jsonl.idx extension.
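
To make the idea of a gzip-compressed index concrete, here is a minimal sketch using only the standard library. The on-disk layout Jsonline actually uses is not described here, so the JSON-inside-gzip format and the helper names save_index and load_index are assumptions made purely for illustration.

```python
import gzip
import json
import os
import tempfile


def save_index(index, path):
    """Write a list of (start, end) pairs to a gzip-compressed JSON file.

    Hypothetical helper: the real Jsonline index format may differ.
    """
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    """Read the pairs back as a list of tuples."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [tuple(pair) for pair in json.load(f)]


# Illustrative positions for three lines of a data file.
index = [(0, 12), (12, 30), (30, 47)]
idx_path = os.path.join(tempfile.mkdtemp(), "my_file.jsonl.idx")

save_index(index, idx_path)
restored = load_index(idx_path)

# The first two bytes of any gzip stream are the magic number 1f 8b.
with open(idx_path, "rb") as f:
    magic = f.read(2)
```

Keeping the index in a separate file means the data file stays a plain JSON Lines file that other tools can still read.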

Features

  • Memory-efficient: Reads only the data you need from the file.
  • List-like access: Access JSON objects by index (data[i]).
  • Fast appends: Efficiently append single or multiple objects.
  • Automatic indexing: Creates and manages an index file for you.
  • LRU Cache: Caches recently accessed objects in memory for faster retrieval.
  • Context manager support: Works with the with statement for automatic resource management.
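
The LRU cache mentioned in the feature list can be pictured with a short sketch. This is not Jsonline's implementation; the LRUCache class below is a hypothetical stand-in built on collections.OrderedDict that shows how a small cache keeps the most recently used items and evicts the oldest one.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache (illustrative, not Jsonline's code)."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the key as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        # Drop the least recently used entry once over capacity.
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put(0, {"test": 1})
cache.put(1, {"test": 2})
cache.get(0)                # touch index 0, so index 1 is now the oldest
cache.put(2, {"test": 3})   # over capacity: index 1 is evicted
```

With a cache like this, repeated reads of the same few indexes are served from memory, while memory use stays bounded by the capacity.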

Installation

pip install jsonline

Usage

Opening a file

You can open a JSON Lines file using the JsonLine class or the jsonline.open() function. The .jsonl extension is not required in the file path.

from jsonline import JsonLine

# Using the JsonLine class
data = JsonLine('my_file')

# Or using the open function
import jsonline
data = jsonline.open('my_file')

If the file does not exist, it will be created with a .jsonl extension.

Appending data

You can append a single JSON object using the append method, or multiple objects using extend.

# Append a single object
data.append({'test': 1})

# Append multiple objects
data.extend([{'test': 2}, {'another_test': 3}])
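
At the file level, appending to a JSON Lines file means writing one serialized object per line at the end of the file. The sketch below shows that idea with the standard library only; append_line and extend_lines are hypothetical helpers, not part of Jsonline, and they skip the index update that the library performs.

```python
import json
import os
import tempfile


def append_line(path, obj):
    """Append one object as a single JSON line."""
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        f.write(json.dumps(obj) + "\n")


def extend_lines(path, objs):
    """Append many objects while opening the file only once."""
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        f.writelines(json.dumps(obj) + "\n" for obj in objs)


path = os.path.join(tempfile.mkdtemp(), "my_file.jsonl")
append_line(path, {"test": 1})
extend_lines(path, [{"test": 2}, {"another_test": 3}])

# Read everything back to confirm one object per line.
with open(path, encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
```

Opening the file once for a whole batch is one plausible reason a bulk method can be cheaper than many single appends.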

Accessing data

You can access individual JSON objects by their index, just like a Python list.

# Get the first object
first_item = data[0]

# Get the last object
last_item = data[-1]

You can also iterate over the entire dataset:

for item in data:
    print(item)

Context Manager

Jsonline supports the context manager protocol, which automatically closes the file for you.

with JsonLine('my_file') as data:
    print(data[0])

API Reference

jsonline.JsonLine(path, cache_size=10, string_keys=True)

The main class for working with JSON Lines files.

  • path (str or pathlib.Path): Path to the JSON Lines file.
  • cache_size (int, optional): The number of items to store in the LRU cache. Defaults to 10.
  • string_keys (bool, optional): If False, allows non-string keys in JSON objects (this is non-standard). Defaults to True.
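
For background on why non-string keys are called non-standard: JSON object keys are always strings, so Python's standard json module converts other key types on the way out and cannot restore them on the way back. The snippet below demonstrates that with the standard library only; it does not show how Jsonline handles string_keys=False internally.

```python
import json

record = {1: "one", 2: "two"}

# Integer keys are written as strings, because JSON has no other key type.
line = json.dumps(record)

# Loading the line yields string keys, not the original integers.
restored = json.loads(line)
```

Here line is '{"1": "one", "2": "two"}' and restored has the keys "1" and "2", so the round trip does not preserve the original key types.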

Methods

  • append(data): Appends a single JSON object to the end of the file.
  • extend(data): Appends an iterable of JSON objects to the end of the file. This is more efficient than calling append in a loop.
  • get(index, default=None): Retrieves an item by its index. If the index is out of bounds, it returns the default value.
  • close(): Closes the file handle.
  • rebuild_index(): Forces a rebuild of the index file. This can be useful if the file has been modified by another process.
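
The behavior described for get mirrors dict.get, applied to positions instead of keys. As a rough sketch of those semantics on an ordinary list (the function below is a hypothetical stand-in, not the library's code):

```python
def get(items, index, default=None):
    """Return items[index], or default when the index is out of bounds."""
    try:
        return items[index]
    except IndexError:
        return default


rows = [{"test": 1}, {"test": 2}]

found = get(rows, 1)                    # in bounds
missing = get(rows, 5)                  # out of bounds, falls back to None
fallback = get(rows, 5, {"empty": True})  # out of bounds, custom default
```

This is convenient when probing a file whose length is not known in advance, since no exception handling is needed at the call site.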

jsonline.open(path, cache=10)

A convenience function for creating a JsonLine object.

  • path (str or pathlib.Path): Path to the JSON Lines file.
  • cache (int, optional): The number of items to store in the LRU cache. Defaults to 10.

jsonline.load(f, cache=10)

Loads a JsonLine object from an existing file-like object.

  • f (TextIO): A file-like object opened in text mode. This function uses the .name attribute of the file-like object to open the file, so it will not work with in-memory objects like io.StringIO.
  • cache (int, optional): The number of items to store in the LRU cache. Defaults to 10.
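
The restriction on in-memory objects follows from how Python file objects work: a file opened from disk carries a .name attribute with its path, while io.StringIO has no such attribute. The check below uses only the standard library and does not call jsonline.load itself.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "my_file.jsonl")

# A file opened from disk remembers the path it was opened with.
with open(path, "w", encoding="utf-8") as f:
    on_disk_name = f.name

# An in-memory text buffer has no path, and therefore no .name attribute.
in_memory = io.StringIO('{"test": 1}\n')
has_name = hasattr(in_memory, "name")
```

A quick hasattr(f, "name") check like this can tell you up front whether a given file-like object is a suitable argument.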

How it Works

Jsonline creates an index file (.jsonl.idx) that stores the byte offset and length of each line in the JSON Lines file. This allows for fast lookups without reading the entire file. When you request an item at a specific index, Jsonline reads the index file to find the position of that item, then seeks to that position in the data file and reads only the necessary bytes.
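
The lookup described above can be sketched in a few lines. This is an illustration of the general technique (a byte offset and length per line, then seek and read), not Jsonline's actual code; build_index and read_item are hypothetical names, and the index is kept in memory here rather than in a .jsonl.idx file.

```python
import json
import os
import tempfile


def build_index(path):
    """Scan the file once and record (offset, length) in bytes per line."""
    index = []
    offset = 0
    with open(path, "rb") as f:
        for raw in f:
            if raw.strip():  # ignore blank lines
                index.append((offset, len(raw)))
            offset += len(raw)
    return index


def read_item(path, index, i):
    """Seek to the i-th record and parse only its bytes."""
    offset, length = index[i]
    with open(path, "rb") as f:
        f.seek(offset)
        return json.loads(f.read(length))


records = [{"test": 1}, {"test": 2}, {"another_test": 3}]
path = os.path.join(tempfile.mkdtemp(), "my_file.jsonl")
with open(path, "wb") as f:
    for record in records:
        f.write((json.dumps(record) + "\n").encode("utf-8"))

index = build_index(path)
second = read_item(path, index, 1)
last = read_item(path, index, -1)  # negative positions work via the list
```

The file is opened in binary mode so that offsets are byte positions, which is what seek expects, regardless of how many bytes each character occupies.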

The index is automatically updated when you use append or extend. If the index file gets out of sync with the data file (e.g., if the data file is modified externally), you can use the rebuild_index() method to regenerate it.
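
One simple way to notice that an index no longer matches its data file is to compare the file size with the number of bytes the index covers. The sketch below assumes an index of (offset, length) pairs that covers every byte of the file; index_is_stale is a hypothetical helper, and nothing here implies that Jsonline performs this check itself.

```python
import os
import tempfile


def index_is_stale(data_path, index):
    """Return True when the index does not cover exactly the file's bytes."""
    covered = index[-1][0] + index[-1][1] if index else 0
    return os.path.getsize(data_path) != covered


lines = [b'{"test": 1}\n', b'{"test": 2}\n']
path = os.path.join(tempfile.mkdtemp(), "my_file.jsonl")
with open(path, "wb") as f:
    f.writelines(lines)

# Build the (offset, length) index from the known line lengths.
index = []
offset = 0
for raw in lines:
    index.append((offset, len(raw)))
    offset += len(raw)

stale_before = index_is_stale(path, index)

# Simulate another process appending to the data file.
with open(path, "ab") as f:
    f.write(b'{"test": 3}\n')

stale_after = index_is_stale(path, index)
```

A size comparison only catches appends and truncations. An in-place edit that keeps the file size unchanged would go unnoticed, so calling rebuild_index() remains the safe choice whenever the file may have been modified externally.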
