Jsonline is intended for exploring and working with JSON Lines files without keeping the entire dataset in memory or repeatedly reading the whole file.
Jsonline
Jsonline is a Python library for efficiently working with JSON Lines files. It allows you to access and append data without loading the entire file into memory, making it ideal for large datasets.
This library treats JSON Lines files as if they were read-only lists, but with an append method. It builds an index of the start and end positions of each JSON object in the file. When you access an element, Jsonline uses this index to read only the relevant line. To save space, the index is stored in a gzip-compressed file with a .jsonl.idx extension.
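As a rough illustration of the idea, a list of positions can be written to and read back from a gzip-compressed file using only the standard library. This is a sketch, not Jsonline's actual on-disk format: the helper names `save_index` and `load_index` and the JSON encoding of the index are assumptions made for the example.

```python
import gzip
import json
import os
import tempfile


def save_index(index, idx_path):
    """Write a list of (start, end) byte positions to a gzip-compressed file."""
    with gzip.open(idx_path, "wt", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(idx_path):
    """Read the positions back as a list of (start, end) tuples."""
    with gzip.open(idx_path, "rt", encoding="utf-8") as f:
        return [tuple(pair) for pair in json.load(f)]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical index file name, following the .jsonl.idx convention.
    idx_path = os.path.join(tmp, "my_file.jsonl.idx")
    save_index([(0, 12), (12, 30)], idx_path)
    restored = load_index(idx_path)
```

Because the index holds only integers, it compresses well and stays small relative to the data file.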
Features
- Memory-efficient: Reads only the data you need from the file.
- List-like access: Access JSON objects by index (data[i]).
- Fast appends: Efficiently append single or multiple objects.
- Automatic indexing: Creates and manages an index file for you.
- LRU Cache: Caches recently accessed objects in memory for faster retrieval.
- Context manager support: Works with the with statement for automatic resource management.
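To make the LRU cache feature concrete, here is a minimal sketch of a least-recently-used cache keyed by record index. The `LRUCache` class is hypothetical and written for illustration only. It is not taken from Jsonline's source, which may implement its cache differently.

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `capacity` items, evicting the least recently used one."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put(0, {"test": 1})
cache.put(1, {"test": 2})
cache.get(0)               # index 0 is now the most recently used
cache.put(2, {"test": 3})  # capacity exceeded, so index 1 is evicted
```

With a cache like this, repeated reads of the same few records avoid touching the disk at all.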
Installation
pip install jsonline
Usage
Opening a file
You can open a JSON Lines file using the JsonLine class or the jsonline.open() function. The .jsonl extension is not required in the file path.
from jsonline import JsonLine
# Using the JsonLine class
data = JsonLine('my_file')
# Or using the open function
import jsonline
data = jsonline.open('my_file')
If the file does not exist, it will be created with a .jsonl extension.
Appending data
You can append a single JSON object using the append method, or multiple objects using extend.
# Append a single object
data.append({'test': 1})
# Append multiple objects
data.extend([{'test': 2}, {'another_test': 3}])
Accessing data
You can access individual JSON objects by their index, just like a Python list.
# Get the first object
first_item = data[0]
# Get the last object
last_item = data[-1]
You can also iterate over the entire dataset:
for item in data:
    print(item)
Context Manager
Jsonline supports the context manager protocol, which automatically closes the file for you.
with JsonLine('my_file') as data:
    print(data[0])
API Reference
jsonline.JsonLine(path, cache_size=10, string_keys=True)
The main class for working with JSON Lines files.
- path (str or pathlib.Path): Path to the JSON Lines file.
- cache_size (int, optional): The number of items to store in the LRU cache. Defaults to 10.
- string_keys (bool, optional): If False, allows non-string keys in JSON objects (this is non-standard). Defaults to True.
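For background on why string_keys exists: standard JSON only permits string keys, and Python's built-in json module silently converts integer keys to strings when encoding. The snippet below shows that standard-library behaviour only. How Jsonline preserves non-string keys when string_keys=False is not described here.

```python
import json

record = {1: "one", 2: "two"}  # integer keys

encoded = json.dumps(record)   # keys are written as strings
decoded = json.loads(encoded)  # and come back as strings

print(encoded)  # {"1": "one", "2": "two"}
print(decoded)  # {'1': 'one', '2': 'two'}
```

The round trip therefore changes the key types, which matters if your code later looks records up by integer key.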
Methods
- append(data): Appends a single JSON object to the end of the file.
- extend(data): Appends an iterable of JSON objects to the end of the file. This is more efficient than calling append in a loop.
- get(index, default=None): Retrieves an item by its index. If the index is out of bounds, it returns the default value.
- close(): Closes the file handle.
- rebuild_index(): Forces a rebuild of the index file. This can be useful if the file has been modified by another process.
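The behaviour described for get mirrors dict.get, applied to positional access. The sketch below uses a plain Python list as a stand-in for a JsonLine object, and the free function `get` is written for illustration. It is not the library's implementation, and whether Jsonline's get accepts negative indexes is an assumption here.

```python
def get(items, index, default=None):
    """Return items[index], or `default` when the index is out of bounds."""
    try:
        return items[index]
    except IndexError:
        return default


records = [{"test": 1}, {"test": 2}]  # stand-in for a JsonLine object

first = get(records, 0)          # an existing index returns the record
missing = get(records, 5)        # out of bounds, so the default (None)
fallback = get(records, 5, {})   # out of bounds with an explicit default
```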
jsonline.open(path, cache=10)
A convenience function for creating a JsonLine object.
- path (str or pathlib.Path): Path to the JSON Lines file.
- cache (int, optional): The number of items to store in the LRU cache. Defaults to 10.
jsonline.load(f, cache=10)
Loads a JsonLine object from an existing file-like object.
- f (TextIO): A file-like object opened in text mode. This function uses the .name attribute of the file-like object to open the file, so it will not work with in-memory objects like io.StringIO.
- cache (int, optional): The number of items to store in the LRU cache. Defaults to 10.
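The .name requirement is easy to check with the standard library alone: a file opened from disk carries the path it was opened with, while io.StringIO has no such attribute. The snippet does not call jsonline.load. It only shows the difference between the two kinds of object.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file.jsonl")
    with open(path, "w+", encoding="utf-8") as f:
        real_file_has_name = hasattr(f, "name")  # True for files on disk
        real_file_name = f.name                  # the path passed to open()

# In-memory text buffers are not backed by a path.
in_memory_has_name = hasattr(io.StringIO(), "name")
```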
How it Works
Jsonline creates an index file (.jsonl.idx) that stores the byte offset and length of each line in the JSON Lines file. This allows for fast lookups without reading the entire file. When you request an item at a specific index, Jsonline reads the index file to find the position of that item, then seeks to that position in the data file and reads only the necessary bytes.
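The lookup described above can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not Jsonline's code: the helpers `build_index` and `read_item` are hypothetical, and the index is kept in memory rather than in a .jsonl.idx file.

```python
import json
import os
import tempfile


def build_index(path):
    """Return a list of (offset, length) pairs, one per non-empty line."""
    index = []
    offset = 0
    with open(path, "rb") as f:  # binary mode so offsets are byte offsets
        for line in f:
            if line.strip():
                index.append((offset, len(line)))
            offset += len(line)
    return index


def read_item(path, index, i):
    """Seek to the i-th record and decode only that line."""
    offset, length = index[i]
    with open(path, "rb") as f:
        f.seek(offset)
        return json.loads(f.read(length))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file.jsonl")
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write('{"test": 1}\n{"test": 2}\n{"another_test": 3}\n')

    index = build_index(path)
    second = read_item(path, index, 1)
    last = read_item(path, index, -1)
```

Building the index requires one full pass over the file, but every later lookup costs a single seek and a read of one line's worth of bytes.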
The index is automatically updated when you use append or extend. If the index file gets out of sync with the data file (e.g., if the data file is modified externally), you can use the rebuild_index() method to regenerate it.
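To illustrate how an index can be kept in step with appends, and how an external modification might be noticed, here is a small sketch. Both helpers are hypothetical: `append_item` extends an in-memory index as it writes, and `index_is_stale` is a cheap heuristic that compares the end of the last indexed record with the file size. Nothing here implies that Jsonline performs such a check automatically, and the heuristic would miss in-place edits that leave the file size unchanged.

```python
import json
import os
import tempfile


def append_item(path, index, item):
    """Append one record and extend the in-memory index to match."""
    line = (json.dumps(item) + "\n").encode("utf-8")
    with open(path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)  # byte position where the record starts
        f.write(line)
    index.append((offset, len(line)))


def index_is_stale(path, index):
    """Heuristic: the indexed bytes should end exactly at the end of the file."""
    indexed_end = index[-1][0] + index[-1][1] if index else 0
    return indexed_end != os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file.jsonl")
    open(path, "wb").close()  # start with an empty data file
    index = []

    append_item(path, index, {"test": 1})
    append_item(path, index, {"test": 2})
    stale_before = index_is_stale(path, index)

    # Simulate another process appending without updating the index.
    with open(path, "ab") as f:
        f.write(b'{"external": true}\n')
    stale_after = index_is_stale(path, index)
```

In the second case the index no longer covers the whole file, which is the situation where calling rebuild_index() is appropriate.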