A lightweight relational algebra CLI over JSONL

These details have not been verified by PyPI

Project links

Project description

ja - JSONL Algebra

ja is a lightweight command-line tool and Python library for performing relational algebra operations on JSONL (JSON Lines) data. It's designed to be a simple, dependency-free alternative for common data manipulation tasks, inspired by tools like jq and traditional SQL.

Features

Perform common relational algebra operations: select, project, join, union, intersection, difference, distinct, sort, product, and group by with aggregations.
groupby: A powerful feature that allows you to group data by one or more keys and perform various aggregations on the grouped data.
- By default, includes sum, avg, min, max, count, list (collect all values), first (first value in group), last (last value in group) aggregations.
- Can be extended with custom aggregation functions. See "Extending Group By Aggregations" section.
Works with JSONL files or piped data from stdin/stdout.
Can be used as a CLI tool or as a Python library.
No external dependencies.

Installation

There are two main ways to install ja:

For users (from PyPI):

You can install the package directly from PyPI (Python Package Index) using pip.

pip install jsonl-algebra

For developers (from local repository):**

If you have cloned this repository and want to install it for development or from local sources:

pip install .

To install in editable mode for development:

pip install -e .

CLI Usage

The ja command-line tool provides several subcommands for different operations.

General Syntax: ja <command> [options] [file(s)]

If file is omitted for commands that expect a single input, ja reads from stdin.

Examples

Select rows where 'amount' is greater than 100:

cat data.jsonl | ja select 'amount > 100'

ja select 'amount > 100' data.jsonl

Project 'id' and 'name' columns:
```
cat data.jsonl | ja project id,name
```

Join two files on a common key:

ja join users.jsonl orders.jsonl --on user_id=customer_id

Group by 'category' and count items:

cat products.jsonl | ja groupby category --agg count

Group by 'category', count items, and list all product names:
```
cat products.jsonl | ja groupby category --agg count --agg list:name\
```
This will produce output like: {"category": "electronics", "count": 5, "list_name": ["laptop", "mouse", ...]}

Group by 'user_id' and get the first action:

cat user_actions.jsonl | ja groupby user_id --agg first:action

Sort data by 'timestamp':
```
cat logs.jsonl | ja sort timestamp
```

Infer the schema of a JSONL file:

ja schema data.jsonl

Or from stdin:

cat data.jsonl | ja schema

Start an interactive REPL session:

ja repl

Inside the REPL, you can build a pipeline:

ja> from data.jsonl
ja> select 'age > 30'
ja> project name,email
ja> execute --lines=5 
ja> compile

Export JSONL to a JSON array:

ja export to-array data.jsonl > data.json

Convert a JSON array back to JSONL:

ja export to-jsonl data.json > data.jsonl

Explode a JSONL file into a directory of individual JSON files:
```
ja export explode data.jsonl -o data_exploded
```
This will create a directory data_exploded with files like item-0.json, item-1.json, etc.

Implode a directory of JSON files back into a JSONL stream:

ja export implode data_exploded --add-filename-key source_file > combined.jsonl

Available Commands

select: Filter rows based on a Python expression.
project: Select specific columns.
join: Join two relations on specified keys.
rename: Rename columns.
union: Combine two relations (all rows).
difference: Rows in the first relation but not the second.
distinct: Remove duplicate rows.
intersection: Rows common to both relations.
sort (maps to sort_by): Sort a relation by specified keys.
product: Cartesian product of two relations.
groupby (maps to groupby_agg): Group rows by a key and perform aggregations.
schema: Infer and display the schema of a JSONL file.
repl: Start an interactive REPL session to build command pipelines.
export: A group of commands for transforming data formats.
- to-array: Convert JSONL to a single JSON array.
- to-jsonl: Convert a JSON array (from a file or stdin) to JSONL.
- explode: Export each line of a JSONL file to a separate JSON file in a directory.
- implode: Combine JSON files from a directory into a JSONL stream.

Use ja <command> --help or ja export <subcommand> --help for more details on specific commands.

Programmatic API Usage

You can also use ja as a Python library:

import ja
from ja import Row, Relation # For type hinting if needed

# Load data from JSONL files
# users_data = ja.read_jsonl("users.jsonl")
# orders_data = ja.read_jsonl("orders.jsonl")

# Example data (replace with ja.read_jsonl for actual files)
users_data: Relation = [
    {"user_id": 1, "name": "Alice", "status": "active", "email": "alice@example.com"},
    {"user_id": 2, "name": "Bob", "status": "inactive", "email": "bob@example.com"},
    {"user_id": 1, "name": "Alice", "status": "active", "email": "alice@example.com"} # Duplicate for distinct example
]
orders_data: Relation = [
    {"order_id": 101, "customer_id": 1, "item": "Book", "quantity": 1},
    {"order_id": 102, "customer_id": 2, "item": "Pen", "quantity": 5},
    {"order_id": 103, "customer_id": 1, "item": "Notebook", "quantity": 2},
    {"order_id": 104, "customer_id": 1, "item": "Book", "quantity": 3}
]


# Example: Select active users
active_users = ja.select(users_data, lambda row: row.get("status") == "active")
# active_users will be:
# [{'user_id': 1, 'name': 'Alice', 'status': 'active', 'email': 'alice@example.com'},
#  {'user_id': 1, 'name': 'Alice', 'status': 'active', 'email': 'alice@example.com'}]

# Example: Project name and email from distinct active users
distinct_active_users = ja.distinct(active_users)
user_info = ja.project(distinct_active_users, ["name", "email"])
# user_info will be:
# [{'name': 'Alice', 'email': 'alice@example.com'}]

# Example: Join distinct active users with their orders
# Ensure the join key 'user_id' is present in the projected active_users relation
active_users_with_id = ja.project(distinct_active_users, ["user_id", "name", "email"])
joined_data = ja.join(active_users_with_id, orders_data, on=[("user_id", "customer_id")])
# joined_data will be:
# [{'user_id': 1, 'name': 'Alice', 'email': 'alice@example.com', 'order_id': 101, 'item': 'Book', 'quantity': 1},
#  {'user_id': 1, 'name': 'Alice', 'email': 'alice@example.com', 'order_id': 103, 'item': 'Notebook', 'quantity': 2},
#  {'user_id': 1, 'name': 'Alice', 'email': 'alice@example.com', 'order_id': 104, 'item': 'Book', 'quantity': 3}]

# Example: Group joined data by user and sum quantities, list items
grouped_orders = ja.groupby_agg(
    joined_data,
    group_by_key="user_id",
    aggregations=[
        ("sum", "quantity"),
        ("list", "item"),
        ("count", "") # Count groups
    ]
)
# grouped_orders might be (depending on Alice's orders):
# [{'user_id': 1, 'sum_quantity': 6, 'list_item': ['Book', 'Notebook', 'Book'], 'count': 3}]


# Print results (example)
for row in grouped_orders:
    print(row)

# Available functions mirror the CLI commands:
# ja.select, ja.project, ja.join, ja.rename, ja.union,
# ja.difference, ja.distinct, ja.intersection, ja.sort_by,
# ja.product, ja.groupby_agg

Extending Group By Aggregations

By default, ja supports several built-in aggregation functions for the groupby_agg operation. These include: count, sum, avg, min, max, list, first, and last. Syntax for aggregations: agg_name (for count) or agg_name:column_name (e.g., sum:price, list:product_id).

The groupby_agg functionality is designed to be extensible. The core logic resides in the ja.groupby module, which uses a dispatcher pattern.

Define an aggregation helper function

This function will take the collected data for a group (typically a list of values, or a single value for aggregations like first/last) and return the aggregated result.

For instance, if you want to add a custom aggregation function for calculating the median, you would define a function that takes a list of values and returns the median.

def _my_custom_median_agg(collected_values: list) -> float | None:
    numeric_vals = sorted([v for v in collected_values if isinstance(v, (int, float))])
    if not numeric_vals:
        return None
    n = len(numeric_vals)
    mid = n // 2
    if n % 2 == 0:
        return (numeric_vals[mid - 1] + numeric_vals[mid]) / 2
    else:
        return numeric_vals[mid]

Now, register it (if modifying ja directly or for illustration):

# In ja/groupby.py
AGGREGATION_DISPATCHER = {
    # ...
    "median": _my_custom_median_agg,
}

For programmatic use with your own ja instance or a forked version, you could potentially expose a way to register custom aggregators or pass them directly if the API supported it.

If your aggregation requires a specific way of collecting data during the first pass of groupby_agg (different from how list, first, or last collect data), you would need to modify the data collection logic in ja.groupby.groupby_agg.

This structure allows for significant flexibility. For instance, one could implement a general reduce aggregation that takes Python expressions for an initial value and a step function, operating on the list of values collected for a group.

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

License

This project is licensed under the MIT License.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.4.1

Mar 4, 2026

1.4

Jan 31, 2026

1.3

Dec 15, 2025

1.2

Nov 14, 2025

1.1

Aug 19, 2025

1.0

Jun 25, 2025

This version

0.81

Jun 14, 2025

0.9

Jun 25, 2025

0.8

May 24, 2025

0.7.2

May 24, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jsonl_algebra-0.81.tar.gz (28.2 kB view details)

Uploaded Jun 14, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

jsonl_algebra-0.81-py3-none-any.whl (23.9 kB view details)

Uploaded Jun 14, 2025 Python 3

File details

Details for the file jsonl_algebra-0.81.tar.gz.

File metadata

Download URL: jsonl_algebra-0.81.tar.gz
Upload date: Jun 14, 2025
Size: 28.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for jsonl_algebra-0.81.tar.gz
Algorithm	Hash digest
SHA256	`e6cb460a3ce981661c625eef29ecb729e4302888041074a534295396e5d0f2af`
MD5	`ba2f01e2d51a7ad2591f5e82a47d00d4`
BLAKE2b-256	`b46fbdf4d1adf6b597849a7392266994dc16d5780c694844b08a202476115540`

See more details on using hashes here.

File details

Details for the file jsonl_algebra-0.81-py3-none-any.whl.

File metadata

Download URL: jsonl_algebra-0.81-py3-none-any.whl
Upload date: Jun 14, 2025
Size: 23.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for jsonl_algebra-0.81-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a9cc5e5706c6697f841c475574629bfb16410014bd4c6c1d66be9ed156359f4`
MD5	`36829df97a82271f40a0536295831a86`
BLAKE2b-256	`d1f333b922ed7192fe14a68b55fe1d4b424d028d622d1b3b56eb538c3be72a7a`

See more details on using hashes here.

jsonl-algebra 0.81

Navigation

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Project description

ja - JSONL Algebra

Features

Installation

For users (from PyPI):

For developers (from local repository):**

CLI Usage

Examples

Available Commands

Programmatic API Usage

Extending Group By Aggregations

Define an aggregation helper function

Contributing

License

Project details

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes