A lightweight relational algebra CLI over JSONL

These details have not been verified by PyPI

Project links

Project description

ja - JSONL Algebra

ja is a powerful command-line tool and Python library for performing relational algebra operations on JSONL (JSON Lines) data. It's designed to be a robust, feature-rich alternative for data manipulation tasks, inspired by tools like jq and traditional SQL.

Features

Relational Operations: Perform common relational algebra operations: select, project, join, union, intersection, difference, distinct, sort, product, and group by with aggregations.
Advanced Filtering: Uses JMESPath expressions for safe and expressive row filtering (instead of eval).
Schema Management:
- schema infer: Automatically infer JSON Schema from JSONL data
- schema validate: Validate JSONL data against JSON Schema files
Interactive REPL: Build and test data pipelines interactively with auto-completion and pipeline compilation.
Import/Export: Comprehensive data format conversion capabilities:
- CSV import/export with type inference and flattening
- JSON array conversion
- Directory explode/implode for individual JSON files
- Custom column transformations for CSV export
Aggregations: Powerful groupby feature with built-in aggregations:
- sum, avg, min, max, count, list, first, last
- Extensible aggregation system for custom functions
Fully Pipeable: Consistent support for stdin/stdout with - notation for maximum composability.
Type-Safe: Optional type inference for CSV imports and robust schema validation.

Installation

Dependencies

ja now includes optional dependencies for enhanced functionality:

jmespath: For safe and expressive filtering (replaces eval)
jsonschema: For schema validation features
All other features work without external dependencies

For users (from PyPI)

You can install the package directly from PyPI (Python Package Index) using pip.

pip install jsonl-algebra

This will automatically install the required dependencies (jmespath and jsonschema).

For developers (from local repository)

If you have cloned this repository and want to install it for development or from local sources:

pip install .

To install in editable mode for development:

pip install -e .

CLI Usage

The ja command-line tool provides several subcommands for different operations.

General Syntax: ja <command> [options] [file(s)]

If file is omitted for commands that expect a single input, ja reads from stdin.

Examples

Select rows where 'amount' is greater than 100 (using JMESPath):

cat data.jsonl | ja select 'amount > `100`'
ja select 'amount > `100`' data.jsonl

Project 'id' and 'name' columns:

cat data.jsonl | ja project id,name

Join two files on a common key:

ja join users.jsonl orders.jsonl --on user_id=customer_id

Group by 'category' and count items:

cat products.jsonl | ja groupby category --agg count

Sort data by 'timestamp' in descending order:

cat logs.jsonl | ja sort timestamp --desc

Schema Operations:

# Infer schema from JSONL data
ja schema infer data.jsonl

# Validate JSONL data against a schema
ja schema validate schema.json data.jsonl

# Pipeline: infer schema from filtered data, then validate original file
ja select 'active == `true`' users.jsonl | ja schema infer | ja schema validate - users.jsonl

Interactive REPL session:

ja repl

Inside the REPL:

ja> from data.jsonl
ja> select 'age > `30`'
ja> project name,email
ja> execute --lines=5 
ja> compile

Export/Import Operations:

# Convert JSONL to JSON array
ja export array data.jsonl > data.json

# Convert JSON array back to JSONL
ja export jsonl data.json > data.jsonl

# Explode JSONL into individual JSON files
ja export explode data.jsonl -o data_exploded

# Implode directory back to JSONL
ja import implode data_exploded --add-filename-key source_file > combined.jsonl

# Export to CSV with flattening and custom transformations
ja export csv data.jsonl --apply timestamp "lambda t: t.split('T')[0]" > data.csv

# Import CSV with type inference
ja import csv data.csv --infer-types > data.jsonl

Available Commands

Core Operations:

select: Filter rows using JMESPath expressions (safe alternative to eval)
project: Select specific columns
join: Join two relations on specified keys
rename: Rename columns
union: Combine two relations (all rows)
difference: Rows in the first relation but not the second
distinct: Remove duplicate rows
intersection: Rows common to both relations
sort: Sort a relation by specified keys (supports --desc for descending order)
product: Cartesian product of two relations
groupby: Group rows by a key and perform aggregations

Schema Operations:

schema infer: Infer and display JSON Schema from JSONL data
schema validate: Validate JSONL data against a JSON Schema file (supports piping schemas)

Interactive Tools:

repl: Interactive REPL for building and testing data pipelines

Export Operations:

export array: Convert JSONL to a single JSON array
export jsonl: Convert a JSON array to JSONL
export explode: Export each JSONL line to a separate JSON file in a directory
export csv: Convert JSONL to CSV with automatic flattening and custom transformations

Import Operations:

import csv: Convert CSV to JSONL with optional type inference
import implode: Combine JSON files from a directory into JSONL

Pipeline-Friendly Design: All commands support stdin/stdout and the - notation for maximum composability in shell pipelines.

Use ja <command> --help or ja export <subcommand> --help for more details on specific commands.

Programmatic API Usage

You can also use ja as a Python library:

import ja
import jmespath
from ja import Row, Relation # For type hinting if needed

# Load data from JSONL files
# users_data = ja.read_jsonl("users.jsonl")
# orders_data = ja.read_jsonl("orders.jsonl")

# Example data (replace with ja.read_jsonl for actual files)
users_data: Relation = [
    {"user_id": 1, "name": "Alice", "status": "active", "email": "alice@example.com"},
    {"user_id": 2, "name": "Bob", "status": "inactive", "email": "bob@example.com"},
    {"user_id": 1, "name": "Alice", "status": "active", "email": "alice@example.com"} # Duplicate for distinct example
]
orders_data: Relation = [
    {"order_id": 101, "customer_id": 1, "item": "Book", "quantity": 1},
    {"order_id": 102, "customer_id": 2, "item": "Pen", "quantity": 5},
    {"order_id": 103, "customer_id": 1, "item": "Notebook", "quantity": 2},
    {"order_id": 104, "customer_id": 1, "item": "Book", "quantity": 3}
]

# Example: Select active users using JMESPath
expression = jmespath.compile("[?status == 'active']")
active_users = ja.select(users_data, expression)
# active_users will be:
# [{'user_id': 1, 'name': 'Alice', 'status': 'active', 'email': 'alice@example.com'},
#  {'user_id': 1, 'name': 'Alice', 'status': 'active', 'email': 'alice@example.com'}]

# Example: Project name and email from distinct active users
distinct_active_users = ja.distinct(active_users)
user_info = ja.project(distinct_active_users, ["name", "email"])
# user_info will be:
# [{'name': 'Alice', 'email': 'alice@example.com'}]

# Example: Schema inference
from ja.schema import infer_schema
schema = infer_schema(users_data)
# Returns a valid JSON Schema with inferred types and required fields

# Example: Join distinct active users with their orders
active_users_with_id = ja.project(distinct_active_users, ["user_id", "name", "email"])
joined_data = ja.join(active_users_with_id, orders_data, on=[("user_id", "customer_id")])

# Example: Group joined data by user and sum quantities, list items
grouped_orders = ja.groupby_agg(
    joined_data,
    group_by_key="user_id",
    aggregations=[
        ("sum", "quantity"),
        ("list", "item"),
        ("count", "") # Count groups
    ]
)

# Available functions mirror the CLI commands:
# ja.select, ja.project, ja.join, ja.rename, ja.union,
# ja.difference, ja.distinct, ja.intersection, ja.sort_by,
# ja.product, ja.groupby_agg

Extending Group By Aggregations

By default, ja supports several built-in aggregation functions for the groupby_agg operation. These include: count, sum, avg, min, max, list, first, and last. Syntax for aggregations: agg_name (for count) or agg_name:column_name (e.g., sum:price, list:product_id).

The groupby_agg functionality is designed to be extensible. The core logic resides in the ja.groupby module, which uses a dispatcher pattern.

Define an aggregation helper function

This function will take the collected data for a group (typically a list of values, or a single value for aggregations like first/last) and return the aggregated result.

For instance, if you want to add a custom aggregation function for calculating the median, you would define a function that takes a list of values and returns the median.

def _my_custom_median_agg(collected_values: list) -> float | None:
    numeric_vals = sorted([v for v in collected_values if isinstance(v, (int, float))])
    if not numeric_vals:
        return None
    n = len(numeric_vals)
    mid = n // 2
    if n % 2 == 0:
        return (numeric_vals[mid - 1] + numeric_vals[mid]) / 2
    else:
        return numeric_vals[mid]

Now, register it (if modifying ja directly or for illustration):

# In ja/groupby.py
AGGREGATION_DISPATCHER = {
    # ...
    "median": _my_custom_median_agg,
}

For programmatic use with your own ja instance or a forked version, you could potentially expose a way to register custom aggregators or pass them directly if the API supported it.

If your aggregation requires a specific way of collecting data during the first pass of groupby_agg (different from how list, first, or last collect data), you would need to modify the data collection logic in ja.groupby.groupby_agg.

This structure allows for significant flexibility. For instance, one could implement a general reduce aggregation that takes Python expressions for an initial value and a step function, operating on the list of values collected for a group.

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

License

This project is licensed under the MIT License.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.4.1

Mar 4, 2026

1.4

Jan 31, 2026

1.3

Dec 15, 2025

1.2

Nov 14, 2025

1.1

Aug 19, 2025

1.0

Jun 25, 2025

0.81

Jun 14, 2025

This version

0.9

Jun 25, 2025

0.8

May 24, 2025

0.7.2

May 24, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jsonl_algebra-0.9.tar.gz (35.6 kB view details)

Uploaded Jun 25, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

jsonl_algebra-0.9-py3-none-any.whl (28.4 kB view details)

Uploaded Jun 25, 2025 Python 3

File details

Details for the file jsonl_algebra-0.9.tar.gz.

File metadata

Download URL: jsonl_algebra-0.9.tar.gz
Upload date: Jun 25, 2025
Size: 35.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for jsonl_algebra-0.9.tar.gz
Algorithm	Hash digest
SHA256	`af6e7cf9ccec93fe80dd1d365ed714d15e96a7bba749b0d00bc81381fe4a594f`
MD5	`9e370c437f51b40e4ac0c05b5ef3d7e1`
BLAKE2b-256	`e7cf5a79466999c4eb3f50589a199c22967159d7b6997a145f4f43538d808ed9`

See more details on using hashes here.

File details

Details for the file jsonl_algebra-0.9-py3-none-any.whl.

File metadata

Download URL: jsonl_algebra-0.9-py3-none-any.whl
Upload date: Jun 25, 2025
Size: 28.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for jsonl_algebra-0.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`11f1469e434ba6de0379f398f81c814072747c060ad625deaa0cd5ef579608e0`
MD5	`b7b15fc5de8f3d715ef37d8b877e68d4`
BLAKE2b-256	`7aba6c082a229e3b51e97013f03f949d26e8ec554e0c85b9a210562c347a0e83`

See more details on using hashes here.

jsonl-algebra 0.9

Navigation

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Project description

ja - JSONL Algebra

Features

Installation

Dependencies

For users (from PyPI)

For developers (from local repository)

CLI Usage

Examples

Available Commands

Programmatic API Usage

Extending Group By Aggregations

Define an aggregation helper function

Contributing

License

Project details

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes