CLI tool for doing data joining
Project description
Cobble
A CLI tool for ad-hoc data joining, filtering, and analysis. Pipe in CSV, JSON, or plain text and query it with a shell-friendly syntax.
Install
pip install cobblequery
Quick Start
# Pipe CSV and filter
cat users.csv | cobble '| search role=admin | select name,email'
# Read a file directly (skip stdin)
cobble -s '| from data.csv | head 5'
# Chain JSON lines from another tool
kubectl get pods -o json | cobble '| select name,status'
CLI Usage
usage: cobble [-h] [--dry-run] [-i] [-s] [-q QUERY_FILE] [query]
positional arguments:
query Query to run
optional arguments:
-h, --help show this help message and exit
--dry-run Parse and validate query without running
-i, --interactive Do an interactive edit of the query
-s, --no-stdin Dont insert the stdin generator
-q QUERY_FILE, --query-file QUERY_FILE
Read query text from a file
Queries are pipe-delimited chains of commands:
| command1 args | command2 args | ...
Examples
Filtering rows
# Exact match
cat servers.csv | cobble '| search env=prod'
# Numeric comparison
cat orders.csv | cobble '| search total>100'
# Multiple conditions (AND)
cat orders.csv | cobble '| search status=shipped,total>=50'
# Regex match
cat logs.csv | cobble '| search path~=/api/v[23]'
# Not equal
cat users.csv | cobble '| search role!=guest'
# Python expression for complex logic
cat data.csv | cobble '| py "age > 21 and status == \"active\""'
Selecting and renaming fields
# Pick specific fields
cat users.csv | cobble '| select name,email,role'
# Rename a field
cat data.csv | cobble '| rename new_name=old_name'
Sorting
# Sort ascending
cat scores.csv | cobble '| sort name'
# Sort descending
cat scores.csv | cobble '| sort -score'
# Multi-field sort
cat employees.csv | cobble '| sort department,-salary'
Limiting results
# First 10 rows
cat huge.csv | cobble '| head 10'
# Rows 5 through 15
cat data.csv | cobble '| slice 5,15'
Aggregation
# Count and sum with auto-named output fields (field_operation)
cat sales.csv | cobble '| agg by=region sum(revenue) count(id)'
# Output: {"region": "west", "revenue_sum": 48000, "id_count": 12}
# Named output fields
cat sales.csv | cobble '| agg by=region, total=sum(revenue), n=count(id)'
# Output: {"region": "west", "total": 48000, "n": 12}
# No group-by (aggregate everything)
cat sales.csv | cobble '| agg total=sum(revenue)'
# Multiple group-by fields
cat data.csv | cobble '| agg by=year,quarter avg(revenue) min(cost) max(cost)'
Available aggregation functions: sum, count, avg, min, max, first, last, dc (distinct count), values, unique_values.
Joining datasets
# Join users with their departments (left join, first match)
cobble -s '| from users.csv | join dept_id [ | from departments.csv ] | select name,dept_name'
# Join with different key names (source_key:target_key)
cobble -s '| from orders.csv | join customer_id:id [ | from customers.csv ] | select order_id,name'
# Inner join (only matching rows)
cobble -s '| from orders.csv | join product_id, type=inner [ | from products.csv ]'
# Outer join (all rows from both sides)
cobble -s '| from left.csv | join id, type=outer [ | from right.csv ]'
# Expand join (one output row per match, like SQL)
cobble -s '| from students.csv | join class_id, target=expand [ | from enrollments.csv ]'
# Join with field selection
cobble -s '| from users.csv | join team_id [ | from teams.csv ] | select name,team_name'
Join types: left (default), inner, outer.
Join targets: first (default), last, expand, agg, agg_str.
Computed fields
# Add a new field (fields are available by name directly)
cat products.csv | cobble '| set margin="price - cost"'
# String manipulation
cat users.csv | cobble '| set domain="email.split(\"@\")[1]"'
# The value. prefix also works for dot-access on nested data
cat data.json | cobble '| set full="value.first + \" \" + value.last"'
Combining datasets
# Append rows from another file
cobble -s '| from jan.csv | append { | from feb.csv }'
Generating data
# Generate a numbered sequence
cobble -s '| range end=100 | set squared="i ** 2"'
Real-world examples
# Find top 5 customers by total spend
cobble -s '| from orders.csv | agg by=customer_id sum(total) count(total) | sort -total_sum | head 5'
# Join server metrics with inventory, filter to production
cobble -s '| from metrics.csv
| join hostname [ | from inventory.csv ]
| search env=prod
| sort -cpu
| select hostname,cpu,memory,team'
# Aggregate log counts by status code, show top errors
cat access.log.csv | cobble '| search status>=400 | agg by=status count(path) | sort -path_count'
# Compare two CSVs - find entries only in the second file
cobble -s '| from new.csv | join id, type=inner [ | from old.csv ] | select id,name'
Quoting rules
- Values with spaces or special characters should be quoted:
"my value" - Expressions containing operators should be quoted:
"Total Sales>100" - Escape quotes inside quoted strings with backslash:
"say \"hello\""
Command Reference
| Command | Aliases | Description |
|---|---|---|
from |
Read from file or stdin (-) |
|
search |
s |
Filter rows by field matching |
py |
where |
Filter rows with Python expressions |
select |
table |
Pick specific fields |
set |
eval |
Add/modify fields with Python expressions |
sort |
Sort by fields (-field for descending) |
|
agg |
aggregate, stats |
Group and aggregate |
join |
Join with another dataset | |
slice |
head, limit |
Limit row range |
rename |
Rename fields | |
unpack |
Expand list fields into rows | |
append |
Concatenate another pipeline | |
range |
Generate numbered rows |
Search Operators
| Operator | Example | Description |
|---|---|---|
= |
field=value |
Exact match (numeric-aware) |
!= |
field!=value |
Not equal |
> |
field>10 |
Greater than |
>= |
field>=10 |
Greater or equal |
< |
field<10 |
Less than |
<= |
field<=10 |
Less or equal |
~= |
field~=pat.* |
Regex match |
Numeric comparisons handle type coercion transparently -- string "25" from CSV is correctly compared as a number.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
cobblequery-1.1.0.tar.gz
(17.7 kB
view details)
File details
Details for the file cobblequery-1.1.0.tar.gz.
File metadata
- Download URL: cobblequery-1.1.0.tar.gz
- Upload date:
- Size: 17.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fda46ffae99666af620ed0df7e6de780cfac6d2f3ab3b135bf2a3e27cf9d9e9c
|
|
| MD5 |
5cd44af5773102b25ce2860a9b32ac21
|
|
| BLAKE2b-256 |
ca11b6040e1df2531554e2695c2084a3f4cc968ee4d77a1279009e1cfbffe6ba
|