
CLI tool for joining data


Cobble

A CLI tool for ad-hoc data joining, filtering, and analysis. Pipe in CSV, JSON, or plain text and query it with a shell-friendly syntax.

Install

pip install cobblequery

Quick Start

# Pipe CSV and filter
cat users.csv | cobble '| search role=admin | select name,email'

# Read a file directly (skip stdin)
cobble -s '| from data.csv | head 5'

# Chain JSON lines from another tool
kubectl get pods -o json | cobble '| select name,status'

CLI Usage

usage: cobble [-h] [--dry-run] [-i] [-s] [-q QUERY_FILE] [query]

positional arguments:
  query                 Query to run

optional arguments:
  -h, --help            show this help message and exit
  --dry-run             Parse and validate query without running
  -i, --interactive     Do an interactive edit of the query
  -s, --no-stdin        Dont insert the stdin generator
  -q QUERY_FILE, --query-file QUERY_FILE
                        Read query text from a file

Queries are pipe-delimited chains of commands:

| command1 args | command2 args | ...
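Because sub-pipelines (as in the join examples) contain pipes of their own, a query cannot be split on every | character. The following is a minimal sketch of one way to split only on top-level pipes. It is not Cobble's actual parser: the function name split_pipeline and the rule that pipes inside double quotes, [...], or {...} are ignored are assumptions made for illustration.

```python
def split_pipeline(query):
    """Split a query on top-level '|' characters (hypothetical helper).

    Pipes inside double quotes, [...] or {...} are kept so that
    sub-pipelines and quoted expressions stay attached to their command.
    """
    stages, current = [], []
    depth = 0          # nesting level of [...] / {...}
    in_quotes = False  # inside a double-quoted string
    escaped = False    # previous character was a backslash
    for ch in query:
        if escaped:
            current.append(ch)
            escaped = False
            continue
        if ch == "\\":
            current.append(ch)
            escaped = True
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch in "[{":
                depth += 1
            elif ch in "]}":
                depth -= 1
            elif ch == "|" and depth == 0:
                # Top-level pipe: close the current stage.
                stages.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    stages.append("".join(current).strip())
    # The leading '|' produces an empty first stage; drop empties.
    return [s for s in stages if s]


stages = split_pipeline(
    "| from users.csv | join dept_id [ | from departments.csv ] | select name,dept_name"
)
print(stages)
```

Under these assumptions the join stage keeps its bracketed sub-pipeline intact instead of being cut at the inner pipe.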

Examples

Filtering rows

# Exact match
cat servers.csv | cobble '| search env=prod'

# Numeric comparison
cat orders.csv | cobble '| search total>100'

# Multiple conditions (AND)
cat orders.csv | cobble '| search status=shipped,total>=50'

# Regex match
cat logs.csv | cobble '| search path~=/api/v[23]'

# Not equal
cat users.csv | cobble '| search role!=guest'

# Python expression for complex logic
cat data.csv | cobble '| py "age > 21 and status == \"active\""'

Selecting and renaming fields

# Pick specific fields
cat users.csv | cobble '| select name,email,role'

# Rename a field
cat data.csv | cobble '| rename new_name=old_name'

Sorting

# Sort ascending
cat scores.csv | cobble '| sort name'

# Sort descending
cat scores.csv | cobble '| sort -score'

# Multi-field sort
cat employees.csv | cobble '| sort department,-salary'
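For readers who want to see what a multi-field sort with mixed directions amounts to, here is a small Python sketch. It illustrates the semantics suggested by the examples above and is not Cobble's implementation; the function name sort_rows is hypothetical, and rows are assumed to be dicts whose sort fields hold mutually comparable values.

```python
def sort_rows(rows, spec):
    """Sort dict rows by a spec such as 'department,-salary'.

    A leading '-' on a field name means descending order.
    """
    result = list(rows)
    # Apply the keys right-to-left. Python's sort is stable (also with
    # reverse=True), so the leftmost key ends up with the highest priority.
    for field in reversed(spec.split(",")):
        descending = field.startswith("-")
        name = field[1:] if descending else field
        result.sort(key=lambda row: row[name], reverse=descending)
    return result


employees = [
    {"name": "cy", "department": "ops", "salary": 50},
    {"name": "al", "department": "eng", "salary": 70},
    {"name": "bo", "department": "eng", "salary": 90},
]
for row in sort_rows(employees, "department,-salary"):
    print(row)
```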

Limiting results

# First 10 rows
cat huge.csv | cobble '| head 10'

# Rows 5 through 15
cat data.csv | cobble '| slice 5,15'

Aggregation

# Count and sum with auto-named output fields (field_operation)
cat sales.csv | cobble '| agg by=region sum(revenue) count(id)'
# Output: {"region": "west", "revenue_sum": 48000, "id_count": 12}

# Named output fields
cat sales.csv | cobble '| agg by=region, total=sum(revenue), n=count(id)'
# Output: {"region": "west", "total": 48000, "n": 12}

# No group-by (aggregate everything)
cat sales.csv | cobble '| agg total=sum(revenue)'

# Multiple group-by fields
cat data.csv | cobble '| agg by=year,quarter avg(revenue) min(cost) max(cost)'

Available aggregation functions: sum, count, avg, min, max, first, last, dc (distinct count), values, unique_values.
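The auto-naming convention (field_operation) can be illustrated with a short Python sketch. This is an assumption-laden illustration, not Cobble's code: the helper name agg_sum_count is hypothetical, values are coerced with float(), and count is assumed to count rows per group.

```python
from collections import defaultdict


def agg_sum_count(rows, by, sum_field, count_field):
    """Group rows by one field and emit <field>_sum and <field>_count."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)

    out = []
    for key, members in groups.items():
        out.append({
            by: key,
            # Output names follow the field_operation pattern.
            f"{sum_field}_sum": sum(float(r[sum_field]) for r in members),
            # Assumption: count() counts the rows in the group.
            f"{count_field}_count": len(members),
        })
    return out


sales = [
    {"id": "1", "region": "west", "revenue": "40000"},
    {"id": "2", "region": "west", "revenue": "8000"},
    {"id": "3", "region": "east", "revenue": "100"},
]
print(agg_sum_count(sales, by="region", sum_field="revenue", count_field="id"))
```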

Joining datasets

# Join users with their departments (left join, first match)
cobble -s '| from users.csv | join dept_id [ | from departments.csv ] | select name,dept_name'

# Join with different key names (source_key:target_key)
cobble -s '| from orders.csv | join customer_id:id [ | from customers.csv ] | select order_id,name'

# Inner join (only matching rows)
cobble -s '| from orders.csv | join product_id, type=inner [ | from products.csv ]'

# Outer join (all rows from both sides)
cobble -s '| from left.csv | join id, type=outer [ | from right.csv ]'

# Expand join (one output row per match, like SQL)
cobble -s '| from students.csv | join class_id, target=expand [ | from enrollments.csv ]'

# Join with field selection
cobble -s '| from users.csv | join team_id [ | from teams.csv ] | select name,team_name'

Join types: left (default), inner, outer. Join targets: first (default), last, expand, agg, agg_str.
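To make the difference between join types and join targets concrete, the sketch below models left, inner, and outer joins with the first and expand targets over lists of dicts. It is a simplified model under stated assumptions, not Cobble's implementation: the function name join_rows is hypothetical, both sides use the same key name, left-side fields win on name collisions, and the last, agg, and agg_str targets are not covered.

```python
def join_rows(left, right, key, how="left", target="first"):
    """Join two lists of dicts on a shared key (illustrative sketch)."""
    # Index the right side by key so each lookup is O(1).
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    out, matched = [], set()
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            matched.add(row[key])
            # 'first' keeps one match; 'expand' emits one row per match.
            chosen = matches if target == "expand" else matches[:1]
            for m in chosen:
                out.append({**m, **row})  # assumption: left fields win
        elif how in ("left", "outer"):
            out.append(dict(row))  # unmatched left rows are kept
    if how == "outer":
        # Outer joins also keep right rows that nothing matched.
        for k, rows in index.items():
            if k not in matched:
                out.extend(dict(r) for r in rows)
    return out


users = [{"name": "ann", "dept_id": 1}, {"name": "bob", "dept_id": 9}]
depts = [{"dept_id": 1, "dept_name": "eng"}, {"dept_id": 2, "dept_name": "ops"}]
for how in ("left", "inner", "outer"):
    print(how, join_rows(users, depts, "dept_id", how=how))
```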

Computed fields

# Add a new field (fields are available by name directly)
cat products.csv | cobble '| set margin="price - cost"'

# String manipulation
cat users.csv | cobble '| set domain="email.split(\"@\")[1]"'

# The value. prefix also works for dot-access on nested data
cat data.json | cobble '| set full="value.first + \" \" + value.last"'
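The idea that fields are available by name, and also through a value. prefix, can be pictured with the following Python sketch. It is only a guess at the general mechanism, not Cobble's implementation: the function name set_field is hypothetical, no type coercion of CSV strings is attempted, and eval() is used purely for illustration (it should not be run on untrusted expressions).

```python
from types import SimpleNamespace


def set_field(row, name, expression):
    """Return a copy of row with `name` set to the evaluated expression."""
    # Expose each field as a bare name...
    scope = dict(row)
    # ...and also through a `value` object for dot-access.
    scope["value"] = SimpleNamespace(**row)
    new_row = dict(row)
    # Empty __builtins__ limits what the expression can reach; this is
    # a convenience for the sketch, not a security boundary.
    new_row[name] = eval(expression, {"__builtins__": {}}, scope)
    return new_row


print(set_field({"price": 10, "cost": 4}, "margin", "price - cost"))
print(set_field({"email": "ann@example.org"}, "domain", 'email.split("@")[1]'))
print(set_field({"first": "Ada", "last": "Lovelace"}, "full",
                'value.first + " " + value.last'))
```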

Combining datasets

# Append rows from another file
cobble -s '| from jan.csv | append { | from feb.csv }'

Generating data

# Generate a numbered sequence
cobble -s '| range end=100 | set squared="i ** 2"'

Real-world examples

# Find top 5 customers by total spend
cobble -s '| from orders.csv | agg by=customer_id sum(total) count(total) | sort -total_sum | head 5'

# Join server metrics with inventory, filter to production
cobble -s '| from metrics.csv
    | join hostname [ | from inventory.csv ]
    | search env=prod
    | sort -cpu
    | select hostname,cpu,memory,team'

# Aggregate log counts by status code, show top errors
cat access.log.csv | cobble '| search status>=400 | agg by=status count(path) | sort -path_count'

# Compare two CSVs - find entries present in both files
cobble -s '| from new.csv | join id, type=inner [ | from old.csv ] | select id,name'

Quoting rules

  • Values with spaces or special characters should be quoted: "my value"
  • Expressions containing operators should be quoted: "Total Sales>100"
  • Escape quotes inside quoted strings with backslash: "say \"hello\""
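These rules resemble POSIX shell quoting, so Python's standard shlex module gives a rough feel for how quoted values and backslash-escaped quotes tokenize. This is an analogy only: Cobble's own parser may differ in details, and the helper name tokenize is hypothetical.

```python
import shlex


def tokenize(args):
    """Split an argument string using POSIX-style quoting rules."""
    return shlex.split(args)


# A quoted expression containing a space and an operator stays one token.
print(tokenize('search "Total Sales>100"'))

# Backslash-escaped quotes survive inside a double-quoted value.
print(tokenize(r'set greeting="say \"hello\""'))
```

Without the quotes, the first example would split into three tokens at the space, which is why values with spaces or operators should be quoted.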

Command Reference

Command   Aliases            Description
from                         Read from file or stdin (-)
search    s                  Filter rows by field matching
py        where              Filter rows with Python expressions
select    table              Pick specific fields
set       eval               Add/modify fields with Python expressions
sort                         Sort by fields (-field for descending)
agg       aggregate, stats   Group and aggregate
join                         Join with another dataset
slice     head, limit        Limit row range
rename                       Rename fields
unpack                       Expand list fields into rows
append                       Concatenate another pipeline
range                        Generate numbered rows

Search Operators

Operator  Example        Description
=         field=value    Exact match (numeric-aware)
!=        field!=value   Not equal
>         field>10       Greater than
>=        field>=10      Greater or equal
<         field<10       Less than
<=        field<=10      Less or equal
~=        field~=pat.*   Regex match

Numeric comparisons handle type coercion transparently -- string "25" from CSV is correctly compared as a number.
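The effect of numeric-aware comparison is easiest to see next to plain string comparison. The sketch below is one possible way to implement the operators in the table, not Cobble's actual code: the names coerce and matches are hypothetical, non-numeric values are assumed to fall back to string comparison, and ~= is assumed to behave like re.search.

```python
import operator
import re

# Map each comparison operator to a Python function.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def coerce(value):
    """Return value as a float when it looks numeric, else unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def matches(value, op, expected):
    """Evaluate `value <op> expected` with numeric-aware coercion."""
    if op == "~=":
        return re.search(expected, str(value)) is not None
    a, b = coerce(value), coerce(expected)
    if type(a) is not type(b):
        # Mixed numeric / non-numeric: compare as strings instead.
        a, b = str(value), str(expected)
    return OPS[op](a, b)


print(matches("9", "<", "10"))   # numeric comparison
print("9" < "10")                # plain string comparison, for contrast
```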
