Some CLI tools for text file processing
sodatools
A CLI toolkit for columnar text processing. Pipe-friendly, composable tools that operate on whitespace-delimited data from stdin.
pip install sodatools-core
Every tool is available both as a subcommand (soda <tool>) and as a
direct sd<tool> entry point (e.g. sdalign, sdkut, sduniq). The
sd prefix keeps the direct names out of the way of coreutils.
Tools
| Tool | Description |
|---|---|
| align | Buffer and align columns to fixed widths (auto right-justify numbers) |
| coltest | Filter lines by column test expressions (2gt50, 1seqOK, 3m^abc) |
| cutw | Truncate lines to terminal or explicit width |
| delta | Compute row-to-row differences for numeric columns |
| events | Convert per-row state labels into bounded events with start/stop times |
| filter | Filter noisy or invalid rows by column behavior (MAD, run-length, rate, range, clock, monotonicity) |
| kut | Select, reorder, and format columns (ranges, exclusions, literals) |
| nf | Enforce a specific number of fields per line (drop, pad, or truncate) |
| plot | Interactive scatter plots from columnar data (matplotlib) |
| radix | Convert integer columns between bases (dec, hex, bin, oct) |
| sample | Sample rows (every N, random, percent) or resample by time interval |
| sample-data | Generate sample datasets for testing (weather, small, noisy, etc.) |
| smooth | Sliding window mean/median smoothing for numeric columns |
| stats | Column statistics with formatted table output (mean, std, trend, outliers) |
| tag | Assign named states to rows based on column conditions |
| uniq | Consecutive deduplication with optional column keys and fuzzy matching |
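As a rough illustration of what "consecutive deduplication with optional column keys" means, the Python sketch below drops a row when its key columns match those of the previous row. It is not sodatools code: the function name `dedup_consecutive` and the 1-based `key_cols` parameter are hypothetical, and fuzzy matching is left out.

```python
from itertools import groupby

def dedup_consecutive(lines, key_cols=None):
    """Keep the first line of each run of consecutive lines sharing a key.

    key_cols: 1-based column numbers forming the key; None compares the whole
    whitespace-split row. (Hypothetical parameter, not a sodatools flag.)
    """
    def key(line):
        fields = line.split()
        if key_cols is None:
            return tuple(fields)
        # Missing columns compare as empty strings rather than raising.
        return tuple(fields[c - 1] if c <= len(fields) else "" for c in key_cols)

    # groupby only merges adjacent equal keys, so non-adjacent repeats survive.
    return [next(group) for _, group in groupby(lines, key=key)]

rows = ["a 1", "a 2", "b 2", "a 3"]
print(dedup_consecutive(rows, key_cols=[1]))  # ['a 1', 'b 2', 'a 3']
```

Note that the final `a 3` is kept: only adjacent repeats are collapsed, as with coreutils `uniq`.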
Examples
Tag temperature readings into states and convert to events:
soda sample-data weather | soda tag -H1 hot: 2gt25 cold: 2lt15 normal: 2ge15 2le25 \
| soda events -H1 --gap 5m --print-header
state start stop duration count
hot 2024-01-01T04:35:00 2024-01-01T10:55:00 6.3h 77
normal 2024-01-01T11:00:00 2024-01-01T18:00:00 7.0h 85
cold 2024-01-01T18:05:00 2024-01-02T04:30:00 10.4h 126
...
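To make the tag-to-events step concrete, here is a minimal Python sketch of the idea, not the sodatools implementation. It assumes that `--gap` means "start a new event when consecutive rows are further apart than this"; the function name `rows_to_events` and the `max_gap` parameter are made up for illustration, and the input rows are invented.

```python
from datetime import datetime, timedelta

def rows_to_events(rows, max_gap=timedelta(minutes=5)):
    """Collapse (timestamp, state) rows into [state, start, stop, count] events.

    A new event starts when the state changes or when consecutive rows are
    more than max_gap apart (an assumed reading of --gap).
    """
    events = []
    prev_time = None
    for stamp, state in rows:
        t = datetime.fromisoformat(stamp)
        same_state = bool(events) and events[-1][0] == state
        within_gap = prev_time is not None and t - prev_time <= max_gap
        if same_state and within_gap:
            events[-1][2] = t              # extend the current event
            events[-1][3] += 1
        else:
            events.append([state, t, t, 1])  # open a new event
        prev_time = t
    return events

rows = [
    ("2024-01-01T00:00:00", "cold"),
    ("2024-01-01T00:05:00", "cold"),
    ("2024-01-01T00:10:00", "hot"),
    ("2024-01-01T00:15:00", "hot"),
    ("2024-01-01T01:00:00", "hot"),  # 45-minute gap splits the hot event
]
for state, start, stop, count in rows_to_events(rows):
    print(state, start.isoformat(), stop.isoformat(), stop - start, count)
```

Under this reading, duration is stop minus start and count is the number of rows in the event, which is consistent with the sample output above (6.3 h at 5-minute spacing gives 77 rows).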
Select columns, compute deltas, and align:
soda sample-data weather | soda kut 1 2 3 -H1 | soda delta 2 3 -H1 | soda align -H1
timestamp temp humidity
------------------- ---- --------
2024-01-01T00:05:00 0.03 -0.29
2024-01-01T00:10:00 0.12 0.51
2024-01-01T00:15:00 0.18 -0.87
...
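The delta step can be pictured with the short Python sketch below. It is an illustration only: `row_deltas` is a hypothetical helper, the input values are invented, and rounding to two decimals simply mirrors how the sample output is displayed. The sketch drops the first row because it has no predecessor, which matches the output above starting at 00:05:00.

```python
def row_deltas(rows, cols):
    """Return rows[1:] with the given 1-based columns replaced by the
    difference from the previous row; other columns pass through unchanged."""
    out = []
    for prev, cur in zip(rows, rows[1:]):
        new = list(cur)
        for c in cols:
            new[c - 1] = round(float(cur[c - 1]) - float(prev[c - 1]), 2)
        out.append(new)
    return out

rows = [
    ["t0", "1.0", "5.0"],
    ["t1", "1.5", "4.0"],
    ["t2", "3.0", "4.0"],
]
for row in row_deltas(rows, cols=[2, 3]):
    print(*row)
# t1 0.5 -1.0
# t2 1.5 0.0
```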
Clean noisy sensor data by removing MAD outliers and short mode glitches while respecting time gaps:
soda sample-data noisy | soda filter 2:mad 3:run --gap 30s --drop | soda align
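For readers unfamiliar with MAD (median absolute deviation) filtering, the following Python sketch shows the general technique using the modified z-score. It is not taken from sodatools: the name `mad_outlier_mask`, the 3.5 threshold, and the 0.6745 scaling constant are conventional choices assumed here, and the readings are invented.

```python
from statistics import median

def mad_outlier_mask(values, threshold=3.5):
    """Return a list of booleans, True where a value is a MAD outlier."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical; nothing can be scored.
        return [False] * len(values)
    # 0.6745 rescales MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

readings = [20.1, 20.3, 19.9, 20.2, 55.0, 20.0]
mask = mad_outlier_mask(readings)
print([v for v, bad in zip(readings, mask) if not bad])
# [20.1, 20.3, 19.9, 20.2, 20.0]
```

Because the median and MAD are barely moved by a single spike, the 55.0 reading is flagged without distorting the scores of its neighbours, which is the usual reason to prefer MAD over mean and standard deviation for noisy sensor data.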
Sample every 10th row, compute statistics:
soda sample-data weather | soda sample every 10 -H1 | soda stats -H1
Convert hex register dumps to binary with aligned output:
echo -e "0xff\n0x0a\n0x80" | soda radix --from hex --to bin -b 8 --align
11111111
00001010
10000000
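The same conversion can be reproduced in a few lines of Python, which may help when checking results by hand. This is a sketch of the arithmetic only; `hex_to_bin` is a hypothetical helper, and treating `-b 8` as a zero-padded width of 8 bits is an assumption based on the output shown.

```python
def hex_to_bin(token, bits=8):
    """Convert a hex string such as '0xff' to a zero-padded binary string."""
    value = int(token, 16)           # base 16 accepts an optional 0x prefix
    return format(value, f"0{bits}b")

for token in ["0xff", "0x0a", "0x80"]:
    print(hex_to_bin(token))
# 11111111
# 00001010
# 10000000
```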
Keep only the lines where column 2 is greater than 50:
soda sample-data small | soda coltest 2gt50 -H1
Resample time series to 1-hour intervals, then plot temperature and humidity:
soda sample-data weather | soda sample interval 1h -H1 | soda plot -H1 2 c=red / 3 c=blue
Enforce 3 fields per line, pad short rows, smooth, and align:
soda sample-data small | soda nf 3 -H1 --pad '?' | soda smooth 3 -p 1 | soda align
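Sliding-window smoothing itself is simple to sketch in Python. The version below is an assumption-laden illustration rather than what `soda smooth` does internally: it uses a centered window that shrinks at the edges, and the `smooth` function, its `window` parameter, and the sample values are all hypothetical.

```python
from statistics import mean, median

def smooth(values, window=3, func=mean):
    """Replace each value with func() over a centered window.

    The window is truncated at both ends, so the output has the same
    length as the input.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(func(values[lo:hi]))
    return out

data = [1, 2, 9, 4, 5]
print(smooth(data, window=3, func=median))  # [1.5, 2, 4, 5, 4.5]
```

A median window suppresses the isolated spike (9) entirely, whereas a mean window would spread it over its neighbours.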
Common flags
Most tools share these flags:
| Flag | Alias | Description |
|---|---|---|
| --delimiter | -d | Input field delimiter (default: whitespace) |
| --header | -H | First N lines are headers |
| --align | | Adaptively align output columns |
| --no-fail | | Skip problematic rows instead of erroring |
| --example | | Show usage examples and exit |
Plugins
Third-party packages can add new subcommands to soda by registering them
under the sodatools.commands entry-point group. Core discovers plugins
at startup; any failures are reported to stderr without crashing the CLI.
# your plugin's pyproject.toml
[project.entry-points."sodatools.commands"]
mytool = "sodatools_myplugin.cli:mytool" # exposes `soda mytool`
[project.scripts]
sdmytool = "sodatools_myplugin.entrypoints:sdmytool" # optional direct entry
The registered object must be a typer-compatible command function with the
same shape as the built-ins (plain function with typer.Option /
typer.Argument defaults). Plugin names that collide with built-ins are
skipped with a warning.
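The discovery behavior described above (load each entry point, report failures to stderr, skip names that collide with built-ins) follows a common pattern that can be sketched as below. This is not the sodatools loader: `load_plugins` and `BUILTIN_NAMES` are hypothetical names, and the built-in set is only a subset used for illustration.

```python
import sys

BUILTIN_NAMES = {"align", "kut", "uniq"}  # illustrative subset, not the full list

def load_plugins(candidates, builtin_names=BUILTIN_NAMES):
    """Load entry points into a {name: command} dict, tolerating failures.

    candidates: objects with a .name attribute and a .load() method, such as
    those returned by importlib.metadata.entry_points(group="sodatools.commands")
    on Python 3.10 or newer.
    """
    commands = {}
    for ep in candidates:
        if ep.name in builtin_names:
            print(f"warning: plugin {ep.name!r} collides with a built-in; skipped",
                  file=sys.stderr)
            continue
        try:
            commands[ep.name] = ep.load()
        except Exception as exc:  # a broken plugin must not crash the CLI
            print(f"warning: failed to load plugin {ep.name!r}: {exc}",
                  file=sys.stderr)
    return commands
```

Catching `Exception` around `load()` is deliberate here: importing a third-party module can fail in arbitrary ways, and the stated goal is to keep the CLI usable regardless.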
Changelog
See CHANGELOG.md for release notes.
License
MIT