CLI tool for doing set operations (e.g. intersection, difference, union) on lines of input
kvenn
CLI tool for set operations on lines of input. Each input is treated as a set, and each line of that input is treated as a member of the set.
Usage
usage: kvenn [-h] [-n] [-s] [-x] [--force-string-keys] [-f FORMAT]
             [-o {+,-,x,d,union,difference,intersection,unique,stats}]
             sets [sets ...]
positional arguments:
  sets                  Each file is a set and each line in the file is a
                        member of the set
optional arguments:
  -h, --help            show this help message and exit
  -n, --non-empty       non-empty values only
  -s, --strip           strip surrounding whitespace
  -x, --filter          strip and filter to non-empty
  --force-string-keys   JSON set keys should be forced to a string type
  -f, --format FORMAT   Output handler (csv,json/ndjson,text)
                        default=whatever your first input was
  -o {+,-,x,d,union,difference,intersection,unique,stats}, --operation {+,-,x,d,union,difference,intersection,unique,stats}
                        Operation to perform on the sets
                        [-] Subtract sets 1...N from set 0
                        [+] Get the union of sets 0...N
                        [x] Get the intersection of sets 0...N
                        [d] Symmetric difference (disjunctive union).
                            Elements from all sets which are not in any others.
                        [stats] Print a summary of all operations and
                            per-source breakdowns.
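The operations described in the help text map onto ordinary set algebra. The following is a minimal Python sketch of those semantics, not kvenn's actual implementation; the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Set


def subtract(sets: List[Set[str]]) -> Set[str]:
    """[-] Remove every member of sets 1...N from set 0."""
    return sets[0].difference(*sets[1:])


def union(sets: List[Set[str]]) -> Set[str]:
    """[+] Members found in any of sets 0...N."""
    return set().union(*sets)


def intersection(sets: List[Set[str]]) -> Set[str]:
    """[x] Members found in every one of sets 0...N."""
    return sets[0].intersection(*sets[1:])


def unique(sets: List[Set[str]]) -> Set[str]:
    """[d] Members found in exactly one of the sets."""
    counts = Counter(item for s in sets for item in s)
    return {item for item, n in counts.items() if n == 1}


a = {"red", "green", "blue"}
b = {"green", "blue", "pink"}
c = {"blue", "teal"}
print(sorted(unique([a, b, c])))
```

With three or more sets, "elements which are not in any others" is not the same as chaining Python's `^` operator: `a ^ b ^ c` would keep `blue`, which appears in all three sets, so the sketch counts occurrences instead.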
Input Formats
kvenn supports three input formats. The format is detected from the file extension.
Plain text
Each line is treated as a set member. No special syntax needed.
kvenn file1.txt file2.txt
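To make the line-to-member mapping concrete, here is a small Python sketch of how lines could be collected into a set, with switches that mirror the -s and -n flags (-x corresponds to enabling both). The function `lines_to_set` is hypothetical, and the treatment of whitespace-only lines when only -n is given is an assumption rather than documented kvenn behavior.

```python
from typing import Iterable, Set


def lines_to_set(lines: Iterable[str], strip: bool = False,
                 non_empty: bool = False) -> Set[str]:
    """Collect lines into a set, one member per distinct line."""
    members = set()
    for line in lines:
        value = line.rstrip("\r\n")  # the line ending is not part of the member
        if strip:
            value = value.strip()  # like -s: drop surrounding whitespace
        if non_empty and value == "":
            continue  # like -n: skip empty values
        members.add(value)
    return members


sample = ["red\n", " red \n", "\n", "blue\n"]
print(sorted(lines_to_set(sample, strip=True, non_empty=True)))
```

Without stripping, `"red"` and `" red "` are distinct members, which is a common reason two files that look identical produce an unexpectedly small intersection.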
CSV
Use :: to specify which column(s) to use as the set key:
kvenn data1.csv::color data2.csv::color
Multiple key columns are supported:
kvenn data1.csv::id,color data2.csv::id,color
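The `file::columns` form can be read as "the path, then the columns that make up the set key". The Python sketch below illustrates that idea using the standard `csv` module; `parse_spec` and `csv_keys` are hypothetical helpers, and representing a multi-column key as a tuple is an assumption, not a description of kvenn's internals.

```python
import csv
import io
from typing import Iterable, List, Set, Tuple


def parse_spec(spec: str) -> Tuple[str, List[str]]:
    """Split 'data1.csv::id,color' into ('data1.csv', ['id', 'color'])."""
    path, sep, keys = spec.partition("::")
    return path, (keys.split(",") if sep else [])


def csv_keys(rows: Iterable[str], columns: List[str]) -> Set[Tuple[str, ...]]:
    """Build a set whose members are the values of the key columns."""
    reader = csv.DictReader(rows)  # the first row supplies the column names
    return {tuple(row[c] for c in columns) for row in reader}


data = io.StringIO("id,color,size\n1,red,S\n2,blue,M\n1,red,L\n")
path, columns = parse_spec("data1.csv::id,color")
print(sorted(csv_keys(data, columns)))
```

In this sketch, rows that agree on every key column collapse into a single member even when their other columns differ, as the two `1,red` rows do here.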
NDJSON (newline-delimited JSON)
Works the same as CSV — use :: to specify the key field(s):
kvenn data1.json::id data2.json::id
Nested keys use dot notation:
kvenn data1.json::meta.id data2.json::meta.id
Both the .json and .ndjson file extensions are supported.
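As an illustration of dot notation, the Python sketch below walks a dotted key such as `meta.id` through each NDJSON record. The helpers `lookup` and `ndjson_keys` are hypothetical, and the `force_string` parameter is only a guess at what --force-string-keys does (converting each key to a string so that `1` and `"1"` compare equal).

```python
import json
from typing import Any, Iterable, Set


def lookup(record: Any, dotted_key: str) -> Any:
    """Follow 'meta.id' through nested objects: record['meta']['id']."""
    value = record
    for part in dotted_key.split("."):
        value = value[part]
    return value


def ndjson_keys(lines: Iterable[str], dotted_key: str,
                force_string: bool = False) -> Set[Any]:
    """One JSON object per line; the set member is the value at dotted_key."""
    keys = set()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines between records
        value = lookup(json.loads(line), dotted_key)
        # Values must be hashable (strings, numbers) to be set members.
        keys.add(str(value) if force_string else value)
    return keys


records = [
    '{"meta": {"id": 1}, "color": "red"}',
    '{"meta": {"id": "1"}, "color": "blue"}',
    '{"meta": {"id": 2}}',
]
print(len(ndjson_keys(records, "meta.id")))
print(len(ndjson_keys(records, "meta.id", force_string=True)))
```

The integer `1` and the string `"1"` are different members until they are converted to a common type, which is the kind of mismatch a string-forcing option would smooth over.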
Output format
By default the output format matches the first input file. Override with -f:
kvenn data1.csv::color data2.csv::color -f json
Examples
Unique values in a file
kvenn <input>
Unique values across two or more files (same as --operation union)
kvenn <input1> <input2> <inputN>
Values found in both files
kvenn <input1> <input2> --operation intersection
Values found in only one file
kvenn <input1> <input2> <inputN> --operation unique
Subtract values in B (and C, D, etc.) from A, leaving the values found only in A
kvenn <inputA> <inputB> [<inputC>] --operation difference
Get a summary of all set operations at once
kvenn data_1.txt data_2.txt --operation stats
All (2 sources, 17 total unique items):
  Union: 17 (e.g. Purple)
  Intersection: 3 (e.g. Purple)
  Difference (A - B): 7 (e.g. Teal)
  Symmetric difference: 14 (e.g. Teal)
Source 1 - data_1.txt:
  Total: 10
  Unique: 7 (e.g. Teal)
Source 2 - data_2.txt:
  Total: 10
  Unique: 7 (e.g. Pink)
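The numbers in this summary are related by simple set identities: the union is 10 + 10 - 3 = 17, and the symmetric difference is 7 + 7 = 14. The Python sketch below reproduces the same counts on stand-in data; the `summarize` function and the `item0`...`item16` values are hypothetical, and it assumes "Total" means the number of distinct members in each source.

```python
from typing import Dict, Set


def summarize(a: Set[str], b: Set[str]) -> Dict[str, int]:
    """Count the members produced by each two-set operation."""
    return {
        "union": len(a | b),
        "intersection": len(a & b),
        "difference": len(a - b),  # A - B
        "symmetric_difference": len(a ^ b),
        "total_a": len(a),
        "total_b": len(b),
    }


# Stand-in data shaped like the sample: 10 members each, 3 of them shared.
a = {f"item{i}" for i in range(10)}      # item0 ... item9
b = {f"item{i}" for i in range(7, 17)}   # item7 ... item16
stats = summarize(a, b)
print(stats)

# Identities that tie the summary lines together.
assert stats["union"] == stats["total_a"] + stats["total_b"] - stats["intersection"]
assert stats["symmetric_difference"] == stats["union"] - stats["intersection"]
```

These identities give a quick sanity check on any stats output for two sources.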
Development
make install-dev
make test
Source Distribution
File details
Details for the file kvenn-2.0.0.tar.gz.
File metadata
- Download URL: kvenn-2.0.0.tar.gz
- Upload date:
- Size: 2.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab6b9bb02b6102af1bc7e5041a0caf2119ae5a9e0322ef9c74ef6be4e1be33d6 |
| MD5 | 4eb417c76e9d3f51ecac8ba3c6b5506f |
| BLAKE2b-256 | ead267f4e83d8218db61445d1ca709015739201fa8df5ae987463bff450394d2 |