CLI tool for doing set operations (e.g. intersection, difference, union) on lines of input

kvenn

CLI tool for set operations on lines of input. Each line is treated as a member of a set, and each input file is treated as a set.

Usage

usage: kvenn [-h] [-n] [-s] [-x] [--force-string-keys] [-f FORMAT]
             [-o {+,-,x,d,union,difference,intersection,unique,stats}]
             sets [sets ...]

positional arguments:
  sets                  Each file is a set and each line in the file is a
                        member of the set

optional arguments:
  -h, --help            show this help message and exit
  -n, --non-empty       non-empty values only
  -s, --strip           strip surrounding whitespace
  -x, --filter          strip and filter to non-empty
  --force-string-keys   JSON set keys should be forced to a string type
  -f, --format FORMAT   Output handler (csv,json/ndjson,text)
                        default=whatever your first input was
  -o {+,-,x,d,union,difference,intersection,unique,stats}, --operation {+,-,x,d,union,difference,intersection,unique,stats}
                        Operation to perform on the sets [-] Subtract sets
                        1...N from set 0 [+] Get the union of sets 0...N [x]
                        Get the intersection of sets 0...N [d] Symmetric
                        difference (disjunctive union). Elements from all sets
                        which are not in any others. [stats] Print a summary
                        of all operations and per-source breakdowns.
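The operations described in the help text map onto ordinary set algebra. The following is a minimal Python sketch of those semantics, not kvenn's actual implementation; the function names are hypothetical. Note that for more than two sets, "elements which are not in any others" means members that appear in exactly one set, which is not the same as chaining pairwise symmetric differences.

```python
from collections import Counter


def union(sets):
    """[+] Every member found in any of sets 0...N."""
    return set().union(*sets)


def difference(sets):
    """[-] Members of set 0 that are absent from sets 1...N."""
    first, *rest = sets
    return set(first).difference(*rest)


def intersection(sets):
    """[x] Members present in every one of sets 0...N."""
    first, *rest = sets
    return set(first).intersection(*rest)


def unique(sets):
    """[d] Members that appear in exactly one of the sets."""
    counts = Counter(item for s in sets for item in set(s))
    return {item for item, n in counts.items() if n == 1}


# Hypothetical example sets, one per input file.
a = {"red", "green", "blue"}
b = {"green", "blue", "pink"}
c = {"blue", "teal"}

print(sorted(union([a, b, c])))         # ['blue', 'green', 'pink', 'red', 'teal']
print(sorted(difference([a, b, c])))    # ['red']
print(sorted(intersection([a, b, c])))  # ['blue']
print(sorted(unique([a, b, c])))        # ['pink', 'red', 'teal']
```

In this example, chaining `a ^ b ^ c` would also include `blue`, because it appears in an odd number of sets; counting occurrences avoids that.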

Input Formats

kvenn supports three input formats. The format is detected from the file extension.

Plain text

Each line is treated as a set member. No special syntax needed.

kvenn file1.txt file2.txt
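As a rough illustration of how lines become set members, and of what the -s, -n and -x options appear to do according to the help text, here is a small Python sketch. The function `load_members` and its parameters are hypothetical, and the exact behaviour of kvenn may differ (for example in how it treats whitespace-only lines).

```python
def load_members(lines, strip=False, non_empty=False):
    """Build a set from an iterable of lines.

    strip     -- assumed equivalent of -s: drop surrounding whitespace
    non_empty -- assumed equivalent of -n: skip empty values
    Passing both approximates -x (strip and filter to non-empty).
    """
    members = set()
    for line in lines:
        item = line.rstrip("\n")  # the line terminator is never part of the member
        if strip:
            item = item.strip()
        if non_empty and item == "":
            continue
        members.add(item)
    return members


# Hypothetical file contents.
lines = ["red\n", "  red \n", "\n", "blue\n"]

print(sorted(load_members(lines)))                              # ['', '  red ', 'blue', 'red']
print(sorted(load_members(lines, strip=True)))                  # ['', 'blue', 'red']
print(sorted(load_members(lines, strip=True, non_empty=True)))  # ['blue', 'red']
```

Duplicate lines collapse into a single member, which is why running kvenn on one file yields its unique values.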

CSV

Use :: to specify which column(s) to use as the set key:

kvenn data1.csv::color data2.csv::color

Multiple key columns are supported:

kvenn data1.csv::id,color data2.csv::id,color
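To show what keying on one or more columns means, the sketch below splits a `file::columns` argument and builds a set of key tuples with Python's standard `csv` module. It assumes the CSV has a header row; the helper names and sample data are hypothetical, and this is not kvenn's own code.

```python
import csv
import io


def split_spec(spec):
    """Split 'data1.csv::id,color' into ('data1.csv', ['id', 'color'])."""
    path, _, keys = spec.partition("::")
    return path, keys.split(",") if keys else []


def csv_keys(text, columns):
    """Return the set of key tuples taken from the named columns."""
    reader = csv.DictReader(io.StringIO(text))
    return {tuple(row[c] for c in columns) for row in reader}


# Hypothetical file contents.
data1 = "id,color\n1,red\n2,blue\n3,blue\n"
data2 = "id,color\n2,blue\n4,red\n"

_, cols = split_spec("data1.csv::color")
print(sorted(csv_keys(data1, cols) & csv_keys(data2, cols)))
# [('blue',), ('red',)]

_, cols = split_spec("data1.csv::id,color")
print(sorted(csv_keys(data1, cols) & csv_keys(data2, cols)))
# [('2', 'blue')]
```

With a single key column, rows that share a colour count as the same member; adding `id` to the key makes the comparison stricter.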

NDJSON (newline-delimited JSON)

Works the same as CSV. Use :: to specify the key field(s):

kvenn data1.json::id data2.json::id

Nested keys use dot notation:

kvenn data1.json::meta.id data2.json::meta.id

Files with .json or .ndjson extensions are both supported.
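The dot notation can be pictured as walking into nested objects one key at a time. The Python sketch below illustrates that lookup on NDJSON text. The `force_string` parameter shows one plausible reading of --force-string-keys (converting every key to a string so that 2 and "2" compare equal); that interpretation is an assumption, and all names and data here are hypothetical.

```python
import json


def lookup(record, dotted_key):
    """Follow a dotted path such as 'meta.id' through nested dicts."""
    value = record
    for part in dotted_key.split("."):
        value = value[part]
    return value


def ndjson_keys(text, dotted_key, force_string=False):
    """Return the set of key values, one per non-blank NDJSON line."""
    keys = set()
    for line in text.splitlines():
        if not line.strip():
            continue
        value = lookup(json.loads(line), dotted_key)
        keys.add(str(value) if force_string else value)
    return keys


# Hypothetical file contents: data2 stores the id as a string.
data1 = '{"meta": {"id": 1}, "color": "red"}\n{"meta": {"id": 2}, "color": "blue"}\n'
data2 = '{"meta": {"id": "2"}, "color": "blue"}\n'

print(ndjson_keys(data1, "meta.id") & ndjson_keys(data2, "meta.id"))
# set()
print(ndjson_keys(data1, "meta.id", force_string=True)
      & ndjson_keys(data2, "meta.id", force_string=True))
# {'2'}
```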

Output format

By default the output format matches the first input file. Override with -f:

kvenn data1.csv::color data2.csv::color -f json

Examples

Unique values in a file

kvenn <input>

Unique values across two or more files (equivalent to --operation union)

kvenn <input1> <input2> <inputN>

Values found in both files

kvenn <input1> <input2> --operation intersection

Values found in exactly one file (symmetric difference)

kvenn <input1> <input2> <inputN> --operation unique

Subtract values in B (and C, D, etc.) from A, leaving the values found only in A

kvenn <inputA> <inputB> [<inputC>] --operation difference

Get a summary of all set operations at once

kvenn data_1.txt data_2.txt --operation stats

All (2 sources, 17 total unique items):
  Union:                       17    (e.g. Purple)
  Intersection:                 3    (e.g. Purple)
  Difference (A - B):           7    (e.g. Teal)
  Symmetric difference:        14    (e.g. Teal)

Source 1 - data_1.txt:
  Total:        10
  Unique:        7    (e.g. Teal)

Source 2 - data_2.txt:
  Total:        10
  Unique:        7    (e.g. Pink)
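The counts in this summary are consistent with two 10-item sets that share 3 members. The Python sketch below reproduces the arithmetic with invented colour sets; the contents of data_1.txt and data_2.txt are not shown in this document, so the sets and the `summarize` helper are hypothetical stand-ins chosen only to match the totals.

```python
def summarize(a, b):
    """Count the members produced by each two-set operation."""
    return {
        "union": len(a | b),
        "intersection": len(a & b),
        "difference": len(a - b),            # A - B
        "symmetric_difference": len(a ^ b),
        "unique_a": len(a - b),              # only in source 1
        "unique_b": len(b - a),              # only in source 2
    }


# Invented data: 3 shared colours plus 7 colours unique to each source.
shared = {"Purple", "Red", "Blue"}
a = shared | {"Teal", "Green", "Yellow", "Orange", "Brown", "Black", "White"}
b = shared | {"Pink", "Cyan", "Magenta", "Gray", "Navy", "Olive", "Maroon"}

for name, count in summarize(a, b).items():
    print(f"{name}: {count}")
# union: 17
# intersection: 3
# difference: 7
# symmetric_difference: 14
# unique_a: 7
# unique_b: 7
```

The figures check out by inclusion-exclusion: 10 + 10 - 3 = 17 for the union, and 7 + 7 = 14 for the symmetric difference.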

Development

make install-dev
make test
