CLI for fast, flexible concatenation of tabular data using polars.
joinem provides a CLI for fast, flexible concatenation of tabular data using polars.
- Free software: MIT license
- Repository: https://github.com/mmore500/joinem
Install
python3 -m pip install joinem
Features
- Lazily streams I/O to expeditiously handle numerous large files.
- Supports CSV and parquet input files.
- Due to current polars limitations, JSON and feather input files are not supported.
- Input formats may be mixed.
- Supports output to CSV, JSON, parquet, and feather file types.
- Allows mismatched columns and/or empty data files with --how diagonal and --how diagonal_relaxed.
- Provides a progress bar with --progress.
Example Usage
Pass input filenames via stdin, one filename per line.
find path/to/*.parquet path/to/*.csv | python3 -m joinem -o out.parquet
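The same stdin protocol can be driven from Python instead of a shell pipeline. The following is a minimal sketch, not part of joinem itself: `build_stdin_payload` and `run_joinem` are hypothetical helper names, and the `-o` flag simply mirrors the shell example above.

```python
import os
import subprocess
import sys


def build_stdin_payload(paths):
    """Return newline-delimited text with one filename per line."""
    return "".join(f"{os.fspath(p)}\n" for p in paths)


def run_joinem(paths, output_file):
    """Hypothetical wrapper: pipe the filename list into `python3 -m joinem`.

    Assumes joinem is installed in the current interpreter's environment;
    the `-o` flag is copied from the shell examples, not from joinem's source.
    """
    return subprocess.run(
        [sys.executable, "-m", "joinem", "-o", os.fspath(output_file)],
        input=build_stdin_payload(paths),
        text=True,
        check=True,  # raise CalledProcessError if joinem exits non-zero
    )
```

Calling `run_joinem(["a.csv", "b.parquet"], "out.parquet")` would then correspond to piping those two filenames into the command shown above.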
Output file type is inferred from the extension of the output file name. Supported output types are feather, JSON, parquet, and CSV.
find -name '*.parquet' | python3 -m joinem -o out.json
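To illustrate what inferring the output type from the extension can look like, here is a small standalone sketch. It is not joinem's actual implementation: the function name `infer_output_type`, the extension table, and the case-insensitive matching are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed mapping from file extension to output type (illustrative only).
SUPPORTED_OUTPUTS = {
    ".csv": "csv",
    ".json": "json",
    ".parquet": "parquet",
    ".feather": "feather",
}


def infer_output_type(output_file):
    """Map an output file name to one of the supported output types."""
    suffix = Path(output_file).suffix.lower()  # case-insensitivity is an assumption
    try:
        return SUPPORTED_OUTPUTS[suffix]
    except KeyError:
        raise ValueError(f"unsupported output extension: {suffix!r}") from None
```

Under these assumptions, `infer_output_type("out.json")` yields `"json"`, while an unlisted extension such as `out.txt` raises `ValueError`.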
Use --progress to show a progress bar.
ls -1 path/{*.csv,*.pqt} | python3 -m joinem -o out.csv --progress
If file columns may mismatch, use --how diagonal.
find path/to/ -name '*.csv' | python3 -m joinem -o out.csv --how diagonal
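Conceptually, a diagonal concatenation takes the union of all column names and fills values missing from a given file with nulls. The sketch below shows that idea in plain Python on lists of row dictionaries. It is only an illustration of the semantics, not how joinem or polars implement it, and `concat_diagonal` is a hypothetical name.

```python
def concat_diagonal(tables):
    """Concatenate tables (lists of row dicts) with mismatched columns.

    Returns (columns, rows): columns is the union of all column names in
    first-seen order, and every row has a value (or None) for each column.
    """
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Fill columns absent from a row with None, analogous to a null value.
    rows = [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]
    return columns, rows
```

For example, combining a table with columns x and y and another with columns y and z gives columns x, y, z, with None wherever a source table lacked the column.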
If some files may be empty, use --how diagonal_relaxed.
To run via Singularity/Apptainer:
find path/to/ -name '*.csv' | singularity run docker://ghcr.io/mmore500/joinem -o out.feather
API
usage: __main__.py [-h] [--version] [--progress]
                   [--how {vertical,horizontal,diagonal,diagonal_relaxed}]
                   output_file
Concatenate CSV and/or parquet tabular data files.
positional arguments:
  output_file           Output file name
options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --progress            Show progress bar
  --how {vertical,horizontal,diagonal,diagonal_relaxed}
                        How to concatenate frames. See
                        <https://docs.pola.rs/py-polars/html/reference/api/polars.concat.html>
                        for more information.
Provide input filenames via stdin. Example: find path/to/ -name '*.csv' | python3 -m joinem -o out.csv