CLI for fast, flexible concatenation of tabular data using polars.

Project description

joinem provides a CLI for fast, flexible concatenation of tabular data using polars.

Install

python3 -m pip install joinem

Features

  • Lazily streams I/O to expeditiously handle numerous large files.
  • Supports CSV and parquet input files.
    • Due to current polars limitations, JSON and feather files are not supported.
    • Input formats may be mixed.
  • Supports output to CSV, JSON, parquet, and feather file types.
  • Allows mismatched columns and/or empty data files with --how diagonal and --how diagonal_relaxed.
  • Provides a progress bar with --progress.

Example Usage

Pass input filenames via stdin, one filename per line.

find path/to/*.parquet path/to/*.csv | python3 -m joinem -o out.parquet
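
If the file list is easier to build in Python than with find, the same one-filename-per-line input can be produced as sketched below. This is only an illustration: `list_inputs` and `INPUT_SUFFIXES` are hypothetical names, and the suffix set is an assumption based on the file extensions used in the examples here.

```python
import tempfile
from pathlib import Path

# Assumed input suffixes, taken from the extensions used in the examples.
INPUT_SUFFIXES = {".csv", ".parquet", ".pqt"}


def list_inputs(root):
    """Return sorted paths (as strings) of data files found under root."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in INPUT_SUFFIXES
    )


# Demo on a throwaway directory: only the data files are listed.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.parquet", "notes.txt"):
        Path(tmp, name).touch()
    # One filename per line, which is the form expected on stdin.
    stdin_text = "\n".join(list_inputs(tmp)) + "\n"
    print(stdin_text, end="")
```

Text in this form can be written to stdout by a small script and piped to joinem in place of the find command above.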

Output file type is inferred from the extension of the output file name. Supported output types are feather, JSON, parquet, and CSV.

find -name '*.parquet' | python3 -m joinem -o out.json
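
Extension-based inference of this kind can be sketched as a simple suffix lookup. This is a hypothetical illustration, not joinem's actual code: the name `infer_output_format`, the exact set of recognized suffixes, and the case-insensitive matching are all assumptions, based only on the output file names used in the examples here.

```python
from pathlib import Path

# Assumed suffix -> format table, taken from the output file names used in
# the examples (out.csv, out.json, out.parquet, out.feather).
FORMATS = {
    ".csv": "csv",
    ".json": "json",
    ".parquet": "parquet",
    ".feather": "feather",
}


def infer_output_format(output_file):
    """Return the format name implied by the output file's extension."""
    # Lowercasing the suffix is an assumption made for this sketch.
    suffix = Path(output_file).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported output extension: {suffix!r}") from None


print(infer_output_format("out.json"))
```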

Use --progress to show a progress bar.

ls -1 path/{*.csv,*.pqt} | python3 -m joinem -o out.csv --progress

If file columns may mismatch, use --how diagonal.

find path/to/ -name '*.csv' | python3 -m joinem -o out.csv --how diagonal

If some files may be empty, use --how diagonal_relaxed.
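
As an illustration of what a diagonal concatenation does with mismatched columns, the sketch below mimics the semantics in plain Python: the result takes the union of all column names, and any value a file lacks is filled with null (None here). This is not joinem's or polars' implementation; `diagonal_concat` and the sample rows are hypothetical, made up for this example. Per the polars documentation, diagonal_relaxed behaves the same way and additionally coerces mismatched column types to a common supertype.

```python
def diagonal_concat(frames):
    """Concatenate row-lists whose dict keys (columns) may differ.

    Hypothetical helper for illustration only; each frame is a list of
    dicts mapping column name -> value.
    """
    # Union of column names, in first-seen order.
    columns = []
    for frame in frames:
        for row in frame:
            for name in row:
                if name not in columns:
                    columns.append(name)

    # Emit every row with every column; missing values become None.
    return [
        {name: row.get(name) for name in columns}
        for frame in frames
        for row in frame
    ]


a = [{"id": 1, "x": 0.5}]
b = [{"id": 2, "y": "foo"}]
empty = []  # stands in for a data file with no rows

for row in diagonal_concat([a, b, empty]):
    print(row)
```

The sketch is simplified: it derives columns from the rows it sees, whereas a real data frame carries a schema even when it has no rows.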

To run via Singularity/Apptainer:

find path/to/ -name '*.csv' | singularity run docker://ghcr.io/mmore500/joinem -o out.feather

API

usage: __main__.py [-h] [--version] [--progress]
                   [--how {vertical,horizontal,diagonal,diagonal_relaxed}]
                   output_file

Concatenate CSV and/or parquet tabular data files.

positional arguments:
  output_file           Output file name

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --progress            Show progress bar
  --how {vertical,horizontal,diagonal,diagonal_relaxed}
                        How to concatenate frames. See
                        <https://docs.pola.rs/py-polars/html/reference/api/polars.concat.html>
                        for more information.

Provide input filenames via stdin. Example: find path/to/ -name '*.csv' | python3 -m joinem -o out.csv

Download files

Source Distribution

joinem-0.1.4.tar.gz (4.8 kB view details)

Uploaded Source

Built Distribution

joinem-0.1.4-py2.py3-none-any.whl (4.9 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file joinem-0.1.4.tar.gz.

File metadata

  • Download URL: joinem-0.1.4.tar.gz
  • Upload date:
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for joinem-0.1.4.tar.gz
Algorithm Hash digest
SHA256 38aee54bd9f400ec1c05864694167d78bae1f09f773325321e99691d87ce1425
MD5 d9ab676b46ce867eda7684f5c7e0a120
BLAKE2b-256 3840219fc1d25dd831d88fd38ede7d13b1eaf961f012b32ba6f2d144133d4908

File details

Details for the file joinem-0.1.4-py2.py3-none-any.whl.

File metadata

  • Download URL: joinem-0.1.4-py2.py3-none-any.whl
  • Upload date:
  • Size: 4.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for joinem-0.1.4-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 ee3a580c648f3dded7fc6b17f950dbb6b3e99eed3ae82914188847b1bccd89a0
MD5 f72ce723101b39f6a67524c3fdba39d1
BLAKE2b-256 d1988485dd2a6958484a27cfbf8bc08f1f1667f23f656eb382e67ec45a24e136
