CSV GP: Diagnose all your CSV issues

CSVs are a ubiquitous format for data transfer, and they are commonly riddled with issues. Most CSV libraries abort with an unhelpful error when they hit one. CSV GP lets you pinpoint these issues in a CSV file and export just the parsable lines from it.

Installation

CSV GP can be used in three ways.

Standalone binary

  1. Install Rust
  2. Clone the repo and navigate into it
  3. Run cargo install --path csv_gp
  4. The csv-gp command will now be available; see csv-gp --help for usage

Rust library

Add the following to your Cargo.toml:

csv-gp = { git = "https://github.com/xelixdev/csv-gp", rev = "<optional git tag>" }

Python library

From package manager

The library is available on PyPI at https://pypi.org/project/csv-gp/, so you can run:

pip install csv-gp

Compiling from source

  1. Install Rust
  2. Install maturin (pip install maturin)
  3. Clone the repo
  4. Run make all
  5. cd csv_gp_python && maturin develop

Usage

Rust standalone binary

After installing the binary, the default usage is running csv-gp $FILE. This will print a diagnosis of the file. The command provides options to change the delimiter and the encoding of the file. See csv-gp -h for details.

Another option provided is --correct-rows-path which will export only the correct rows to the provided path.
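
For scripted use, the invocation described above can be wrapped from Python. This is a minimal sketch, assuming the csv-gp binary is on PATH and prints its diagnosis to standard output. The helper names build_command and diagnose are hypothetical; only the positional file argument and --correct-rows-path come from the text above. The delimiter and encoding options are left out because their flag names are listed only in csv-gp -h.

```python
import subprocess
from typing import List, Optional


def build_command(csv_file: str, correct_rows_path: Optional[str] = None) -> List[str]:
    """Build the argument list for a csv-gp run (hypothetical helper)."""
    command = ["csv-gp", csv_file]
    if correct_rows_path is not None:
        # Ask csv-gp to export only the correct rows to this path.
        command += ["--correct-rows-path", correct_rows_path]
    return command


def diagnose(csv_file: str, correct_rows_path: Optional[str] = None) -> str:
    """Run csv-gp and return whatever it wrote to standard output.

    Assumes the binary is installed and on PATH.
    """
    result = subprocess.run(
        build_command(csv_file, correct_rows_path),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling diagnose("data.csv", "correct_rows.csv") would then correspond to running csv-gp data.csv --correct-rows-path correct_rows.csv.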

Python library

The Python library exposes two main functions, check_file and get_rows.

The check_file function takes a path to a file, the delimiter, and the encoding (see https://github.com/xelixdev/csv-gp/blob/0f77c62841509c134a3bbe06ec178426e9c5aa10/csv_gp_python/csv_gp.pyi) and returns an instance of the CSVDetails class, which provides details about the file. The same file lists all the available attributes with their names and types. If the valid_rows_output_path argument is provided, only the correct rows will be exported to that path.
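
A call might look like the sketch below. It assumes the positional order path, delimiter, encoding as described above, so check it against the linked csv_gp.pyi before relying on it. The wrapper name diagnose_file and its default values "," and "utf-8" are hypothetical choices for this example, not library defaults.

```python
from typing import Any, Optional


def diagnose_file(
    path: str,
    delimiter: str = ",",
    encoding: str = "utf-8",
    valid_rows_output_path: Optional[str] = None,
) -> Any:
    """Call csv_gp.check_file and return the CSVDetails instance it produces."""
    import csv_gp  # imported lazily; requires `pip install csv-gp`

    if valid_rows_output_path is None:
        # Diagnosis only: positional order assumed to be path, delimiter, encoding.
        return csv_gp.check_file(path, delimiter, encoding)
    # Diagnosis plus export of the correct rows to the given path.
    return csv_gp.check_file(
        path, delimiter, encoding, valid_rows_output_path=valid_rows_output_path
    )
```

The returned object can then be inspected using the attribute names listed in csv_gp.pyi; none are shown here because the text above does not name them.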

The get_rows function likewise takes a path to a file, the delimiter, and the encoding, plus a list of row numbers. It returns the parsed cells for the given rows. See the file linked above for the exact types of the parameters and return value.
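
As a rough sketch of a get_rows call: the argument order path, delimiter, encoding, row numbers is assumed from the description above, and the wrapper name fetch_rows with its defaults is hypothetical. The return value is passed through untouched because its exact type is defined only in csv_gp.pyi.

```python
from typing import Any, Sequence


def fetch_rows(
    path: str,
    row_numbers: Sequence[int],
    delimiter: str = ",",
    encoding: str = "utf-8",
) -> Any:
    """Return what csv_gp.get_rows reports for the requested row numbers."""
    import csv_gp  # imported lazily; requires `pip install csv-gp`

    # Assumed positional order: path, delimiter, encoding, list of row numbers.
    return csv_gp.get_rows(path, delimiter, encoding, list(row_numbers))
```

A typical use would be to run check_file first, then pass the row numbers it flags to fetch_rows to look at the offending cells.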

Releasing a new version of the Python lib

  1. Update version numbers in csv_gp_python/Cargo.toml, csv_gp/Cargo.toml, and csv_gp_python/pyproject.toml
  2. Run cargo check to update the lock files with new versions
  3. Merge this change into main
  4. Create a new release on GitHub, creating a tag in the form vX.Y.Z
  5. The 'Publish' pipeline should begin running, and the new version will be published

Running tests

Running Rust tests

Run cargo test.

Running Python tests

Follow the instructions on compiling from source. Then you can run pytest.
