Utilities to help build Workbench modules
Parquet tools for CJWorkbench.
Workbench modules may optionally depend on the latest version of this Python package for its cjwparquet.api.* functions.
Installation
This is meant to be used within a Docker container. It depends on the executables /usr/bin/parquet-to-arrow and /usr/bin/parquet-to-text-stream.
Your Dockerfile might look something like this:
FROM workbenchdata/parquet-tools:v2.1.0 AS parquet-tools
FROM python:3.8.5-buster AS main
COPY --from=parquet-tools /usr/bin/parquet-to-arrow /usr/bin/parquet-to-arrow
COPY --from=parquet-tools /usr/bin/parquet-to-text-stream /usr/bin/parquet-to-text-stream
# And now that these binaries are accessible, you can install cjwparquet...
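Because the package relies on those two executables, it can help to confirm they are present before calling into cjwparquet. The following is a minimal sketch, not part of cjwparquet itself: the helper name find_missing_executables and the REQUIRED_EXECUTABLES constant are hypothetical, and only the two paths come from the text above.

```python
import os
from pathlib import Path
from typing import Iterable, List

# Paths named in this README; the constant itself is an illustrative assumption.
REQUIRED_EXECUTABLES = (
    Path("/usr/bin/parquet-to-arrow"),
    Path("/usr/bin/parquet-to-text-stream"),
)


def find_missing_executables(
    paths: Iterable[Path] = REQUIRED_EXECUTABLES,
) -> List[Path]:
    """Return the paths that are not regular, executable files."""
    return [p for p in paths if not (p.is_file() and os.access(p, os.X_OK))]


if __name__ == "__main__":
    missing = find_missing_executables()
    if missing:
        print("Missing executables:", ", ".join(str(p) for p in missing))
    else:
        print("All required executables are present")
```

Running this once at container start-up (or in a test) gives a clearer error than a failure deep inside a conversion call.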
Usage
from pathlib import Path
import cjwparquet
import pyarrow
# Write a Parquet file
cjwparquet.write(Path("test.parquet"), pyarrow.table({"A": ["foo", "bar"]}))
# Test whether a file looks like a Parquet file
if cjwparquet.file_has_parquet_magic_number(Path("test.parquet")):
    # Read a Parquet file
    with cjwparquet.open_as_mmapped_arrow(Path("test.parquet")) as table:
        assert table.to_pydict() == {"A": ["foo", "bar"]}
# Convert to text
text = cjwparquet.read_slice_as_text(
    Path("test.parquet"),
    format="csv",
    only_columns=range(0, 20),
    only_rows=range(0, 200),
)
assert text == "A\nfoo\nbar"
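For readers wondering what a "magic number" test involves: the Parquet format places the 4-byte marker PAR1 at both the start and the end of a file. The sketch below illustrates that idea in plain Python. It is an assumption about how such a check can be written, not the implementation behind cjwparquet.file_has_parquet_magic_number, and the name looks_like_parquet is hypothetical.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # marker defined by the Parquet file format


def looks_like_parquet(path: Path) -> bool:
    """Return True if the file starts and ends with the Parquet magic number.

    This only inspects 8 bytes; it does not validate the file's contents.
    """
    with path.open("rb") as f:
        if f.read(4) != PARQUET_MAGIC:
            return False
        f.seek(0, os.SEEK_END)
        if f.tell() < 8:
            # Too small to hold both the leading and trailing markers
            return False
        f.seek(-4, os.SEEK_END)
        return f.read(4) == PARQUET_MAGIC
```

A check like this is cheap, which makes it a reasonable guard before attempting a full read.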
Developing
- Run tests: docker build .
- Write a failing unit test in tests/
- Make it pass by editing code in cjwparquet/
- Format the code: black cjwparquet tests && isort cjwparquet tests
- Submit a pull request
Be very, very, very careful to preserve a consistent API. Workbench will upgrade this dependency without module authors' explicit consent. Add new features; fix bugs. Never change functionality.
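One way to guard against accidental API changes is to compare function signatures in a unit test. The sketch below is a conservative heuristic written for illustration only; it is not part of cjwparquet, and every name in it (looks_backwards_compatible, read_v1, read_v2, read_broken) is hypothetical. It accepts a new signature only when the old parameters are kept as a prefix and any added parameter is optional, so it may reject some changes that are in fact safe, and it says nothing about behavior.

```python
import inspect


def looks_backwards_compatible(old_func, new_func) -> bool:
    """Heuristic: can every call that worked with old_func still bind to new_func?"""
    empty = inspect.Parameter.empty
    old_params = list(inspect.signature(old_func).parameters.values())
    new_params = list(inspect.signature(new_func).parameters.values())

    if len(new_params) < len(old_params):
        return False  # a parameter was removed

    for old, new in zip(old_params, new_params):
        if (old.name, old.kind) != (new.name, new.kind):
            return False  # renamed, reordered, or changed kind
        if old.default is not empty and new.default is empty:
            return False  # an optional parameter became required

    # Any extra parameter must be optional (has a default, or is *args/**kwargs)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return all(
        p.default is not empty or p.kind in variadic
        for p in new_params[len(old_params):]
    )


# Hypothetical signatures used only to exercise the check
def read_v1(path, format="csv"): ...
def read_v2(path, format="csv", only_rows=None): ...
def read_broken(path, only_rows, format="csv"): ...
```

Here looks_backwards_compatible(read_v1, read_v2) is True because only an optional parameter was added, while looks_backwards_compatible(read_v1, read_broken) is False because a required parameter was inserted.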
Publishing
- Write a new version= to setup.py. Adhere to semver. (As changes must be backwards-compatible, the version will always start with 1 and look like 1.x.y.)
- Prepend notes to CHANGELOG.md about the new version
- git commit
- git tag v1.x.y
- git push --tags && git push
- Wait for Travis to push our changes to PyPI
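The version rule in the steps above can be checked mechanically before tagging. This is a minimal sketch that follows the rule as stated there (versions look like 1.x.y, tags look like v1.x.y); the helper name tag_for_version is hypothetical and not part of this project's tooling.

```python
import re

# Major version fixed at 1, per the rule stated in the publishing steps;
# minor and patch are non-negative integers without leading zeros.
VERSION_RE = re.compile(r"1\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def tag_for_version(version: str) -> str:
    """Return the git tag for a version, or raise if it breaks the 1.x.y rule."""
    if VERSION_RE.fullmatch(version) is None:
        raise ValueError(f"expected a version like 1.x.y, got {version!r}")
    return f"v{version}"
```

For example, tag_for_version("1.4.2") returns "v1.4.2", while tag_for_version("2.0.0") raises ValueError.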