Faster parquet metadata reading
PalletJack
How to use:
import palletjack as pj
import pyarrow.parquet as pq
import polars as pl
import numpy as np
rows = 5
columns = 10
chunk_size = 1 # A row group per
path = "my.parquet"
table = pl.DataFrame(
    data=np.random.randn(rows, columns),
    schema=[f"c{i}" for i in range(columns)]).to_arrow()
pq.write_table(table, path, row_group_size=chunk_size, use_dictionary=False, write_statistics=False, store_schema=False)
# Reading using the original metadata
pr = pq.ParquetReader()
pr.open(path)
res_data = pr.read_row_groups(list(range(pr.num_row_groups)), column_indices=[0, 1, 2], use_threads=False)
print(res_data)
# Reading using the indexed metadata
index_path = path + '.index'
pj.generate_metadata_index(path, index_path)
for r in range(rows):
    metadata = pj.read_row_group_metadata(index_path, r)
    pr = pq.ParquetReader()
    pr.open(path, metadata=metadata)
    # Row group r is addressed as index 0 through the per-row-group metadata
    res_data = pr.read_row_groups([0], column_indices=[0, 1, 2], use_threads=False)
    print(res_data)
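How much an index helps presumably depends on how large the file's footer metadata is, since the footer describes every row group and column. As a rough check, here is a minimal sketch that reads the footer size directly from the file. It assumes the standard Parquet layout, in which an unencrypted file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`. The helper name `parquet_footer_size` is hypothetical and is not part of PalletJack or PyArrow.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # trailing magic bytes of a Parquet file with a plaintext footer


def parquet_footer_size(path):
    """Return the size in bytes of the serialized footer metadata.

    Hypothetical helper: it relies only on the Parquet file layout, where the
    last 8 bytes are a 4-byte little-endian footer length followed by b"PAR1".
    """
    file_size = os.path.getsize(path)
    # Smallest possible file: leading magic (4) + footer length (4) + trailing magic (4)
    if file_size < 12:
        raise ValueError(f"{path!r} is too small to be a Parquet file")

    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)  # jump to the last 8 bytes of the file
        tail = f.read(8)

    if tail[4:] != PARQUET_MAGIC:
        raise ValueError(f"{path!r} does not end with the Parquet magic bytes")

    (footer_len,) = struct.unpack("<I", tail[:4])
    return footer_len
```

Calling `parquet_footer_size(path)` on the file written above gives the number of footer bytes that have to be parsed when the file is opened without an index. The value grows with the number of row groups and columns, so it can serve as a hint for when an index is worth generating.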