Faster Parquet metadata reading
Project description
PalletJack
How to use:
import palletjack as pj
import pyarrow.parquet as pq
import polars as pl
import numpy as np
rows = 5
columns = 10
chunk_size = 1  # One row per row group
path = "my.parquet"
table = pl.DataFrame(
    data=np.random.randn(rows, columns),
    schema=[f"c{i}" for i in range(columns)],
).to_arrow()
pq.write_table(table, path, row_group_size=chunk_size, use_dictionary=False, write_statistics=False, store_schema=False)
# Reading using the original metadata
pr = pq.ParquetReader()
pr.open(path)
res_data = pr.read_row_groups(list(range(pr.num_row_groups)), column_indices=[0, 1, 2], use_threads=False)
print(res_data)
# Reading using the indexed metadata
index_path = path + '.index'
pj.generate_metadata_index(path, index_path)
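for r in range(rows):
    # Load the metadata for row group r only
    metadata = pj.read_row_group_metadata(index_path, r)
    pr = pq.ParquetReader()
    pr.open(path, metadata=metadata)
    # The reader is opened with this single-row-group metadata, so the group is addressed as 0
    res_data = pr.read_row_groups([0], column_indices=[0, 1, 2], use_threads=False)
    print(res_data)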
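Regenerating the index on every run may give back some of the time it is meant to save. The following is a minimal sketch of one way to reuse an existing index, assuming a simple freshness policy that is not part of PalletJack itself: rebuild only when the index file is missing or older than the Parquet file. The helper name `ensure_index` is hypothetical, and the generator is passed in as a callable so the helper has no library dependency of its own.

```python
import os
from typing import Callable


def ensure_index(parquet_path: str, index_path: str,
                 generate: Callable[[str, str], None]) -> bool:
    """Rebuild the index if it is missing or older than the Parquet file.

    `generate` is any callable taking (parquet_path, index_path).
    Returns True if the index was (re)generated, False if it was reused.
    Hypothetical helper; the mtime-based policy is an assumption.
    """
    index_missing = not os.path.exists(index_path)
    if index_missing or os.path.getmtime(index_path) < os.path.getmtime(parquet_path):
        generate(parquet_path, index_path)
        return True
    return False
```

With the names from the example above, this could be called as `ensure_index(path, index_path, pj.generate_metadata_index)` before the read loop.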
Download files
Source Distribution
palletjack-0.1.2.tar.gz (86.7 kB)
Built Distributions
Hashes for palletjack-0.1.2-cp312-cp312-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | cc2b52eff0dd3725de1a974cbecedc6a3141860238b9dea838fad3d18d78f22d
MD5 | aec63f07435da57ce28ee7bcef6bc192
BLAKE2b-256 | bb6049f24c3deb21be281ae53e30677ac082f18d69abd590e7345ab5dac8a88e
Hashes for palletjack-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 79f73c2e102d33aadd94f14b25c0e9f85653ada87ebaa69f1d107d507fc43652
MD5 | 5d3e66e9ce142ab1594510d39c48d079
BLAKE2b-256 | cbd5ee9f26753b9f8f6dcd4ac8e1ffbd0d8113e62ed2c337f362ac8f6c845d78
Hashes for palletjack-0.1.2-cp311-cp311-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | cbaafcfdad3c3325b915b9241385dafdc34701b6ad89d4a70b669ced4e34e3a2
MD5 | 5bd479c08681b28b092b60a077edf9d7
BLAKE2b-256 | a5f5460c7e6e7d5dbdddb972ea8b273e7899750bfd6dc244d09b5671ad050c73
Hashes for palletjack-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | caa157a9c04a31ca1919237738edc33752a212af0072b77eb5d416340de5403d
MD5 | be603df136030c6bd7165e94e7b36f8a
BLAKE2b-256 | 1caf82f04df551235b8d875c877056ddb4a4ce108ae1a5c2d7a684daad301ff1
Hashes for palletjack-0.1.2-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | bf29f0b6db613c7e8e2325fc3681a16ffc2029e04d61a415bbbe15e0af4f744d
MD5 | 29148389f6eefbf5947cc7c79508b532
BLAKE2b-256 | 95f00fe00faf74255f8c57bed3c0bbf3dede7515ab36ea45d93da8c041229b00
Hashes for palletjack-0.1.2-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 1481e4342515826ab1471bd9f621691388e9e7893d3411b1761d864efe145467
MD5 | 86fea686923df88b3a7d36053190d6e0
BLAKE2b-256 | 4a1a22102211d922bec72ea3461958df1ace1684830fa90ec9ca3448cee11a0a
Hashes for palletjack-0.1.2-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 54e2ffbbf2722ffcccb764f3d0ef8ad513e6c3a70fcf9fe7bd5afac3ad45e85c
MD5 | 64cae2381b70bff62d091bb1653f7b5d
BLAKE2b-256 | e328d1b7f812c8bb65903806cc6b69d2014b7d870d9596458d4c1de7df429fea
Hashes for palletjack-0.1.2-cp39-cp39-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | fc200340f4ca4669ed17018d6377af72a553ca346a74109e7cf3212e1ba57597
MD5 | 2d0aa38254e5fba19eaea472072ff0a7
BLAKE2b-256 | 40f97bb464302d422f8507f6c41da0022f32094ef44a23ab98d0c4cd4e7305d0
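
The SHA256 digests listed above can be used to check a manually downloaded file before installing it. The following is a minimal sketch using only the Python standard library; the function names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local file location is whatever you downloaded to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, a downloaded copy of palletjack-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl would be expected to match `79f73c2e102d33aadd94f14b25c0e9f85653ada87ebaa69f1d107d507fc43652`.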