Faster parquet metadata reading
Project description
PalletJack
How to use:
import palletjack as pj
import pyarrow.parquet as pq
import polars as pl
import numpy as np
rows = 5
columns = 10
chunk_size = 1 # A row group per
path = "my.parquet"
table = pl.DataFrame(
    data=np.random.randn(rows, columns),
    schema=[f"c{i}" for i in range(columns)]).to_arrow()
pq.write_table(table, path, row_group_size=chunk_size, use_dictionary=False, write_statistics=False, store_schema=False)
# Reading using the original metadata
pr = pq.ParquetReader()
pr.open(path)
res_data = pr.read_row_groups(list(range(pr.num_row_groups)), column_indices=[0,1,2], use_threads=False)
print(res_data)
# Reading using the indexed metadata
index_path = path + '.index'
pj.generate_metadata_index(path, index_path)
for r in range(0, rows):
    metadata = pj.read_row_group_metadata(index_path, r)
    pr = pq.ParquetReader()
    pr.open(path, metadata=metadata)
    res_data = pr.read_row_groups([0], column_indices=[0,1,2], use_threads=False)
    print(res_data)
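For context on what "the original metadata" costs, here is a minimal sketch that reports how large a Parquet file's footer is. It assumes the standard unencrypted Parquet layout, where the file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`. The footer describes every row group and column chunk, so it presumably grows with the number of row groups times the number of columns. The function name `footer_metadata_size` is hypothetical and is not part of palletjack or pyarrow.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # trailing magic of an unencrypted Parquet file


def footer_metadata_size(path):
    """Return the byte length of the serialized footer metadata.

    Hypothetical helper: reads only the last 8 bytes of the file,
    which hold a 4-byte little-endian length and the 4-byte magic.
    """
    if os.path.getsize(path) < 8:
        raise ValueError(f"{path!r} is too small to be a Parquet file")
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError(f"{path!r} does not end with the PAR1 magic")
    (size,) = struct.unpack("<I", tail[:4])
    return size
```

Calling `footer_metadata_size("my.parquet")` on files written with different `rows` and `columns` values gives a rough sense of how much metadata a reader has to parse before it can touch a single row group.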
Download files
Source Distribution
palletjack-0.1.1.tar.gz (86.7 kB)
Built Distributions
Hashes for palletjack-0.1.1-cp312-cp312-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 3352c0ddb608f307d3844583c27f9c8466c73d4923c1aa027a7d5499d1e9af59
MD5 | a3e6549fd91cbaff88cfbd150e9a95a7
BLAKE2b-256 | c7b302999bde6b0eb86b3345ea161c4515430b33dd02793b27358de19f5d395d
Hashes for palletjack-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 731b0b6c2398e92e0925ca2c9a3a57e8849130bec36d65d44f759cbc3975b3f6
MD5 | 58185dadf662c3d8439384dcb08a82f9
BLAKE2b-256 | 76ad486fbbb9d6b2bd5c7a23cb2cad51b68e72ef9d26acbdc989e2e20b96e094
Hashes for palletjack-0.1.1-cp311-cp311-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | e923742c6e11004660b81ebdcbd6715be426c187f94a731321213e1463f11c8e
MD5 | 9f0d19589524453799641d7d73674412
BLAKE2b-256 | 511c9743f2cc61d479b1e43634df5ef72aee67134654d0eed2621d07715df4cb
Hashes for palletjack-0.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 1641d4ce0df27bd7ef2bf03292ce3c572804c02274c391ab82e86b8a1de6c35b
MD5 | f69315e6ef04cb3f92dd8ebd875eb4e6
BLAKE2b-256 | 57ebd34a627304665450e2ed59adc62f7d6c52d650775b89102c5b20f15658ce
Hashes for palletjack-0.1.1-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 89bd7b9ce8f05792eb03eb08245f0d1f7fe0b42dee294e8e3949feaac144f4f0
MD5 | 1835aaf6fbe44eb85e86d4948ff177ee
BLAKE2b-256 | b97029ea51eb483cd5017dc5606c88976f026916aee1108bb390c8619755a9c9
Hashes for palletjack-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 03802fbc71afdb32068f8cb74218c66791161f72d05bd288b6f08203d838c5c3
MD5 | 8baab9613c33a9736976f5c5de763ac8
BLAKE2b-256 | f12cac837138eb3a230bc3c4ce09a53984042f22657f57bc847a1a8765019380
Hashes for palletjack-0.1.1-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | ec181302937aa4be84f0f0859ae554da8b69b9706017c5603ad327d3040f39bc
MD5 | 0c5e1baf2763bac208a390ff22c4bb56
BLAKE2b-256 | 28c46e4898ba6ce508ba2a28bd68079d74fb5a784a1a6c19e9f09e193e80eb00
Hashes for palletjack-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 3892010045df3ee753823d7391a3075e89aa5d5e335f5e2f3d7016e627eb41fb
MD5 | 0c89808b70d952f5008bd8edd507533b
BLAKE2b-256 | e3265187ec869258420e9a2222ce6e36b2eac3d82d6600e2482b49445ac7b446
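The digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library's `hashlib`. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, chosen for illustration, and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`.

    The file is read in chunks so large wheels need not fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's SHA256 with a published hex digest.

    Surrounding whitespace and letter case in `expected_hex` are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the Windows wheel for Python 3.12 was saved in the current directory, `matches_published_hash("palletjack-0.1.1-cp312-cp312-win_amd64.whl", "3352c0ddb608f307d3844583c27f9c8466c73d4923c1aa027a7d5499d1e9af59")` should return `True` for an intact download.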