Faster Parquet metadata reading
Project description
PalletJack
How to use:
import palletjack as pj
import pyarrow.parquet as pq
import polars as pl
import numpy as np
rows = 5
columns = 10
chunk_size = 1  # One row per row group
path = "my.parquet"
table = pl.DataFrame(
    data=np.random.randn(rows, columns),
    schema=[f"c{i}" for i in range(columns)],
).to_arrow()
pq.write_table(table, path, row_group_size=chunk_size,
               use_dictionary=False, write_statistics=False, store_schema=False)
# Reading using the original metadata
pr = pq.ParquetReader()
pr.open(path)
res_data = pr.read_row_groups(list(range(pr.num_row_groups)), column_indices=[0, 1, 2], use_threads=False)
print(res_data)
# Reading using the indexed metadata
index_path = path + '.index'
pj.generate_metadata_index(path, index_path)
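for r in range(rows):
    # Load only the metadata for row group r from the index file
    metadata = pj.read_row_group_metadata(index_path, r)
    pr = pq.ParquetReader()
    pr.open(path, metadata=metadata)
    # The reader is opened with this single row group's metadata, so it is addressed as index 0
    res_data = pr.read_row_groups([0], column_indices=[0, 1, 2], use_threads=False)
    print(res_data)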
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
palletjack-0.2.1.tar.gz (89.2 kB)
Built Distributions
Hashes for palletjack-0.2.1-cp312-cp312-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 011bd82da7cd7e1497dcc36ccb3778471fe3bd14c8a4a5835e07b3642ef963e0
MD5 | a7bb4e4f805c8e377cb966aed81faed9
BLAKE2b-256 | cce54050a2da0212790dd81a94825cc4a304cd44314db21d846f5f2115b5d572
Hashes for palletjack-0.2.1-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 332762bfcd850dd8971055cadc3f55fdf0ed39be3efef0629973ce7b0378958c
MD5 | 54e1d0f826efdb2c6d15381387fbe7db
BLAKE2b-256 | 697d64a28f914d94e6cdd4b220d5ac8cca52521f85d3b05fecb8d8a95710fb84
Hashes for palletjack-0.2.1-cp311-cp311-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | ec1c11d8cd8a5885c9af8dcfb3dac549611aed4350d9f9656759e2f4e2ce6014
MD5 | fda378bfaf37c10c19aaaaf6f7e14b5c
BLAKE2b-256 | da2ae2e680ece2c1c09c508e81c7685dc7e40888f41382a9c11fc3654450ac13
Hashes for palletjack-0.2.1-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0f376ba973b06a7d780df025291c00945a6564589cfe33e4bb2606a015222e2a
MD5 | be12cad820a9c9ba877b949a4575cc1b
BLAKE2b-256 | f1bdbc0006947a564bb04ede0e39bf5bd016a2e41da11f940a519ff57066d310
Hashes for palletjack-0.2.1-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 79217ff78cbccea868fd72be94534386a0a64f1d35bd28e5aede082d7bd29b28
MD5 | 02b7c3ad844b9387ce84a5da9d37ef44
BLAKE2b-256 | 9ea37beac55770eb655d652392cc3be41c36cf800757f139766c4b9cfa3ffb19
Hashes for palletjack-0.2.1-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 893329de384558beafaf8b43c057b4375432f6413ded6845f48e3dc869ad05d7
MD5 | b649efb8550ff79f17beac1ebd7480fb
BLAKE2b-256 | ddffeebea11209a52d4aa3147f785089d3fec6150e22267e4a188e6397283756
Hashes for palletjack-0.2.1-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | b07e8f894122540bd2fa7093993047caf4c047ba29bad58f29e6bc0684087dba
MD5 | bee930bed7ed3023a056e5eb8a5a000d
BLAKE2b-256 | 6db83e53a76c5a3ed9ae3d84bf5b3e7f5c0ae2972427519bc5edfbf8225f86e2
Hashes for palletjack-0.2.1-cp39-cp39-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 3f934b4604b637d2f1a0c3b2419e6970715daf3853f1a50ba3cb5614bf468db6
MD5 | ec778ef5ba251679d89358ad056bb5cc
BLAKE2b-256 | 9c912532d3c2bfdcc56f7968bf776feec3a3cf1c7f0f6dae54c65659595328fa
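The digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical and not part of palletjack, and the local file path is assumed to be wherever the wheel was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size blocks until read() returns an empty bytes object
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published digest (hypothetical helper).

    Surrounding whitespace and letter case in expected_hex are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("palletjack-0.2.1-cp312-cp312-win_amd64.whl", "011bd82da7cd7e1497dcc36ccb3778471fe3bd14c8a4a5835e07b3642ef963e0")` should return `True` for an intact copy of that wheel, assuming it was saved under that name in the current directory.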