A booster 💪 for your Parquet files
virtual
A booster 💪 for your Parquet file sizes.
🛠 Build
pip3 install virtual-parquet
or
pip3 install .
🔗 Examples
A demo can be found at examples/demo.ipynb.
🗜️ Compress
import pandas as pd
import virtual
df = pd.read_csv('file.csv')
...
virtual.to_parquet(df, 'file_virtual.parquet')
% Virtualization finished: Check out 'file_virtual.parquet'.
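To see how much space virtualization saves on your own data, you can compare file sizes on disk. The sketch below is only a convenience and not part of virtual: `size_reduction` is a hypothetical helper, and `file_plain.parquet` is an assumed baseline, e.g. the same DataFrame written with pandas' `df.to_parquet('file_plain.parquet')`.

```python
import os


def size_reduction(original_path, virtual_path):
    """Return the fraction of bytes saved by the second file relative to the first.

    Hypothetical helper for illustration; it only inspects file sizes on disk.
    """
    original = os.path.getsize(original_path)
    virtualized = os.path.getsize(virtual_path)
    return 1 - virtualized / original


# Example usage (both paths are assumptions, adjust to your files):
# saved = size_reduction('file_plain.parquet', 'file_virtual.parquet')
# print(f'{saved:.1%} smaller')
```

A positive result means the virtualized file is smaller; a negative one means it grew.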
🥢 Read
import virtual
df = virtual.from_parquet('file_virtual.parquet')
📊 Query
import virtual
virtual.query(
    'select avg(price) from read_parquet("file_virtual.parquet") where year >= 2024',
    engine='duckdb'
)
Additional Features
🔍 Discover the Functions Found
import pandas as pd
import virtual
df = pd.read_csv('file.csv')
functions = virtual.train(df)
% Functions saved under functions.json.
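For intuition about what a "function" between columns could look like, here is a toy sketch of correlation-aware compression. It assumes, purely for illustration, a linear relationship between two integer columns. It is not virtual's implementation, and every name in it (`fit_line`, `encode`, `decode`, the `price` and `total` columns) is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ≈ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def encode(xs, ys):
    """Replace the target column by a fitted line plus small integer residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - round(slope * x + intercept) for x, y in zip(xs, ys)]
    return slope, intercept, residuals


def decode(xs, slope, intercept, residuals):
    """Rebuild the target column exactly from the reference column and residuals."""
    return [round(slope * x + intercept) + r for x, r in zip(xs, residuals)]


# Hypothetical columns: total is roughly 2 * price + 5.
price = [10, 20, 30, 40, 50]
total = [25, 45, 66, 85, 105]

slope, intercept, residuals = encode(price, total)
restored = decode(price, slope, intercept, residuals)
```

The idea is that the residuals tend to be much smaller than the original values, so they encode more compactly, while decoding stays lossless because the same rounded prediction is added back.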
📚 Citation
Please do cite our (very) cool work if you use virtual in your work.
@inproceedings{virtual,
  title={{Lightweight Correlation-Aware Table Compression}},
  author={Mihail Stoian and Alexander van Renen and Jan Kobiolka and Ping-Lin Kuo and Josif Grabocka and Andreas Kipf},
  booktitle={NeurIPS 2024 Third Table Representation Learning Workshop},
  year={2024},
  url={https://openreview.net/forum?id=z7eIn3aShi}
}
File details
Details for the file virtual_parquet-0.1.1.tar.gz.
File metadata
- Download URL: virtual_parquet-0.1.1.tar.gz
- Upload date:
- Size: 36.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7d217b8617567cf1d46ea5705f4fb700b0f9babdeca26b52e21c2fa1695332b |
| MD5 | cb7621646126860084cb59c9e1dcbe89 |
| BLAKE2b-256 | 70d45b84fe732b24fc79c06bc79618abe9d86f91b967fd9e22a9646ac6240011 |
File details
Details for the file virtual_parquet-0.1.1-py3-none-any.whl.
File metadata
- Download URL: virtual_parquet-0.1.1-py3-none-any.whl
- Upload date:
- Size: 41.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 977b9187c465cfc8804e4137f1588cf20ef53c93fe5c3b8a2e38541e47c71602 |
| MD5 | 54c317a9ba51788035b0913bbacf93f1 |
| BLAKE2b-256 | 739d626d6557368288053d41f8810baeb7d59f9a71a196e4b23b740109b5be33 |