A booster 💪 for your Parquet files
virtual
A booster 💪 for your Parquet file sizes.
🛠 Build
pip install virtual-parquet
or, from a local checkout of the source:
pip install .
🔗 Examples
A demo can be found at examples/demo.ipynb.
🗜️ Compress
import pandas as pd
import virtual
df = pd.read_csv('file.csv')
...
virtual.to_parquet(df, 'file_virtual.parquet')
% Virtualization finished: Check out 'file_virtual.parquet'.
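To check how much space virtualization saves on your own data, one option is to compare the size of a plain Parquet file with its virtualized counterpart. The sketch below uses only the standard library. The helper name `size_reduction` and the file names are hypothetical, and it assumes both files already exist on disk (for example, written with `df.to_parquet('file.parquet')` and `virtual.to_parquet(df, 'file_virtual.parquet')`).

```python
import os


def size_reduction(baseline_path: str, virtual_path: str) -> float:
    """Return the fraction of bytes saved by the virtualized file.

    A result of 0.25 means the virtualized file is 25% smaller than the
    baseline; a negative result means it is larger.
    """
    baseline = os.path.getsize(baseline_path)
    virtualized = os.path.getsize(virtual_path)
    if baseline == 0:
        raise ValueError("baseline file is empty")
    return 1.0 - virtualized / baseline


# Hypothetical usage, assuming both files have been written beforehand:
# saved = size_reduction('file.parquet', 'file_virtual.parquet')
# print(f"{saved:.1%} smaller")
```

The actual saving depends on how strongly the columns of your table are related, so it is worth measuring per dataset.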
🥢 Read
import virtual
df = virtual.from_parquet('file_virtual.parquet')
📊 Query
import virtual
virtual.query(
    'select avg(price) from read_parquet("file_virtual.parquet") where year >= 2024',
    engine='duckdb'
)
Additional Features
🔍 Discover the Functions Found
import pandas as pd
import virtual
df = pd.read_csv('file.csv')
functions = virtual.train(df)
% Functions saved under functions.json.
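As the title of the paper in the Citation section suggests, the savings come from correlations between columns. As a rough intuition only, and not the library's actual algorithm or the format of `functions.json`, the sketch below fits a straight line that predicts one integer column from another and keeps just the residuals, which tend to be small when the two columns are correlated. All names here (`fit_line`, `encode`, `decode`, and the sample columns) are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # A constant reference column carries no information about ys.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def encode(xs, ys, slope, intercept):
    """Replace the target column by integer residuals around the fitted line."""
    return [y - round(slope * x + intercept) for x, y in zip(xs, ys)]


def decode(xs, residuals, slope, intercept):
    """Rebuild the target column from the reference column and the residuals."""
    return [round(slope * x + intercept) + r for x, r in zip(xs, residuals)]


# Illustrative data: `total` is roughly 1.2 * `price` + 5.
price = [100, 250, 400, 800]
total = [125, 306, 484, 966]

slope, intercept = fit_line(price, total)
residuals = encode(price, total, slope, intercept)

# The round trip is lossless because encode and decode round the same prediction.
assert decode(price, residuals, slope, intercept) == total
```

Small residuals need fewer bits than the original values, which is where the size reduction in this toy setting would come from.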
📚 Citation
Please do cite our (very) cool work if you use virtual in your work.
@inproceedings{virtual,
title={{Lightweight Correlation-Aware Table Compression}},
author={Mihail Stoian and Alexander van Renen and Jan Kobiolka and Ping-Lin Kuo and Josif Grabocka and Andreas Kipf},
booktitle={NeurIPS 2024 Third Table Representation Learning Workshop},
year={2024},
url={https://openreview.net/forum?id=z7eIn3aShi}
}
Download files
Source Distribution
virtual_parquet-0.1.2.tar.gz (37.3 kB)
Built Distribution
virtual_parquet-0.1.2-py3-none-any.whl (41.6 kB)
File details
Details for the file virtual_parquet-0.1.2.tar.gz.
File metadata
- Download URL: virtual_parquet-0.1.2.tar.gz
- Upload date:
- Size: 37.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62b88136966a626b8b50f5cd7692769bbf3230ab69d9ed7a915d467af4b2bcf8 |
| MD5 | 0cb9c8ff58ca1985286c383d162372a9 |
| BLAKE2b-256 | 00a55904293acf99ff00f682fa3cab0311b4de7f29b2fb6049372880d40ccf3b |
File details
Details for the file virtual_parquet-0.1.2-py3-none-any.whl.
File metadata
- Download URL: virtual_parquet-0.1.2-py3-none-any.whl
- Upload date:
- Size: 41.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 649d22adac8793bea75ca23bfec5b7f89a24b869fbc1449ef30b771a426ed2b8 |
| MD5 | 9c748de8413a6628b9e7bc3d14bbe8cd |
| BLAKE2b-256 | 9a27965631322e19f5f91838207ca68408724ed66afa6898088fac55d0c4984e |