Level up your Parquet file sizes!
virtual
A booster 💪 for your Parquet file sizes.
🛠 Build
pip3 install .
🔗 Examples
A demo can be found at examples/demo.ipynb.
🗜️ Compress
import pandas as pd
import virtual
df = pd.read_csv('file.csv')
...
virtual.to_parquet(df, 'file.parquet')
% Virtualization finished: Check out 'file.parquet'.
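To check how much the virtualized file actually saves, one option is to compare on-disk sizes against a baseline file. The helper below is a minimal sketch using only the standard library; `size_report` is a hypothetical name and is not part of virtual.

```python
import os


def size_report(baseline_path, candidate_path):
    """Compare two files on disk (hypothetical helper, not part of virtual)."""
    baseline = os.path.getsize(baseline_path)
    candidate = os.path.getsize(candidate_path)
    # Fraction of the baseline size saved by the candidate; negative if it grew.
    saved = 1 - candidate / baseline if baseline else 0.0
    return {
        "baseline_bytes": baseline,
        "candidate_bytes": candidate,
        "saved_fraction": saved,
    }
```

For instance, after writing a plain Parquet file with pandas (`df.to_parquet('plain.parquet')`, a placeholder file name), `size_report('plain.parquet', 'file.parquet')` would show how the two compare.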
🥢 Read
import virtual
df = virtual.from_parquet('file.parquet')
📊 Query
import virtual
virtual.query(
    'select avg(price) from read_parquet("file.parquet") where year >= 2024',
    engine='duckdb'
)
Additional Features
🔍 Discover the Functions Found
import pandas as pd
import virtual
df = pd.read_csv('file.csv')
functions = virtual.train(df)
% Functions saved under functions.json.
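As a rough illustration of the general idea behind correlation-aware compression, the sketch below fits a linear function from one column to another and keeps only the residuals, which tend to be much smaller than the raw values when the columns are correlated. This is an assumption about the concept, not virtual's actual algorithm or the format of functions.json; every name in it is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x


def encode(xs, ys):
    """Describe the integer column ys as a function of xs plus integer residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - round(slope * x + intercept) for x, y in zip(xs, ys)]
    return (slope, intercept), residuals


def decode(xs, model, residuals):
    """Rebuild the original column exactly from xs, the model, and the residuals."""
    slope, intercept = model
    return [round(slope * x + intercept) + r for x, r in zip(xs, residuals)]


# Toy data (made up for illustration): total is roughly 1.2 * price.
price = [100, 200, 300, 400]
total = [121, 239, 360, 481]
model, residuals = encode(price, total)
restored = decode(price, model, residuals)
```

Because `encode` and `decode` evaluate the same rounded prediction, the round trip is lossless for integer columns, and only the small residuals need to be stored alongside the fitted function.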
📚 Citation
Please do cite our (very) cool work if you use virtual in your work.
@inproceedings{virtual,
  title={Lightweight Correlation-Aware Table Compression},
  author={Mihail Stoian and Alexander van Renen and Jan Kobiolka and Ping-Lin Kuo and Josif Grabocka and Andreas Kipf},
  booktitle={NeurIPS 2024 Third Table Representation Learning Workshop},
  year={2024},
  url={https://openreview.net/forum?id=z7eIn3aShi}
}
Download files
Source Distribution
virtual_parquet-0.1.0.tar.gz (36.8 kB)
Built Distribution
virtual_parquet-0.1.0-py3-none-any.whl (41.1 kB)
File details
Details for the file virtual_parquet-0.1.0.tar.gz.
File metadata
- Download URL: virtual_parquet-0.1.0.tar.gz
- Upload date:
- Size: 36.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c041a7b8fa2ee78a03cd0a301234c2fe7d03ab03c7ba27e2801b757dafa06caa |
| MD5 | 5daccd8e8c60c0dc78ebe5069fa1de89 |
| BLAKE2b-256 | a0d86849a54139141197a626f0f84a4e36ef161316f544bce97288fdfab2ae39 |
File details
Details for the file virtual_parquet-0.1.0-py3-none-any.whl.
File metadata
- Download URL: virtual_parquet-0.1.0-py3-none-any.whl
- Upload date:
- Size: 41.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dbcc4a80835e3407b1ddcbffb7fdb8699424f3b5e8d720bc798c4f5475a0b760 |
| MD5 | 11878b947269e7d91c6a75fb0d7c4cb3 |
| BLAKE2b-256 | d74e4082d6aab0a0f0759355ae948a16c4a3d95cc80b320e5e447e04cdddd77d |
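A downloaded file can be checked against the SHA256 digests listed above before installing it. The sketch below uses only Python's standard `hashlib`; the helper names `sha256_of` and `matches` are hypothetical and not part of virtual.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches('virtual_parquet-0.1.0.tar.gz', 'c041a7b8fa2ee78a03cd0a301234c2fe7d03ab03c7ba27e2801b757dafa06caa')` should return `True` for an intact copy of the source distribution.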