
Level up your Parquet file sizes!


virtual

A booster 💪 for your Parquet file sizes.

🛠 Build

pip3 install .

🔗 Examples

A demo can be found at examples/demo.ipynb.

🗜️ Compress

import pandas as pd
import virtual

df = pd.read_csv('file.csv')

...

virtual.to_parquet(df, 'file.parquet')

% Virtualization finished: Check out 'file.parquet'.
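To see how much smaller the virtualized file is, one option is to compare it against a plain Parquet file written from the same DataFrame. The sketch below uses only the standard library; the helper `size_reduction` and the file name `baseline.parquet` are hypothetical and not part of virtual. It assumes the baseline was written separately, for example with pandas' `df.to_parquet('baseline.parquet')`.

```python
import os


def size_reduction(baseline_path: str, virtual_path: str) -> float:
    """Return the fraction by which virtual_path is smaller than baseline_path.

    A result of 0.25 means the second file is 25% smaller; a negative
    result means it is larger than the baseline.
    """
    baseline = os.path.getsize(baseline_path)
    compressed = os.path.getsize(virtual_path)
    if baseline == 0:
        raise ValueError("baseline file is empty")
    return 1.0 - compressed / baseline


if __name__ == "__main__":
    # Hypothetical file names: 'baseline.parquet' is assumed to come from
    # df.to_parquet(...), 'file.parquet' from virtual.to_parquet(...).
    if os.path.exists("baseline.parquet") and os.path.exists("file.parquet"):
        saved = size_reduction("baseline.parquet", "file.parquet")
        print(f"file.parquet is {saved:.1%} smaller than baseline.parquet")
```

The actual saving depends on the data, so no expected number is given here.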

🥢 Read

import virtual

df = virtual.from_parquet('file.parquet')

📊 Query

import virtual

virtual.query(
    'select avg(price) from read_parquet("file.parquet") where year >= 2024',
    engine='duckdb',
)

Additional Features

🔍 Discover the Functions Found

import pandas as pd
import virtual

df = pd.read_csv('file.csv')

functions = virtual.train(df)

% Functions saved under functions.json.
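As a rough intuition for what a "function" between columns can buy, the toy below fits one column as a linear function of another and keeps only the rows where the rounded prediction misses. This is a sketch under loud assumptions: it is not virtual's algorithm, it says nothing about the layout of functions.json, and the names `fit_linear`, `encode` and `decode` are made up for illustration.

```python
from typing import Dict, List, Tuple


def fit_linear(xs: List[int], ys: List[int]) -> Tuple[float, float]:
    """Least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Constant input column: fall back to predicting the mean.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def encode(xs: List[int], ys: List[int], slope: float, intercept: float) -> Dict[int, int]:
    """Keep only the rows whose rounded prediction differs from the real value."""
    return {
        i: y
        for i, (x, y) in enumerate(zip(xs, ys))
        if round(slope * x + intercept) != y
    }


def decode(xs: List[int], slope: float, intercept: float, exceptions: Dict[int, int]) -> List[int]:
    """Rebuild the target column from the source column, the function, and the exceptions."""
    return [exceptions.get(i, round(slope * x + intercept)) for i, x in enumerate(xs)]


if __name__ == "__main__":
    net = [100, 200, 300, 400]      # hypothetical source column
    gross = [205, 405, 605, 805]    # hypothetical target column, here 2 * net + 5
    slope, intercept = fit_linear(net, gross)
    exceptions = encode(net, gross, slope, intercept)
    # The round trip is lossless by construction: mispredicted rows are stored verbatim.
    assert decode(net, slope, intercept, exceptions) == gross
    print(f"stored {len(exceptions)} exception(s) instead of {len(gross)} values")
```

When the relationship holds for most rows, the target column shrinks to a pair of coefficients plus a few exceptions, which is the kind of saving a correlation-aware scheme aims for.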

📚 Citation

Please do cite our (very) cool work if you use virtual in your work.

@inproceedings{
  virtual,
  title={Lightweight Correlation-Aware Table Compression},
  author={Mihail Stoian and Alexander van Renen and Jan Kobiolka and Ping-Lin Kuo and Josif Grabocka and Andreas Kipf},
  booktitle={NeurIPS 2024 Third Table Representation Learning Workshop},
  year={2024},
  url={https://openreview.net/forum?id=z7eIn3aShi}
}



