
A booster 💪 for your Parquet files


virtual

A booster 💪 for your Parquet file sizes.

🛠 Build

pip3 install virtual-parquet

or, from the root of a local clone of the repository:

pip3 install .

🔗 Examples

A demo can be found at examples/demo.ipynb.

🗜️ Compress

import pandas as pd
import virtual

df = pd.read_csv('file.csv')

...

virtual.to_parquet(df, 'file_virtual.parquet')

% Virtualization finished: Check out 'file_virtual.parquet'.
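To see how much the virtualized file actually saves, one option is to compare its size on disk against a plain Parquet file written from the same data. The helper below is a minimal sketch using only the standard library; `size_reduction` is a hypothetical name and is not part of virtual.

```python
import os


def size_reduction(original_path: str, virtualized_path: str) -> float:
    """Return the fraction of bytes saved, e.g. 0.25 means 25% smaller.

    Hypothetical helper for illustration; not part of the virtual package.
    """
    original_size = os.path.getsize(original_path)
    virtualized_size = os.path.getsize(virtualized_path)
    if original_size == 0:
        raise ValueError("original file is empty")
    return 1.0 - virtualized_size / original_size
```

For example, `size_reduction('file.parquet', 'file_virtual.parquet')`, assuming 'file.parquet' is a baseline written beforehand with pandas' `df.to_parquet('file.parquet')`.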

🥢 Read

import virtual

df = virtual.from_parquet('file_virtual.parquet')

📊 Query

import virtual

virtual.query(
    'select avg(price) from read_parquet("file_virtual.parquet") where year >= 2024',
    engine='duckdb'
)

Additional Features

🔍 Inspect the Discovered Functions

import pandas as pd
import virtual

df = pd.read_csv('file.csv')

functions = virtual.train(df)

% Functions saved under functions.json.
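This README does not spell out what a "function" is. Going by the title of the paper cited below (correlation-aware table compression), one plausible reading is that a column is expressed as a function of another column, so that only small corrections need to be stored. The sketch below illustrates that general idea in plain Python. It is an assumption-laden toy and not virtual's actual algorithm, API, or file format; the column names and the helpers `fit_line` and `predict` are hypothetical.

```python
# Illustration only: NOT virtual's actual algorithm, API, or file format.


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Integer prediction of the dependent column from the source column."""
    return round(slope * x + intercept)


# Hypothetical integer columns: `gross` is roughly 2 * `net` + 5.
net = [100, 250, 400, 120, 980]
gross = [205, 506, 804, 245, 1967]

slope, intercept = fit_line(net, gross)

# Store small residuals in place of the full `gross` column.
residuals = [y - predict(slope, intercept, x) for x, y in zip(net, gross)]

# Reading back: prediction plus residual restores every value exactly.
restored = [predict(slope, intercept, x) + r for x, r in zip(net, residuals)]

print("residuals:", residuals)
print("lossless:", restored == gross)
```

Because the residuals are computed against the same integer prediction that is used when reading back, the round trip is exact by construction, and small residuals generally encode more compactly than the original values.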

📚 Citation

Please do cite our (very) cool work if you use virtual in your work.

@inproceedings{
  virtual,
  title={{Lightweight Correlation-Aware Table Compression}},
  author={Mihail Stoian and Alexander van Renen and Jan Kobiolka and Ping-Lin Kuo and Josif Grabocka and Andreas Kipf},
  booktitle={NeurIPS 2024 Third Table Representation Learning Workshop},
  year={2024},
  url={https://openreview.net/forum?id=z7eIn3aShi}
}
