df_cereal - playing with dataframe serialization

DF_Cereal - Serialization testing ground

This is a stripped-down repo for testing different methods of dataframe serialization. It aims to be a reference implementation for serializing dataframes with pyarrow.

Dataframe serialization is hard, and it is a source of performance regressions. Arrow seems to be the way forward for dataframe libraries and for dataframe serialization. This project is meant to be a collaborative reference for library authors who want to do high-performance serialization.

Planned features include

  • A repo that demonstrates different ways to serialize dataframes, with MVP implementations that are easy to adapt
  • Benchmarks for different serialization techniques
  • Tests for all of this
  • Examples of more complex dataframe constructs (multi-indexes, timestamps, structures) and how they appear in JS
  • Simple documentation that is easy to follow

Notes

This repo is built on top of a stripped-down buckaroo repo. Some buckaroo artifacts might pop up here and there.

Development installation

For a development installation:

git clone https://github.com/paddymul/df_cereal.git
cd df_cereal
# Build against JupyterLab 3.6.5; JupyterLab 4.0 has different JS typings that conflict.
# The installable package still works in JupyterLab 4.
pip install build twine pytest sphinx polars mypy jupyterlab==3.6.5 pandas-stubs
pip install -ve .

Enabling development install for Jupyter notebook:

Enabling development install for JupyterLab:

jupyter labextension develop . --overwrite

Note for developers: the --symlink argument on Linux or OS X allows one to modify the JavaScript code in-place. This feature is not available on Windows.

Developing the JS side

There are a series of examples of the components in examples/ex.

Instructions

npm install
npm run dev

Contributions

We ❤️ contributions.

Have you had a good experience with this project? Why not share some love and contribute code, or just let us know about any issues you had with it?

We welcome issue reports here; be sure to choose the proper issue template for your issue, so that we can be sure you're providing the necessary information.
