df_cereal - playing with dataframe serialization
Project description
DF_Cereal - Serialization testing ground
This is a stripped-down repo for testing different methods of dataframe serialization. It aims to be a reference implementation for serializing dataframes with pyarrow.
Dataframe serialization is hard, and it is a source of performance regressions. Arrow seems to be the way forward for dataframe libraries and for dataframe serialization. This project is meant to be a collaborative reference for library authors who want to do high-performance serialization.
Planned features include
- A repo that demonstrates different ways to serialize dataframes, with MVP implementations that are easy to adapt
- Benchmarks for different serialization techniques
- Tests for all of this
- Examples of more complex dataframe constructs and how they appear in JS: multi-indexes, timestamps, structures
- Simple documentation that is easy to follow
Notes
This repo is built on top of a stripped-down buckaroo repo. Some buckaroo artifacts might pop up here and there.
Development installation
For a development installation:
git clone https://github.com/paddymul/df_cereal.git
cd df_cereal
# We need to build against jupyterlab 3.6.5; JupyterLab 4.0 has different JS typing that conflicts.
# The installable still works in JL4.
pip install build twine pytest sphinx polars mypy jupyterlab==3.6.5 pandas-stubs
pip install -ve .
Enabling development install for Jupyter notebook:
Enabling development install for JupyterLab:
jupyter labextension develop . --overwrite
Note for developers: the --symlink argument on Linux or OS X allows one to modify the JavaScript code in-place. This feature is not available with Windows.
Developing the JS side
There are a series of examples of the components in examples/ex.
Instructions
npm install
npm run dev
Contributions
We :heart: contributions.
Have you had a good experience with this project? Why not share some love and contribute code, or just let us know about any issues you had with it?
We welcome issue reports here; be sure to choose the proper issue template for your issue, so that we can be sure you're providing the necessary information.
File details
Details for the file df_cereal-0.0.1.tar.gz.
File metadata
- Download URL: df_cereal-0.0.1.tar.gz
- Upload date:
- Size: 3.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 88c159c534647083498d755f225c29866053f1cc4b3ef3b93d49a9223ae2de0e
MD5 | bce68957aa6c7026f2bfca6266e3bda7
BLAKE2b-256 | fa60084f35f63bbf383101770147645d09a697d68d8b5545ec19a9b781855caf
File details
Details for the file df_cereal-0.0.1-py3-none-any.whl.
File metadata
- Download URL: df_cereal-0.0.1-py3-none-any.whl
- Upload date:
- Size: 587.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | dc5fcce0cc86f43dfa5d85a332d6fe42b08b6baac25ed6593378b939a3580051
MD5 | e7501689f963cfe1931f8215ee9e24c1
BLAKE2b-256 | c30adb9617192a2e0f06abd437bc9bbc584f77af37bdaeb6f59139d82eac9159