h5dataframe
Drop-in replacement for pandas DataFrames that stores data in an hdf5 file and manipulates it directly from that file without loading it into memory.
Warning!
This is very much a work in progress; some features might not work yet or might cause bugs. Save your data elsewhere before converting it to an H5DataFrame.
If you miss a feature from pandas DataFrames, please file an issue or feel free to contribute.
Overview
This library provides the H5DataFrame object, replacing the regular pandas.DataFrame.
An H5DataFrame can be created from a pandas.DataFrame or from a dictionary of (column_name -> column_values).
>>> import pandas as pd
>>> from h5dataframe import H5DataFrame
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
...                   index=['r1', 'r2', 'r3'])
>>> hdf = H5DataFrame(df)
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[RAM]
[3 rows x 2 columns]
At this point, all the data is still loaded in RAM, as indicated by the second-to-last line. To write the data to an hdf5 file, use the H5DataFrame.write() method.
>>> hdf.write('path/to/file.h5')
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]
The H5DataFrame is now backed by an hdf5 file and only loads data into RAM when requested.
Alternatively, an H5DataFrame can be read directly from a previously created hdf5 file with the H5DataFrame.read() method.
>>> from h5dataframe import H5Mode
>>> H5DataFrame.read('path/to/file.h5', mode=H5Mode.READ)
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]
The default mode is READ ('r'), which creates a read-only H5DataFrame. To modify the data, use mode=H5Mode.READ_WRITE ('r+').
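The 'r' and 'r+' strings are the same mode names that h5py uses when opening an hdf5 file. As a rough illustration of what read-only and read-write access mean at the file level, the sketch below uses h5py and numpy directly, not h5dataframe. It assumes, without confirmation from this README, that H5DataFrame follows the same semantics. The scratch file name and the dataset name "a" are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file for this demonstration only.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.h5")

# Create a small file holding one column of values.
with h5py.File(path, "w") as f:
    f.create_dataset("a", data=np.array([1, 2, 3]))

# 'r' opens the file read-only: slices are fetched from disk on request,
# and attempts to write are rejected.
with h5py.File(path, "r") as f:
    first_two = f["a"][:2]  # only this slice is loaded into RAM
    try:
        f["a"][0] = 10
        write_rejected = False
    except Exception:  # h5py typically raises OSError here
        write_rejected = True

# 'r+' opens the existing file for both reading and writing.
with h5py.File(path, "r+") as f:
    f["a"][0] = 10  # the change is written to the file
    updated = f["a"][:]

print(first_two.tolist(), write_rejected, updated.tolist())
```

Both modes require the file to exist already, which is why the sketch first creates it with 'w'.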
Installation
From pip:
pip install h5dataframe
From source:
git clone git@github.com:Vidium/h5dataframe.git