h5dataframe

Drop-in replacement for pandas DataFrames that lets you store data in an HDF5 file and manipulate it directly from that file without loading it into memory.

Warning!

This is very much a work in progress: some features might not work yet or might cause bugs. Save your data elsewhere before converting it to an H5DataFrame.

If you miss a feature from pandas DataFrames, please file an issue or feel free to contribute.

Overview

This library provides the H5DataFrame object, replacing the regular pandas.DataFrame.

An H5DataFrame can be created from a pandas.DataFrame or from a dictionary of (column_name -> column_values).

>>> import pandas as pd
>>> from h5dataframe import H5DataFrame
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
...                   index=['r1', 'r2', 'r3'])
>>> hdf = H5DataFrame(df)
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[RAM]
[3 rows x 2 columns]

At this point, all the data is still loaded in RAM, as indicated by the [RAM] marker on the second-to-last line. To write the data to an HDF5 file, use the H5DataFrame.write() method.

>>> hdf.write('path/to/file.h5')
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]

The H5DataFrame is now backed by an HDF5 file, as the [FILE] marker shows, and only loads data into RAM when requested.
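To make the idea of file-backed, on-demand loading concrete, here is a minimal conceptual sketch using only the Python standard library. It is not how h5dataframe is implemented and does not use HDF5. The FileBackedColumn class and the a.bin file name are hypothetical, chosen only to show that a single value can be fetched from disk without reading the whole column into memory.

```python
import os
import struct
import tempfile


class FileBackedColumn:
    """Hypothetical illustration: a column of 64-bit integers stored in a flat binary file."""

    ITEM = struct.Struct("<q")  # one little-endian signed 64-bit integer per row

    def __init__(self, path):
        self.path = path

    @classmethod
    def write(cls, path, values):
        # Dump every value to disk, then return a handle that holds only the path.
        with open(path, "wb") as f:
            for value in values:
                f.write(cls.ITEM.pack(value))
        return cls(path)

    def __len__(self):
        # The row count is derived from the file size; no data is read.
        return os.path.getsize(self.path) // self.ITEM.size

    def __getitem__(self, row):
        if not 0 <= row < len(self):
            raise IndexError(row)
        # Seek straight to the requested row and read just that one item.
        with open(self.path, "rb") as f:
            f.seek(row * self.ITEM.size)
            return self.ITEM.unpack(f.read(self.ITEM.size))[0]


tmp_dir = tempfile.mkdtemp()
col = FileBackedColumn.write(os.path.join(tmp_dir, "a.bin"), [1, 2, 3])
print(len(col), col[1])  # prints "3 2"
```

The handle keeps only a path in memory, so the cost of holding it does not grow with the size of the data on disk.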

Alternatively, an H5DataFrame can be read directly from a previously created HDF5 file with the H5DataFrame.read() method.

>>> from h5dataframe import H5Mode
>>> H5DataFrame.read('path/to/file.h5', mode=H5Mode.READ)
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]

The default mode is READ ('r') which creates a read-only H5DataFrame. To modify the data, use mode=H5Mode.READ_WRITE ('r+').
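The 'r' and 'r+' strings appear to follow the usual file-mode convention. As an analogy only, the sketch below uses Python's built-in open() on a throwaway file (demo.bin is a hypothetical name) to show the difference between the two modes. It does not call h5dataframe, and how an H5DataFrame reacts to a write attempt in READ mode is not documented here.

```python
import io
import os
import tempfile

# Create a small file to open in both modes.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"abc")

# 'r': the file must already exist and can only be read.
with open(path, "rb") as f:
    try:
        f.write(b"x")
        read_only_rejected = False
    except io.UnsupportedOperation:
        read_only_rejected = True

# 'r+': the file must already exist, and can be read and modified in place.
with open(path, "r+b") as f:
    f.write(b"X")

with open(path, "rb") as f:
    content = f.read()

print(read_only_rejected, content)  # prints "True b'Xbc'"
```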

Installation

From pip:

pip install h5dataframe

From source:

git clone git@github.com:Vidium/h5dataframe.git
