
Drop-in replacement for pandas DataFrames that lets you store data in an hdf5 file and manipulate it directly from that file without loading it into memory.


h5dataframe


Warning!

This is very much a work in progress: some features may not work yet or may cause bugs. Save your data elsewhere before converting it to an H5DataFrame.

If a pandas DataFrame feature you need is missing, please file an issue or feel free to contribute.

Overview

This library provides the H5DataFrame object, a replacement for the regular pandas.DataFrame.

An H5DataFrame can be created from a pandas.DataFrame or from a dictionary of (column_name -> column_values).

>>> import pandas as pd
>>> from h5dataframe import H5DataFrame
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
...                   index=['r1', 'r2', 'r3'])
>>> hdf = H5DataFrame(df)
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[RAM]
[3 rows x 2 columns]
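In the dictionary form, each column name maps to a list of values, and every column needs one value per index label. The plain-Python sketch below illustrates that consistency check and reproduces the last two lines of the repr shown above. The `shape_footer` helper is hypothetical and is not part of h5dataframe.

```python
def shape_footer(columns: dict[str, list], index: list[str], backing: str = "RAM") -> str:
    """Return a repr-style footer such as '[RAM]\\n[3 rows x 2 columns]'.

    Hypothetical helper: it only mimics the last two lines of the
    H5DataFrame repr and checks that each column has one value per index label.
    """
    for name, values in columns.items():
        if len(values) != len(index):
            raise ValueError(
                f"column {name!r} has {len(values)} values, expected {len(index)}"
            )
    return f"[{backing}]\n[{len(index)} rows x {len(columns)} columns]"


data = {"a": [1, 2, 3], "b": [4, 5, 6]}  # column_name -> column_values
print(shape_footer(data, ["r1", "r2", "r3"]))
# [RAM]
# [3 rows x 2 columns]
```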

At this point, all the data is still loaded in RAM, as indicated by the second-to-last line. To write the data to an hdf5 file, use the H5DataFrame.write() method.

>>> hdf.write('path/to/file.h5')
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]

The H5DataFrame is now backed by an hdf5 file and only loads data into RAM when it is requested.
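The following minimal sketch shows what "backed by a file" means in practice. To stay runnable without extra dependencies, it substitutes Python's standard `shelve` module for hdf5, which is an assumption made only for illustration. The `FileBackedColumns` class is hypothetical and is not h5dataframe's implementation. Each column is stored under its name, and only the column you access is read back into memory.

```python
import os
import shelve
import tempfile


class FileBackedColumns:
    """Hypothetical toy (not h5dataframe's code): columns live in a shelve file.

    A column is only unpickled into RAM when it is accessed by name.
    """

    def __init__(self, path: str) -> None:
        self._path = path

    def write(self, columns: dict[str, list]) -> None:
        # flag="n" always creates a new, empty database at `path`.
        with shelve.open(self._path, flag="n") as db:
            for name, values in columns.items():
                db[name] = values

    def column(self, name: str) -> list:
        # flag="r" opens the existing database read-only; only `name` is loaded.
        with shelve.open(self._path, flag="r") as db:
            return db[name]

    def column_names(self) -> list[str]:
        with shelve.open(self._path, flag="r") as db:
            return sorted(db.keys())


with tempfile.TemporaryDirectory() as tmp:
    store = FileBackedColumns(os.path.join(tmp, "frame"))
    store.write({"a": [1, 2, 3], "b": [4, 5, 6]})
    print(store.column_names())  # ['a', 'b']
    print(store.column("b"))     # [4, 5, 6]
```

An hdf5-backed object can go further than this toy: it can read slices of a column rather than whole columns, which is what makes working on large files without loading them practical.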

Alternatively, an H5DataFrame can be read directly from a previously created hdf5 file with the H5DataFrame.read() method.

>>> from h5dataframe import H5Mode
>>> H5DataFrame.read('path/to/file.h5', mode=H5Mode.READ)
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]

The default mode is READ ('r'), which creates a read-only H5DataFrame. To modify the data, use mode=H5Mode.READ_WRITE ('r+').
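The sketch below illustrates the read-only versus read-write distinction. The `Mode` enum only mirrors the two values named above ('r' and 'r+'). Both it and the `ToyStore` class are hypothetical and are not h5dataframe's H5Mode or H5DataFrame. Writes are refused unless the store was opened in read-write mode.

```python
from enum import Enum


class Mode(str, Enum):
    """Hypothetical mirror of the two modes described above (not h5dataframe's H5Mode)."""

    READ = "r"         # read-only (the default)
    READ_WRITE = "r+"  # read and modify existing data


class ToyStore:
    """Hypothetical in-memory column store that enforces its mode on writes."""

    def __init__(self, columns: dict[str, list], mode: Mode = Mode.READ) -> None:
        self._columns = columns
        self.mode = Mode(mode)  # also accepts the raw strings 'r' / 'r+'

    def __getitem__(self, name: str) -> list:
        return self._columns[name]

    def __setitem__(self, name: str, values: list) -> None:
        if self.mode is not Mode.READ_WRITE:
            raise PermissionError("opened read-only; use Mode.READ_WRITE ('r+') to modify")
        self._columns[name] = values


ro = ToyStore({"a": [1, 2, 3]})
try:
    ro["a"] = [7, 8, 9]
except PermissionError as exc:
    print(exc)  # opened read-only; use Mode.READ_WRITE ('r+') to modify

rw = ToyStore({"a": [1, 2, 3]}, mode="r+")
rw["a"] = [7, 8, 9]
print(rw["a"])  # [7, 8, 9]
```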

Installation

From pip:

pip install h5dataframe

From source:

git clone git@github.com:Vidium/h5dataframe.git
cd h5dataframe
pip install .

