
Python binding of BigFile, a peta-scale IO format

Project description

Scalable large-data IO for peta-scale applications on Blue Waters.

As the name suggests, these BigFile files are really big.

Developed for the BlueTides simulation on Blue Waters at NCSA.

It works: a BlueTides snapshot file can be 45 TB in size; we spend about 10 minutes dumping a snapshot and about 5 minutes reading one back.

Physically, a BigFile is spread across many files, arranged as a directory tree on the Lustre file system.

Logically, a BigFile consists of many blocks. A block is a two-dimensional table of a given data type, with ‘nmemb’ columns and ‘size’ rows. Attributes can be attached to a block. Read and write operations automatically cast the buffer to the requested data type (a Python sketch of this model follows the list of APIs below).

  • There is a Python API;

  • There is a C API;

  • There is a C MPI API.
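
For example, with the Python API one can create a block, attach attributes, and write rows. The following is a minimal sketch; the exact names (BigFile, create, attrs, write) and the context-manager usage are assumptions based on typical usage of the binding, not a verified reference:

    # A minimal sketch of writing with the Python API; class and method
    # names (BigFile, create, attrs, write) are assumptions.
    import numpy

    from bigfile import BigFile

    # 128 rows of 3-column float64: 'size' = 128, 'nmemb' = 3
    data = numpy.zeros(128, dtype=('f8', 3))

    with BigFile('snapshot', create=True) as f:
        # split the block over 4 physical files on disk
        with f.create('Position', dtype=('f8', 3), size=128, Nfile=4) as block:
            block.attrs['BoxSize'] = 100.0   # attach an attribute to the block
            block.write(0, data)             # write, starting at row 0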

There are also command-line tools to inspect these files (a Python equivalent is sketched after this list):

  • bigfile-cat

  • bigfile-repartition

  • bigfile-ls

  • bigfile-get-attr
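
The same kind of inspection can be done from Python. Another sketch, under the same naming assumptions (blocks, attrs, read):

    # A sketch of inspecting a file from Python, mirroring bigfile-ls
    # and bigfile-get-attr; blocks, attrs, and read are assumed names.
    from bigfile import BigFile

    with BigFile('snapshot') as f:
        print(f.blocks)                    # list block names, like bigfile-ls
        with f['Position'] as block:
            print(block.attrs['BoxSize'])  # like bigfile-get-attr
            rows = block.read(0, 10)       # first 10 rows, cast to the block dtype
            print(rows.shape)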

We originally planned to use HDF5, but it did not integrate well with our simulation software: HDF5 does not provide a unified interface for accessing data spread across many files.

Yu Feng

