Fanstore gathers local storage space in computer clusters to enable distributed neural network training with larger datasets
Project description
Overview
Fanstore is a shared object store that supports parallel neural network training. It provides a POSIX-compatible file system interface through fusepy and low-latency communication through mpi4py. Fanstore can use main memory, RAM disk, and local storage for transient parallel I/O at run time.
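Because the interface is POSIX-compatible, a training script should be able to read samples with ordinary file calls once Fanstore is mounted. The following is a minimal sketch, assuming the mount point is /tmp/amfora (the path that the cleanup step unmounts with fusermount -u). The helper names list_files and read_sample are hypothetical and are not part of Fanstore.

```python
import os


def list_files(root):
    """Return sorted paths of all regular files below root, relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


def read_sample(root, relative_path):
    """Read one file as bytes using plain open(); no Fanstore-specific API is used."""
    with open(os.path.join(root, relative_path), "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    # Assumption: this is the Fanstore mount point used in the commands in this README.
    mount_point = "/tmp/amfora"
    if os.path.isdir(mount_point):
        files = list_files(mount_point)
        print(f"{len(files)} files visible under {mount_point}")
```

Nothing in the sketch depends on Fanstore itself, so the same functions can be tried against any local directory before the store is mounted.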
To start
sbatch bin/fanstore.slurm
To manually start fanstore
The complete ImageNet dataset
module load python3
mpiexec.hydra -f ../test/hostfile -ppn 1 python3 fanstore.py /tmp/amfora /tmp/data --loadscatter /work/00946/zzhang/imagenet/16-parts --loadbcast /work/00946/zzhang/imagenet/16-parts-validation &
A quarter of the ImageNet dataset
mpiexec.hydra -f ../test/hostfile -ppn 1 python3 fanstore.py /tmp/amfora /tmp/data --loadscatter /work/00946/zzhang/imagenet/16-parts-test --loadbcast /work/00946/zzhang/imagenet/16-parts-validation &
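The two loading flags are not described in this README. Going by their names, --loadscatter presumably spreads the dataset partitions across the participating nodes, while --loadbcast presumably copies every partition to every node. The sketch below illustrates that distinction in plain Python. The round-robin assignment, the function names, and the partition names are assumptions for illustration, not Fanstore's actual placement logic.

```python
def scatter(partitions, num_nodes):
    """Assign each partition to exactly one node, round-robin (assumed policy)."""
    placement = {node: [] for node in range(num_nodes)}
    for index, part in enumerate(partitions):
        placement[index % num_nodes].append(part)
    return placement


def broadcast(partitions, num_nodes):
    """Give every node its own copy of the full partition list."""
    return {node: list(partitions) for node in range(num_nodes)}


if __name__ == "__main__":
    # Hypothetical partition names; the real directory contents are not shown in this README.
    parts = [f"part-{i:02d}" for i in range(16)]
    for node, owned in scatter(parts, 4).items():
        print(node, owned)
```

Under this reading, scattered data would occupy roughly 1/N of its total size on each of N nodes, while broadcast data would occupy its full size on every node, which would fit using scatter for the large training set and broadcast for the smaller validation set.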
To run a horovod application
module load cuda/9.0 cudnn/7.0
mpiexec.hydra -f /work/00946/zzhang/maverick2/fanstore/test/hostfile -ppn 4 python3 keras_imagenet_resnet50_fanstore.py
Before terminating the job
for h in $(cat ../test/hostfile); do ssh "$h" "rm -rf /tmp/data; mkdir /tmp/data; mkdir -p /tmp/amfora; rm /tmp/fuse-fanstore.log; fusermount -u /tmp/amfora"; done
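The same cleanup can be driven from Python if a shell loop is inconvenient. The following is a minimal sketch, assuming the hostfile lists one hostname per line and passwordless ssh is available. The names read_hosts, cleanup_argv, and cleanup_all are hypothetical helpers, and the remote command string is the one used in the shell loop above.

```python
import subprocess

# Remote command copied from the shell loop in this README.
REMOTE_CLEANUP = (
    "rm -rf /tmp/data; mkdir /tmp/data; mkdir -p /tmp/amfora; "
    "rm /tmp/fuse-fanstore.log; fusermount -u /tmp/amfora"
)


def read_hosts(hostfile_path):
    """Return the non-empty, stripped lines of the hostfile (assumed one host per line)."""
    with open(hostfile_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def cleanup_argv(host):
    """Build the argument list that runs the cleanup on one host over ssh."""
    return ["ssh", host, REMOTE_CLEANUP]


def cleanup_all(hostfile_path):
    """Run the cleanup on every host in turn; a failure on one host does not stop the loop."""
    for host in read_hosts(hostfile_path):
        subprocess.run(cleanup_argv(host), check=False)
```

Calling cleanup_all("../test/hostfile") would then mirror the shell loop, visiting the hosts sequentially.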