Fanstore gathers local storage space across the nodes of a compute cluster to enable distributed neural network training with larger datasets

Project description

Overview

Fanstore is a shared object store that supports parallel neural network training. It provides a POSIX-compatible file system interface through fusepy and low-latency communication through mpi4py. Fanstore can use main memory, RAM disk, and local storage for transient parallel I/O at run time.
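
Because the interface is POSIX-compatible, a training script should be able to read data with ordinary file calls. The following is a minimal sketch under that assumption, not part of Fanstore itself. The helper names `list_samples` and `read_sample` are hypothetical, and `/tmp/amfora` is taken from the commands below, where it is the directory that `fusermount -u` unmounts.

```python
import os


def list_samples(root):
    """Return the sorted relative paths of all regular files under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            paths.append(os.path.relpath(full, root))
    return sorted(paths)


def read_sample(root, rel_path):
    """Read one file as raw bytes using plain POSIX-style file I/O."""
    with open(os.path.join(root, rel_path), "rb") as f:
        return f.read()


if __name__ == "__main__":
    mount = "/tmp/amfora"  # assumed mount point, taken from the commands below
    if os.path.isdir(mount):
        # Print the first few files and their sizes as a quick sanity check.
        for rel in list_samples(mount)[:5]:
            print(rel, len(read_sample(mount, rel)))
```

Nothing in the sketch is specific to Fanstore, so the same code can be tried against any local directory before the store is mounted.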

To start

sbatch bin/fanstore.slurm

To manually start fanstore

The complete ImageNet dataset

module load python3
mpiexec.hydra -f ../test/hostfile -ppn 1 python3 fanstore.py /tmp/amfora /tmp/data --loadscatter /work/00946/zzhang/imagenet/16-parts --loadbcast /work/00946/zzhang/imagenet/16-parts-validation &

A quarter of the ImageNet dataset

mpiexec.hydra -f ../test/hostfile -ppn 1 python3 fanstore.py /tmp/amfora /tmp/data --loadscatter /work/00946/zzhang/imagenet/16-parts-test --loadbcast /work/00946/zzhang/imagenet/16-parts-validation &
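
The flag names `--loadscatter` and `--loadbcast` suggest two placement strategies: scattering spreads the partitions across nodes so that each partition lives on one node, while broadcasting copies every partition to every node. That reading is inferred from the names and is an assumption, not documented behavior. The sketch below illustrates the two ideas in plain Python without MPI. The functions `scatter` and `broadcast`, the round-robin policy, and the partition names are all hypothetical.

```python
def scatter(partitions, num_nodes):
    """Assign each partition to exactly one node, round-robin (assumed policy)."""
    placement = {rank: [] for rank in range(num_nodes)}
    for i, part in enumerate(partitions):
        placement[i % num_nodes].append(part)
    return placement


def broadcast(partitions, num_nodes):
    """Give every node its own copy of the full partition list."""
    return {rank: list(partitions) for rank in range(num_nodes)}


if __name__ == "__main__":
    # Hypothetical partition names; real file names are not given in the source.
    train_parts = ["part-%02d" % i for i in range(16)]
    val_parts = ["val-%02d" % i for i in range(2)]
    print(scatter(train_parts, 4))
    print(broadcast(val_parts, 4))
```

Under this reading, scattering the training set lets the combined local storage hold a dataset larger than any single node could, while broadcasting the smaller validation set keeps it available locally on every node.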

To run a horovod application

module load cuda/9.0 cudnn/7.0
mpiexec.hydra -f /work/00946/zzhang/maverick2/fanstore/test/hostfile -ppn 4 python3 keras_imagenet_resnet50_fanstore.py

Before terminating the job

for h in `cat ../test/hostfile`; do   ssh $h "rm -rf /tmp/data; mkdir /tmp/data; mkdir -p /tmp/amfora; rm /tmp/fuse-fanstore.log; fusermount -u /tmp/amfora"; done
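
The loop above runs the same cleanup on every host in the hostfile. As a sketch, the per-host commands can be previewed in Python before anything is executed. The helper names `read_hosts` and `cleanup_commands` and the example host names are hypothetical. The remote command string is copied from the loop above, and the hostfile is split on whitespace, as the shell does with the output of `cat`.

```python
import shlex

# Remote cleanup command, copied verbatim from the shell loop above.
CLEANUP = (
    "rm -rf /tmp/data; mkdir /tmp/data; mkdir -p /tmp/amfora; "
    "rm /tmp/fuse-fanstore.log; fusermount -u /tmp/amfora"
)


def read_hosts(text):
    """Split hostfile contents on whitespace, like `cat hostfile` in a for loop."""
    return text.split()


def cleanup_commands(hosts, remote_cmd=CLEANUP):
    """Build one ssh argument list per host without running anything."""
    return [["ssh", host, remote_cmd] for host in hosts]


if __name__ == "__main__":
    # Hypothetical host names standing in for the contents of ../test/hostfile.
    hosts = read_hosts("node01\nnode02\n")
    for argv in cleanup_commands(hosts):
        # Dry run: print the quoted command; pass argv to subprocess.run to execute.
        print(shlex.join(argv))
```

Printing the commands first makes it easier to confirm the host list before removing data under `/tmp` on each node.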

Download files

Source Distribution

fanstore-0.0.1a0.tar.gz (9.4 kB)

Uploaded Source

Built Distribution

fanstore-0.0.1a0-py3-none-any.whl (10.9 kB)

Uploaded Python 3
