matorage is matrix or tensor (multidimensional matrix) object storage built on highly available distributed systems, for deep learning frameworks.

matorage

An efficient way to store, load, and manage datasets, models, and optimizers for deep learning with matorage!

Matorage is a tensor (multidimensional matrix) object storage manager for deep learning frameworks (PyTorch, TensorFlow v2, Keras).

Features

Boilerplate data pipeline for datasets, models, and optimizers.
High performance on tensor storage.

For researchers who need to focus on model training:

Supports storing data as pre-processed tensors (multidimensional matrices), eliminating pre-processing time during training.
Reduce storage space through multiple compression methods.
Manage data and models while training
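To give a feel for why compression matters for tensor storage, here is a minimal sketch that compresses one mostly-zero 28x28 float32 buffer. It uses zlib from the Python standard library purely as a stand-in; which compressors matorage supports is not stated here, and the synthetic pixel pattern is an assumption made for illustration.

```python
import struct
import zlib

# One 28x28 float32 "image" that is mostly zeros, similar to a digit on a blank background.
pixels = [0.0] * (28 * 28)
for i in range(300, 400):
    pixels[i] = 1.0  # a small band of non-zero pixels (synthetic data)

raw = struct.pack(f"{len(pixels)}f", *pixels)  # 784 values * 4 bytes each
compressed = zlib.compress(raw, 6)             # 6 is zlib's usual default level

print(len(raw))                    # 3136
print(len(compressed) < len(raw))  # True
# Compression is lossless: the original bytes are recovered exactly.
restored = zlib.decompress(compressed)
```

How much space is saved depends heavily on the data; sparse or repetitive tensors compress far better than dense noise.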

For AI developers who need to focus on creating data pipelines:

Concurrent data save and load.
Compatible with object storage such as MinIO and S3.
Generates pipelines from user endpoint data.

Quick Start with PyTorch Example

For a TensorFlow example, refer to the detailed documentation. The full PyTorch code is walked through step by step below.

0. Install matorage with pip

$ pip install matorage

1. Set up a MinIO server with Docker

This is a quick start with NAS (network-attached storage) using Docker. The server can be managed through the web at http://127.0.0.1:9000/, and security is managed through MINIO_ACCESS_KEY and MINIO_SECRET_KEY.

$ mkdir ~/shared # create nas storage folder
$ docker run -it -p 9000:9000 \
    --restart always -e \
    "MINIO_ACCESS_KEY=minio" -e \
    "MINIO_SECRET_KEY=miniosecretkey" \
    -v ~/shared:/container/vol \
    minio/minio gateway nas /container/vol
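Before moving on, it can help to confirm that something is listening on the published port. The following is a minimal sketch using only the Python standard library; the helper names split_endpoint and is_endpoint_reachable are hypothetical and not part of matorage or MinIO. It checks only that a TCP connection can be opened, not that the credentials are valid.

```python
import socket


def split_endpoint(endpoint):
    """Split a 'host:port' string into (host, port)."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def is_endpoint_reachable(endpoint, timeout=2.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = split_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1:9000 is the address published by the docker command above.
    print(is_endpoint_reachable("127.0.0.1:9000"))
```

The same 'host:port' string is what the endpoint argument in the configs below expects.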

2. Save pre-processed dataset

First, import DataConfig from matorage and create a config. This example pre-processes MNIST and stores it in distributed storage. additional is a free-form dict, and attributes records the name, type, and shape of each tensor to be stored.

from matorage import DataConfig

traindata_config = DataConfig(
    endpoint='127.0.0.1:9000',
    access_key='minio',
    secret_key='miniosecretkey',
    dataset_name='mnist',
    additional={
        "mode": "train",
        "framework" : "pytorch",
        ...
        "blah" : "blah"
    },
    attributes=[
        ('image', 'float32', (1, 28, 28)),
        ('target', 'int64', (1))
    ]
)
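As a rough worked example of what the attributes list implies, the sketch below computes the uncompressed size of one sample and of the full MNIST training set (60,000 images). The helper names and the ITEMSIZE table are assumptions made for this illustration and are not part of matorage. Note that in Python (1) is the integer 1 rather than a one-element tuple, so the sketch normalizes it.

```python
from math import prod

# Bytes per element for the dtype names used in the config above.
# (Assumption: only these two dtypes are needed for this illustration.)
ITEMSIZE = {"float32": 4, "int64": 8}


def normalize_shape(shape):
    """Return the shape as a tuple; a bare int such as (1) becomes (1,)."""
    return (shape,) if isinstance(shape, int) else tuple(shape)


def bytes_per_sample(attributes):
    """Sum the uncompressed size in bytes of one sample across all attributes."""
    return sum(
        ITEMSIZE[dtype] * prod(normalize_shape(shape))
        for _name, dtype, shape in attributes
    )


attributes = [
    ("image", "float32", (1, 28, 28)),
    ("target", "int64", (1)),
]

per_sample = bytes_per_sample(attributes)
print(per_sample)           # 3144  (784 * 4 bytes for image + 8 bytes for target)
print(per_sample * 60_000)  # 188640000  (about 189 MB before compression)
```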

Now do a simple pre-processing and save the data.

from torch.utils.data import DataLoader
from tqdm import tqdm

from matorage import DataSaver

traindata_saver = DataSaver(config=traindata_config)
# `dataset` is the pre-processed MNIST training dataset.
train_loader = DataLoader(dataset, batch_size=64, num_workers=8)
for (image, target) in tqdm(train_loader):
    # image shape : torch.Size([64, 1, 28, 28])
    # target shape : torch.Size([64])
    traindata_saver({
        'image': image,
        'target': target
    })
# Disconnect once all batches have been saved.
traindata_saver.disconnect()

3. Load dataset from matorage

During training, fetch the data iteratively from storage using the same config that the dataset was saved with.

from torch.utils.data import DataLoader
from tqdm import tqdm

from matorage.torch import Dataset

train_dataset = Dataset(config=traindata_config, clear=True)
train_loader = DataLoader(
    train_dataset, batch_size=64, num_workers=8, shuffle=True
)

# `device` is the torch.device used for training.
for batch_idx, (image, target) in enumerate(tqdm(train_loader)):
    image, target = image.to(device), target.to(device)

A single index can also be fetched on its own through lazy loading.

train_dataset = Dataset(config=traindata_config, clear=True)
print(train_dataset[0], len(train_dataset))
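The general idea behind lazy loading can be sketched with a small map-style dataset that fetches an item only when it is indexed. This is an illustration of the concept under stated assumptions, not matorage's implementation: the LazyDataset class and the fetch callable are hypothetical, and a real loader would read the item from object storage.

```python
class LazyDataset:
    """Map-style dataset that fetches an item only when it is indexed."""

    def __init__(self, length, fetch):
        self._length = length
        self._fetch = fetch    # callable: index -> item (hypothetical loader)
        self.fetch_count = 0   # how many items were actually loaded

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        self.fetch_count += 1
        return self._fetch(index)


# Stand-in loader: returns a small dict instead of reading from storage.
dataset = LazyDataset(60_000, fetch=lambda i: {"index": i})

print(len(dataset), dataset.fetch_count)  # 60000 0
print(dataset[0], dataset.fetch_count)    # {'index': 0} 1
```

Asking for the length costs no fetches, and indexing one item loads exactly that item.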

4. Save & Load Model when training

During training, you can save models at specific steps or epochs to distributed storage, and load them back, entirely in memory. First, create the model config in the same way as the dataset config.

from matorage import ModelConfig
from matorage.torch import ModelManager

model_config = ModelConfig(
    endpoint='127.0.0.1:9000',
    access_key='minio',
    secret_key='miniosecretkey',
    model_name='mnist_simple_training',
    additional={
        "version" : "1.0.1",
        ...
        "blah" : "blah"
    }
)

model_manager = ModelManager(config=model_config)
print(model_manager.get_metadata)
model_manager.save(model, epoch=1)
print(model_manager.get_metadata)

When a freshly initialized model is loaded at a specific step or epoch, the corresponding weights are filled into the model.

print(model.state_dict())
model_manager.load(model, epoch=1)
print(model.state_dict())
# load a layer weight.
print(model_manager.load('net1.0.weight', step=0))
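The save-then-load pattern above can be illustrated with a toy checkpoint store that serializes a state dict in memory and keys it by epoch. This is a sketch under stated assumptions, not how matorage works internally: the InMemoryCheckpoints class is hypothetical, a plain dict stands in for the object storage bucket, and plain Python lists stand in for tensors.

```python
import io
import pickle


class InMemoryCheckpoints:
    """Toy checkpoint store: serialized state dicts keyed by epoch."""

    def __init__(self):
        self._objects = {}  # stand-in for a bucket in object storage

    def save(self, state, epoch):
        buffer = io.BytesIO()  # serialize without touching the local disk
        pickle.dump(state, buffer)
        self._objects[f"epoch={epoch}"] = buffer.getvalue()

    def load(self, epoch):
        return pickle.loads(self._objects[f"epoch={epoch}"])

    def metadata(self):
        return sorted(self._objects)


store = InMemoryCheckpoints()
weights = {"net1.0.weight": [0.1, 0.2], "net1.0.bias": [0.0]}
store.save(weights, epoch=1)

weights["net1.0.weight"][0] = 9.9  # later training changes the weights
restored = store.load(epoch=1)
print(store.metadata())            # ['epoch=1']
print(restored["net1.0.weight"])   # [0.1, 0.2]
```

Because the snapshot is serialized at save time, later changes to the live weights do not affect what is restored.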

5. Save & Load Optimizer when training

Saving and loading an optimizer is similar to managing a model.

from matorage import OptimizerConfig
from matorage.torch import OptimizerManager

optimizer_config = OptimizerConfig(
    endpoint='127.0.0.1:9000',
    access_key='minio',
    secret_key='miniosecretkey',
    optimizer_name='adam',
    additional={
        "model" : "1.0.1",
        ...
        "blah" : "blah"
    }
)

optimizer_manager = OptimizerManager(config=optimizer_config)
print(optimizer_manager.get_metadata)
# The optimizer contains information about the step.
optimizer_manager.save(optimizer)
print(optimizer_manager.get_metadata)

When a freshly created optimizer is loaded at a specific step, the corresponding state is filled into the optimizer.

from torch import optim

optimizer = optim.Adam(model.parameters(), lr=0.01)
optimizer_manager.load(optimizer, step=938)
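The step value 938 is consistent with one full epoch over the MNIST training set at the batch size used earlier. The short calculation below shows the arithmetic, assuming the optimizer steps once per batch and the last partial batch is kept (the DataLoader default, drop_last=False).

```python
import math

train_samples = 60_000  # size of the MNIST training set
batch_size = 64         # batch size used by the training DataLoader

# 60000 / 64 = 937.5, so the final partial batch adds one more step.
steps_per_epoch = math.ceil(train_samples / batch_size)
print(steps_per_epoch)  # 938
```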

Unittest

$ git clone https://github.com/graykode/matorage && cd matorage
$ python -m tests.test_suite

Framework Requirement

torch(>=1.0.0), torchvision(>=0.2.2)
tensorflow(>=2.2), tensorflow_io(>=0.13)

Author

Tae Hwan Jung (@graykode). We are looking for a contributor.

Release history

0.3.0 (this version): Sep 29, 2020
0.2.1: Sep 8, 2020
0.2.0: Aug 23, 2020
0.1.0: Aug 8, 2020
0.1.0a0 (pre-release): Aug 8, 2020
0.0.0: Jun 21, 2020

Download files

Source Distribution

matorage-0.3.0.tar.gz (42.7 kB)

Uploaded Sep 29, 2020 Source

Built Distribution

matorage-0.3.0-py3-none-any.whl (80.0 kB)

Uploaded Sep 29, 2020 Python 3

File details

Details for the file matorage-0.3.0.tar.gz.

File metadata

Download URL: matorage-0.3.0.tar.gz
Upload date: Sep 29, 2020
Size: 42.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.8.5

File hashes

Hashes for matorage-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`8bc1901b9a9c7feef6d1e75f322eeff18cbdde5064327b3e62201d2e4adb0674`
MD5	`67fe0b070d31afd220e940dd5310c9a1`
BLAKE2b-256	`69663c1f7cf97067029da5007e2976cac75ae021e2c0c35b2ccfaf6399370c2f`

File details

Details for the file matorage-0.3.0-py3-none-any.whl.

File metadata

Download URL: matorage-0.3.0-py3-none-any.whl
Upload date: Sep 29, 2020
Size: 80.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.8.5

File hashes

Hashes for matorage-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`558826982711e481f93b14310ed96f8c001cfaf1142dd7de367a8b4860b06306`
MD5	`8e8fdece101622d2969f63cd41c14725`
BLAKE2b-256	`430788f300a745d1b56d069a8e363f3ff47f8002f78dcd5e96c9e83119fe7edf`
