bostorchconnector, a Python package with a precompiled shared library

bostorchconnector

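A high-throughput plugin designed for PyTorch training on datasets stored in Bos. With bostorchconnector you can efficiently access datasets in the cloud and read and write checkpoints.

bostorchconnector implements PyTorch's dataset primitives interface and supports two kinds of dataset: map-style (BosMapDataset) and iterable-style (BosIterableDataset).

It also provides a checkpoint interface that reads from and writes to Bos directly, without staging files on local disk.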

Getting Started

Prerequisites

  • Linux
  • Python 3.8 or later
  • PyTorch >= 2.0

Installation

pip install bostorchconnector

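Configuration

Configure access credentials using any one of the following methods. The methods have an order of precedence.

  • A dedicated configuration file, ~/.baidubce/credentials
  • bcecmd installed and configured; its default configuration path is ~/.go-bcecli/credentials
  • Environment variables: BCE_ACCESS_KEY_ID and BCE_SECRET_ACCESS_KEY

The credentials file has the following format: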

[Defaults]
Ak= 
Sk= 
Sts=
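
Before starting a training job it can help to check which of the credential sources listed above are actually present on the machine. The following is a minimal sketch using only the Python standard library. The helper name available_credential_sources is hypothetical and is not part of bostorchconnector; it only reports what it finds and makes no claim about which source the library will pick.

```python
import os
from pathlib import Path

# File locations and variable names come from the configuration list above.
# The helper itself is a hypothetical illustration, not a bostorchconnector API.
CREDENTIAL_FILES = (".baidubce/credentials", ".go-bcecli/credentials")
CREDENTIAL_ENV_VARS = ("BCE_ACCESS_KEY_ID", "BCE_SECRET_ACCESS_KEY")


def available_credential_sources(environ=None, home=None):
    """Return labels for the credential sources that appear to be configured."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    found = []
    for relative in CREDENTIAL_FILES:
        candidate = home / relative
        if candidate.is_file():
            found.append(f"file:{candidate}")

    # Both variables must be set and non-empty to count as configured.
    if all(environ.get(name) for name in CREDENTIAL_ENV_VARS):
        found.append("env:" + ",".join(CREDENTIAL_ENV_VARS))
    return found


if __name__ == "__main__":
    sources = available_credential_sources()
    print(sources if sources else "no credential source found")
```

An empty result means none of the three sources was detected, which is worth catching before a job starts.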

Examples

Build a BosIterableDataset with the from_prefix method:

from bostorchconnector import BosIterableDataset

# You need to update <BUCKET> and <PREFIX>
DATASET_URI="bos://<BUCKET>/<PREFIX>"
ENDPOINT="http://bj.bcebos.com"

iterable_dataset = BosIterableDataset.from_prefix(DATASET_URI, endpoint=ENDPOINT)

# The dataset is iterable, so it can be consumed directly in a for loop.
for item in iterable_dataset:
    data = item.read()
    print(len(data))
    print(item.key)
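
A prefix often contains objects that are not training samples, such as manifests or logs. The sketch below filters items by key suffix before reading them. It assumes only what the example above shows, namely that each item exposes a key attribute and a read() method. The names iter_samples and FakeObject are hypothetical; FakeObject is a stand-in so the sketch runs without access to Bos.

```python
from typing import Iterable, Iterator, Tuple


def iter_samples(
    items: Iterable, suffixes: Tuple[str, ...] = (".jpg", ".png")
) -> Iterator[Tuple[str, bytes]]:
    """Yield (key, raw bytes) for items whose key ends with one of `suffixes`."""
    for item in items:
        # Compare case-insensitively; read() is only called for matching keys.
        if item.key.lower().endswith(suffixes):
            yield item.key, item.read()


class FakeObject:
    """Hypothetical stand-in for a dataset item with `key` and `read()`."""

    def __init__(self, key: str, payload: bytes):
        self.key = key
        self._payload = payload

    def read(self) -> bytes:
        return self._payload


if __name__ == "__main__":
    demo = [FakeObject("train/0001.JPG", b"\xff\xd8"), FakeObject("train/README.txt", b"notes")]
    for key, data in iter_samples(demo):
        print(key, len(data))
```

With a real dataset, the same generator would be called as iter_samples(iterable_dataset).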

Build a BosMapDataset with the from_prefix method:

from bostorchconnector import BosMapDataset

# You need to update <BUCKET> and <PREFIX>
DATASET_URI="bos://<BUCKET>/<PREFIX>"
ENDPOINT="http://bj.bcebos.com"

map_dataset = BosMapDataset.from_prefix(DATASET_URI, endpoint=ENDPOINT)

# Randomly access an item in map_dataset.
item = map_dataset[0]

# Learn about bucket, key, and content of the object
bucket = item.bucket
key = item.key
content = item.read()
print(len(content))

Read and write model checkpoints directly:

from bostorchconnector import BosCheckpoint

import torchvision
import torch

CHECKPOINT_URI="bos://<BUCKET>/<KEY>/"
ENDPOINT="http://bj.bcebos.com"
checkpoint = BosCheckpoint(endpoint=ENDPOINT)

model = torchvision.models.resnet18()

# Save checkpoint to Bos
with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

# Load checkpoint from Bos
with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
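
When saving one checkpoint per epoch, building the object URI in one place avoids mistakes with trailing slashes. The following sketch follows the epoch0.ckpt naming used in the example above. The helper name checkpoint_uri is hypothetical and not part of bostorchconnector; it only manipulates strings and does not contact Bos.

```python
from urllib.parse import urlparse


def checkpoint_uri(base_uri: str, epoch: int) -> str:
    """Build 'bos://<BUCKET>/<KEY>/epoch<N>.ckpt' from a base URI and an epoch number."""
    parsed = urlparse(base_uri)
    if parsed.scheme != "bos" or not parsed.netloc:
        raise ValueError(f"expected a bos://<BUCKET>/... URI, got {base_uri!r}")

    # Normalize the prefix so a trailing slash on base_uri makes no difference.
    prefix = parsed.path.strip("/")
    name = f"epoch{epoch}.ckpt"
    key = f"{prefix}/{name}" if prefix else name
    return f"bos://{parsed.netloc}/{key}"


if __name__ == "__main__":
    print(checkpoint_uri("bos://my-bucket/run1/", 0))
```

The returned string can be passed to checkpoint.writer(...) or checkpoint.reader(...) in place of the manual concatenation shown above.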

Source Distributions

No source distribution files are available for this release.

Built Distribution

bostorchconnector-1.0.0-py3-none-any.whl (571.3 kB)

Uploaded Python 3

Hashes for bostorchconnector-1.0.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       da957aa686e0604a53a713b47c5ecffa31777774372190eee8279cb1a96b657e
MD5          f225d08d04932af715d80cc3fa7d4901
BLAKE2b-256  88ad6230b859e7e754da9bacecddeb937a7685f86cd071d1e9eb1d74e667598b
