bostorchconnector, a Python package with a precompiled shared library

bostorchconnector

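A high-throughput plugin designed for PyTorch training on datasets stored in Bos. With bostorchconnector you can efficiently access datasets in the cloud and read and write checkpoints.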

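bostorchconnector implements PyTorch's dataset primitives interface and supports two kinds of dataset: the map-style BosMapDataset and the iterable-style BosIterableDataset.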

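It also supports a checkpoint interface that reads from and writes to Bos in the cloud directly, without staging data on local disk.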

Getting started

Prerequisites

  • Linux
  • Python 3.8 or later
  • PyTorch >= 2.0

Installation

pip install bostorchconnector

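Configuration

Configure access credentials using any one of the following methods. If more than one is configured, a fixed priority order decides which is used.

  • A dedicated configuration file at ~/.baidubce/credentials
  • If bcecmd is installed and configured, its default configuration path, ~/.go-bcecli/credentials
  • The environment variables BCE_ACCESS_KEY_ID and BCE_SECRET_ACCESS_KEY

The credentials file has the following format: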

[Defaults]
Ak= 
Sk= 
Sts=
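To make the lookup order concrete, here is a minimal sketch of how the three sources could be checked. It is not the library's actual implementation: the function names are hypothetical, and it assumes that the priority follows the order of the list above and that an entry with an empty Ak or Sk is skipped.

```python
import configparser
from pathlib import Path
from typing import Mapping, Optional


def load_credentials_file(path: Path) -> Optional[dict]:
    """Parse an INI-style credentials file; return None if it is missing or incomplete."""
    if not path.is_file():
        return None
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")
    if not parser.has_section("Defaults"):
        return None
    section = parser["Defaults"]
    # configparser matches option names case-insensitively, so "Ak" finds "Ak=".
    ak = section.get("Ak", "").strip()
    sk = section.get("Sk", "").strip()
    if not ak or not sk:
        return None
    return {"ak": ak, "sk": sk, "sts": section.get("Sts", "").strip()}


def resolve_credentials(home: Path, environ: Mapping[str, str]) -> Optional[dict]:
    """Return the first usable credentials, trying files first, then the environment.

    Assumption: the priority is the order in which the README lists the sources.
    """
    for relative in (".baidubce/credentials", ".go-bcecli/credentials"):
        found = load_credentials_file(home / relative)
        if found is not None:
            return found
    ak = environ.get("BCE_ACCESS_KEY_ID", "")
    sk = environ.get("BCE_SECRET_ACCESS_KEY", "")
    if ak and sk:
        return {"ak": ak, "sk": sk, "sts": ""}
    return None
```

Calling `resolve_credentials(Path.home(), os.environ)` (after `import os`) before starting a training job is a quick way to check that at least one source is filled in.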

Examples

API docs

Examples

Build a BosIterableDataset with the from_prefix method:

from bostorchconnector import BosIterableDataset

# You need to update <BUCKET> and <PREFIX>
DATASET_URI="bos://<BUCKET>/<PREFIX>"
ENDPOINT="http://bj.bcebos.com"

iterable_dataset = BosIterableDataset.from_prefix(DATASET_URI, endpoint=ENDPOINT)

# Datasets are also iterators. 
for item in iterable_dataset:
    data = item.read()
    print(len(data))
    print(item.key)
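In practice the loop body usually turns each object into a training sample. The sketch below shows one way to do that with a generator that filters by key and decodes the bytes. It relies only on the `key` attribute and `read()` method shown above; `StandInObject` and `iter_text_samples` are hypothetical names used so the snippet runs without a Bos bucket.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class StandInObject:
    """Hypothetical stand-in for a dataset item: exposes .key and .read() only."""
    key: str
    payload: bytes

    def read(self) -> bytes:
        return self.payload


def iter_text_samples(dataset: Iterable, suffix: str = ".txt") -> Iterator[Tuple[str, str]]:
    """Yield (key, decoded text) for every item whose key ends with `suffix`."""
    for item in dataset:
        if not item.key.endswith(suffix):
            continue  # read() is never called on items that are filtered out
        yield item.key, item.read().decode("utf-8")


objects = [
    StandInObject("train/0001.txt", b"hello"),
    StandInObject("train/0002.bin", b"\x00\x01"),
]
for key, text in iter_text_samples(objects):
    print(key, len(text))
```

Passing `iterable_dataset` from the example above in place of `objects` should work the same way, as long as its items expose `key` and `read()` as shown.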

Build a BosMapDataset with the from_prefix method:

from bostorchconnector import BosMapDataset

# You need to update <BUCKET> and <PREFIX>
DATASET_URI="bos://<BUCKET>/<PREFIX>"
ENDPOINT="http://bj.bcebos.com"

map_dataset = BosMapDataset.from_prefix(DATASET_URI, endpoint=ENDPOINT)

# Randomly access an item in map_dataset.
item = map_dataset[0]

# Learn about bucket, key, and content of the object
bucket = item.bucket
key = item.key
content = item.read()
len(content)
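Since each item carries a bucket and a key while the dataset is addressed by a `bos://` URI, small helpers for converting between the two forms can be handy for logging or for passing locations to other APIs. The following is a sketch with hypothetical helper names, not part of bostorchconnector.

```python
from typing import Tuple

BOS_SCHEME = "bos://"


def split_bos_uri(uri: str) -> Tuple[str, str]:
    """Split 'bos://<BUCKET>/<KEY>' into (bucket, key); key may be empty."""
    if not uri.startswith(BOS_SCHEME):
        raise ValueError(f"not a bos:// URI: {uri!r}")
    bucket, _, key = uri[len(BOS_SCHEME):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in URI: {uri!r}")
    return bucket, key


def join_bos_uri(bucket: str, key: str) -> str:
    """Build a 'bos://<BUCKET>/<KEY>' URI from its parts."""
    return f"{BOS_SCHEME}{bucket}/{key}"


print(split_bos_uri("bos://my-bucket/train/0001.jpg"))
print(join_bos_uri("my-bucket", "train/0001.jpg"))
```

String partitioning is used instead of `urllib.parse` so that keys containing `?` or `#` are kept intact.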

Read and write a model checkpoint directly:

from bostorchconnector import BosCheckpoint

import torchvision
import torch

CHECKPOINT_URI="bos://<BUCKET>/<KEY>/"
ENDPOINT="http://bj.bcebos.com"
checkpoint = BosCheckpoint(endpoint=ENDPOINT)

model = torchvision.models.resnet18()

# Save checkpoint to Bos
with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

# Load checkpoint from Bos
with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
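The example works because `torch.save` and `torch.load` accept any file-like object, so the writer and reader only need to behave like binary streams. For unit tests that should not touch Bos, an in-memory stand-in with the same `writer()`/`reader()` shape can be substituted. The class below is a hypothetical sketch, and it uses `pickle` in place of `torch` so that it runs with the standard library alone.

```python
import io
import pickle
from contextlib import contextmanager


class InMemoryCheckpoint:
    """Hypothetical offline stand-in mirroring the writer()/reader() usage above."""

    def __init__(self):
        self._objects = {}

    @contextmanager
    def writer(self, uri):
        buffer = io.BytesIO()
        yield buffer
        # Reached only if the with-block finished without raising,
        # so a failed save never leaves a partial object behind.
        self._objects[uri] = buffer.getvalue()

    @contextmanager
    def reader(self, uri):
        yield io.BytesIO(self._objects[uri])


checkpoint = InMemoryCheckpoint()
state = {"epoch": 0, "weights": [0.1, 0.2]}

with checkpoint.writer("bos://bucket/ckpt/epoch0.ckpt") as writer:
    pickle.dump(state, writer)  # stands in for torch.save(state, writer)

with checkpoint.reader("bos://bucket/ckpt/epoch0.ckpt") as reader:
    restored = pickle.load(reader)  # stands in for torch.load(reader)

print(restored == state)
```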

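Distributed Checkpoints

Overview

bostorchconnector supports PyTorch distributed checkpoints through the following classes:

  • BosStorageWriter: implements PyTorch's StorageWriter interface.
  • BosStorageReader: implements PyTorch's StorageReader interface.
  • BosFileSystem: implements PyTorch's FileSystemBase interface.

These classes integrate Bos with PyTorch distributed checkpointing, so that distributed model checkpoints can be stored and loaded efficiently.

Prerequisites and installation

PyTorch 2.3 or later is required. Install the package with the dcp extra: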

pip install bostorchconnector[dcp]

Example

from bostorchconnector.dcp import BosStorageWriter, BosStorageReader

import torchvision
import torch.distributed.checkpoint as DCP

# Configuration
CHECKPOINT_URI = "bos://<BUCKET>/<KEY>/"
ENDPOINT = "http://bj.bcebos.com"

model = torchvision.models.resnet18()

# Save a distributed checkpoint to Bos
bos_storage_writer = BosStorageWriter(
    endpoint=ENDPOINT,
    path=CHECKPOINT_URI,
    thread_count=4,  # Optional: number of IO threads used for writing
)
DCP.save(
    state_dict=model.state_dict(),
    storage_writer=bos_storage_writer,
)

# Load a distributed checkpoint from Bos
model = torchvision.models.resnet18()
model_state_dict = model.state_dict()
bos_storage_reader = BosStorageReader(
    endpoint=ENDPOINT,
    path=CHECKPOINT_URI,
)
DCP.load(
    state_dict=model_state_dict,
    storage_reader=bos_storage_reader,
)
model.load_state_dict(model_state_dict)

Lightning integration

bostorchconnector includes an integration with PyTorch Lightning. It provides BosLightningCheckpoint, which implements Lightning's CheckpointIO interface, so that checkpoints can be read from and written to Bos in PyTorch Lightning.

Installation

pip install bostorchconnector[lightning]

Example

from lightning import Trainer
from bostorchconnector.lightning import BosLightningCheckpoint

# ...

bos_checkpoint_io = BosLightningCheckpoint(endpoint="http://bj.bcebos.com")
trainer = Trainer(
    plugins=[bos_checkpoint_io],
    default_root_dir="bos://<BUCKET>/<KEY_PREFIX>/"
)
trainer.fit(model)

Using BosClient directly

You can also use BosClient directly for custom streaming reads and writes.

from bostorchconnector._bos_client import BosClient

ENDPOINT = "http://bj.bcebos.com"
BUCKET_NAME = "<BUCKET>"
OBJECT_KEY = "<KEY>"

bos_client = BosClient(endpoint=ENDPOINT)

# Write data to Bos
data = b"content" * 1048576
bos_writer = bos_client.put_object(bucket=BUCKET_NAME, key=OBJECT_KEY)
bos_writer.write(data)
bos_writer.close()

# Read data from Bos
bos_reader = bos_client.get_object(bucket=BUCKET_NAME, key=OBJECT_KEY)
data = bos_reader.read()
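In the write example, `close()` is skipped if `write()` raises. One way to guard against that is `contextlib.closing`, combined with writing in fixed-size chunks so that large payloads are handed over piece by piece. The sketch below assumes the writer accepts repeated `write()` calls before `close()`, which the source does not state explicitly. `RecordingWriter` and `upload_in_chunks` are hypothetical names, and the stand-in writer exists only so the snippet runs without Bos.

```python
from contextlib import closing


class RecordingWriter:
    """Hypothetical stand-in for a streaming writer with write() and close()."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

    def close(self):
        self.closed = True


def upload_in_chunks(writer, data: bytes, chunk_size: int = 8 * 1024 * 1024) -> int:
    """Write `data` in chunks and always close the writer; return the chunk count."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    count = 0
    with closing(writer):  # close() runs even if a write() raises
        for start in range(0, len(data), chunk_size):
            writer.write(data[start:start + chunk_size])
            count += 1
    return count


writer = RecordingWriter()
print(upload_in_chunks(writer, b"0123456789", chunk_size=4), writer.closed)
```

With a real client, the writer would come from `bos_client.put_object(bucket=BUCKET_NAME, key=OBJECT_KEY)` as in the example above.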

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • bostorchconnector-1.5.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.14, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.5.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.13, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.5.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.5.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.5.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.5.0-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.9, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.5.0-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.8, manylinux (glibc 2.17+), x86-64
