bostorchconnector, a Python package with a precompiled shared library

bostorchconnector

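A high-throughput plugin designed for PyTorch training on datasets stored in Bos. With bostorchconnector you can efficiently access datasets in the cloud and read and write checkpoints.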

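bostorchconnector implements PyTorch's dataset primitives interface and supports two kinds of dataset: map-style (BosMapDataset) and iterable-style (BosIterableDataset).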

It also supports a checkpoint interface that reads from and writes to Bos directly, without staging files on local disk.

Getting Started

Prerequisites

  • Linux
  • Python 3.8 or later
  • PyTorch >= 2.0

Installation

pip install bostorchconnector

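Configuration

Configure access credentials using any one of the following methods. They are listed in order of priority.

  • A dedicated configuration file at ~/.baidubce/credentials
  • The bcecmd configuration, if bcecmd is installed and configured; its default path is ~/.go-bcecli/credentials
  • The environment variables BCE_ACCESS_KEY_ID and BCE_SECRET_ACCESS_KEY

The credentials file has the following format: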

[Defaults]
Ak= 
Sk= 
Sts=
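
As a sketch of the options above, the snippet below writes a credentials file in the [Defaults] format and reports which of the three documented sources is present first. It uses only the Python standard library. The helper names write_credentials and first_available_source are hypothetical, the code assumes the file is INI-compatible, and the connector's own lookup logic may differ in detail.

```python
import configparser
import os
from pathlib import Path


def write_credentials(path, ak, sk, sts=""):
    """Write a credentials file in the [Defaults] format shown above."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the key case (Ak, Sk, Sts) as written
    parser["Defaults"] = {"Ak": ak, "Sk": sk, "Sts": sts}
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        parser.write(f, space_around_delimiters=False)
    os.chmod(path, 0o600)  # the file holds secrets; restrict it to the owner


def first_available_source(home, environ):
    """Return the first credential source that exists, in the documented order."""
    for candidate in (Path(home) / ".baidubce" / "credentials",
                      Path(home) / ".go-bcecli" / "credentials"):
        if candidate.is_file():
            return str(candidate)
    if "BCE_ACCESS_KEY_ID" in environ and "BCE_SECRET_ACCESS_KEY" in environ:
        return "environment"
    return None
```

A typical call would be first_available_source(Path.home(), os.environ), which only checks that a source exists and does not validate the keys themselves.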

Examples

Build a BosIterableDataset with the from_prefix method:

from bostorchconnector import BosIterableDataset

# You need to update <BUCKET> and <PREFIX>
DATASET_URI="bos://<BUCKET>/<PREFIX>"
ENDPOINT="http://bj.bcebos.com"

iterable_dataset = BosIterableDataset.from_prefix(DATASET_URI, endpoint=ENDPOINT)

# Datasets are also iterators. 
for item in iterable_dataset:
    data = item.read()
    print(len(data))
    print(item.key)
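
Each item yielded above exposes key and read(). The following is a minimal sketch of turning such items into training samples. It assumes a hypothetical key layout of <PREFIX>/<label>/<file>; the function to_sample and the stand-in class FakeItem are illustrative and not part of bostorchconnector.

```python
import posixpath


def to_sample(item):
    """Map an object with `.key` and `.read()` to a (label, payload) pair.

    Assumes keys look like "<PREFIX>/<label>/<file>", so the label is the
    name of the object's parent "directory".
    """
    label = posixpath.basename(posixpath.dirname(item.key))
    return label, item.read()


class FakeItem:
    """Stand-in for a dataset item, used only to exercise to_sample locally."""

    def __init__(self, key, payload):
        self.key = key
        self._payload = payload

    def read(self):
        return self._payload


samples = [to_sample(FakeItem("train/cat/001.jpg", b"\xff\xd8")),
           to_sample(FakeItem("train/dog/002.jpg", b"\xff\xd8\xff"))]
```

With a real dataset the same function would be applied inside the loop, for example label, payload = to_sample(item).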

Build a BosMapDataset with the from_prefix method:

from bostorchconnector import BosMapDataset

# You need to update <BUCKET> and <PREFIX>
DATASET_URI="bos://<BUCKET>/<PREFIX>"
ENDPOINT="http://bj.bcebos.com"

map_dataset = BosMapDataset.from_prefix(DATASET_URI, endpoint=ENDPOINT)

# Randomly access an item in map_dataset.
item = map_dataset[0]

# Learn about bucket, key, and content of the object
bucket = item.bucket
key = item.key
content = item.read()
len(content)
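
Because a map-style dataset is addressed by integer index, a distributed job can split the work by index. The sketch below shows one common strided split using only the standard library. indices_for_rank is a hypothetical helper, and it assumes the dataset reports its size through len(), as PyTorch map-style datasets conventionally do.

```python
def indices_for_rank(dataset_len, rank, world_size):
    """Return the indices handled by `rank` under a strided split."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # rank r takes r, r + world_size, r + 2 * world_size, ...
    return range(rank, dataset_len, world_size)
```

Usage would look like: for i in indices_for_rank(len(map_dataset), rank, world_size): item = map_dataset[i]. In real training, torch.utils.data.DistributedSampler covers the same need and adds shuffling and padding.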

Read and write a model checkpoint directly:

from bostorchconnector import BosCheckpoint

import torchvision
import torch

CHECKPOINT_URI="bos://<BUCKET>/<KEY>/"
ENDPOINT="http://bj.bcebos.com"
checkpoint = BosCheckpoint(endpoint=ENDPOINT)

model = torchvision.models.resnet18()

# Save checkpoint to Bos
with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

# Load checkpoint from Bos
with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
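
The example above builds the object name by string concatenation. The sketch below is a small naming helper that zero-pads the epoch, so that names sort in training order, and recovers the latest epoch from a list of keys. The function names and the epochNNNN.ckpt pattern are assumptions for illustration, not a convention required by bostorchconnector.

```python
import re

_EPOCH_RE = re.compile(r"epoch(\d+)\.ckpt$")


def epoch_uri(base_uri, epoch):
    """Build "<base_uri>/epochNNNN.ckpt" with exactly one separating slash."""
    return f"{base_uri.rstrip('/')}/epoch{epoch:04d}.ckpt"


def latest_epoch(keys):
    """Return the highest epoch number found in `keys`, or None if there is none."""
    epochs = [int(m.group(1)) for k in keys if (m := _EPOCH_RE.search(k))]
    return max(epochs, default=None)
```

The result can be passed straight to the calls shown above, for example checkpoint.writer(epoch_uri(CHECKPOINT_URI, 0)).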

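Distributed Checkpoints

Overview

bostorchconnector supports PyTorch distributed checkpoints through the following classes:

  • BosStorageWriter: implements PyTorch's StorageWriter interface.
  • BosStorageReader: implements PyTorch's StorageReader interface.
  • BosFileSystem: implements PyTorch's FileSystemBase interface.

Together they integrate Bos with PyTorch distributed checkpointing, so that distributed model checkpoints can be stored and loaded efficiently.

Prerequisites and Installation

PyTorch 2.3 or later is required. Install the package with the dcp extra: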

pip install bostorchconnector[dcp]

Example

from bostorchconnector.dcp import BosStorageWriter, BosStorageReader

import torchvision
import torch.distributed.checkpoint as DCP

# Configuration
CHECKPOINT_URI = "bos://<BUCKET>/<KEY>/"
ENDPOINT = "http://bj.bcebos.com"

model = torchvision.models.resnet18()

# Save a distributed checkpoint to Bos
bos_storage_writer = BosStorageWriter(
    endpoint=ENDPOINT,
    path=CHECKPOINT_URI,
    thread_count=4,  # optional; number of IO threads used for writing
)
DCP.save(
    state_dict=model.state_dict(),
    storage_writer=bos_storage_writer,
)

# Load a distributed checkpoint from Bos
model = torchvision.models.resnet18()
model_state_dict = model.state_dict()
bos_storage_reader = BosStorageReader(
    endpoint=ENDPOINT,
    path=CHECKPOINT_URI,
)
DCP.load(
    state_dict=model_state_dict,
    storage_reader=bos_storage_reader,
)
model.load_state_dict(model_state_dict)

Lightning Integration

bostorchconnector includes an integration with PyTorch Lightning. BosLightningCheckpoint implements Lightning's CheckpointIO interface, so checkpoints can be read from and written to Bos within PyTorch Lightning.

Installation

pip install bostorchconnector[lightning]

Example

from lightning import Trainer
from bostorchconnector.lightning import BosLightningCheckpoint

# ...

bos_checkpoint_io = BosLightningCheckpoint(endpoint="http://bj.bcebos.com")
trainer = Trainer(
    plugins=[bos_checkpoint_io],
    default_root_dir="bos://<BUCKET>/<KEY_PREFIX>/"
)
trainer.fit(model)

Using BosClient Directly

You can also use BosClient directly for custom streaming reads and writes.

from bostorchconnector._bos_client import BosClient

ENDPOINT = "http://bj.bcebos.com"
BUCKET_NAME = "<BUCKET>"
OBJECT_KEY = "<KEY>"

bos_client = BosClient(endpoint=ENDPOINT)

# Write data to Bos
data = b"content" * 1048576
bos_writer = bos_client.put_object(bucket=BUCKET_NAME, key=OBJECT_KEY)
bos_writer.write(data)
bos_writer.close()

# Read data from Bos
bos_reader = bos_client.get_object(bucket=BUCKET_NAME, key=OBJECT_KEY)
data = bos_reader.read()
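
BosClient takes the bucket and key separately, while the dataset and checkpoint examples use a single bos:// URI. A minimal sketch of converting between the two follows. split_bos_uri is a hypothetical helper and not part of the package.

```python
def split_bos_uri(uri):
    """Split "bos://<bucket>/<key>" into a (bucket, key) tuple."""
    prefix = "bos://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a bos:// URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in URI: {uri!r}")
    return bucket, key
```

The two values can then be passed as the bucket and key arguments in the calls shown above.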

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • bostorchconnector-1.4.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.14, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.4.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.13, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.4.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.4.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.4.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.4.0-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.9, manylinux (glibc 2.17+), x86-64
  • bostorchconnector-1.4.0-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.4 MB): CPython 3.8, manylinux (glibc 2.17+), x86-64
