
Distributed Coordinated Sequence Sampler


DiscoSeqSampler


Distributed Coordinated Sequence Sampler - an efficient distributed sequence sampling framework.

Background

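Across today's AI landscape, audio/speech and image/video models alike rely heavily on the Transformer architecture. The compute cost of these models depends strongly on sequence length, and in large-scale datasets sample lengths often span a very wide range. Efficient multi-GPU training therefore requires careful, accurate management of the sequence lengths in the training data.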

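DiscoSeqSampler is a distributed sequence sampling framework designed to address exactly this problem: it coordinates and manages sequences of different lengths so that training stays efficient and stable.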
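
To make the length-management problem concrete, here is a minimal sketch of batching under a padded-token budget. It is not DiscoSeqSampler's API or algorithm; the function name `batch_by_token_budget` and the `max_tokens` parameter are hypothetical and exist only for illustration. It assumes the cost of a padded batch is the number of samples times the longest sample in it.

```python
from typing import List, Sequence


def batch_by_token_budget(lengths: Sequence[int], max_tokens: int) -> List[List[int]]:
    """Group sample indices so every padded batch fits within max_tokens.

    Hypothetical helper for illustration only; not part of discoss.
    Padded cost of a batch = (number of samples) * (longest sample in it).
    """
    # Sort indices by length so neighbouring samples need little padding.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])

    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        if lengths[idx] > max_tokens:
            raise ValueError(f"sample {idx} alone exceeds the budget")
        # Lengths are visited in ascending order, so the incoming sample
        # would be the longest one in the batch.
        if current and (len(current) + 1) * lengths[idx] > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


lengths = [5, 120, 7, 64, 6, 118, 60]
batches = batch_by_token_budget(lengths, max_tokens=256)
print(batches)  # [[0, 4, 2, 6], [3, 5], [1]]
```

Short samples end up packed four to a batch while the longest sample sits alone, so every step does a comparable amount of padded work instead of a fixed number of samples.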

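Features

  • 🚀 High performance: optimized distributed sampling algorithms
  • 🔄 Coordination: sequence coordination and synchronization across workers
  • 📊 Scalable: supports large-scale distributed deployments
  • 🛠️ Easy to use: a concise API
  • 🔧 Configurable: flexible configuration options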
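
As one illustration of what coordination between workers can look like, the sketch below shards a list of batches across ranks without any communication: every rank shuffles with the same seed and then takes its own slice. This is a common general technique and not necessarily how DiscoSeqSampler works; `shard_batches` and all of its parameters are hypothetical names.

```python
import random
from typing import List


def shard_batches(
    batches: List[List[int]],
    rank: int,
    world_size: int,
    seed: int = 0,
    epoch: int = 0,
) -> List[List[int]]:
    """Return the batches assigned to one rank.

    Hypothetical helper for illustration only; not part of discoss.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")

    # The same seed on every rank gives an identical shuffle everywhere,
    # and adding the epoch changes the order from one epoch to the next.
    rng = random.Random(seed + epoch)
    shuffled = list(batches)
    rng.shuffle(shuffled)

    # Drop the tail so every rank receives the same number of batches;
    # uneven counts can stall collective operations during training.
    usable = len(shuffled) - len(shuffled) % world_size
    return shuffled[rank:usable:world_size]


all_batches = [[i] for i in range(10)]
shards = [shard_batches(all_batches, r, world_size=4, seed=1) for r in range(4)]
print([len(s) for s in shards])  # [2, 2, 2, 2]
```

The trade-off in this sketch is that up to `world_size - 1` batches are dropped per epoch in exchange for every rank taking the same number of steps.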

Installation

Install from PyPI

pip install discoss

Install from source

git clone https://github.com/lifeiteng/DiscoSeqSampler.git
cd DiscoSeqSampler
pip install -e .

Quick start

import discoss

# TODO: add a usage example

Development

See DEVELOPMENT.md for the detailed development guide.

Quick setup

# Clone the repository
git clone https://github.com/lifeiteng/DiscoSeqSampler.git
cd DiscoSeqSampler

# Install development dependencies (quoted so shells such as zsh do not expand the brackets)
pip install -e ".[dev]"

# Set up the pre-commit hooks
make setup-dev

Run the tests

make test

Contributing

Contributions are welcome! See DEVELOPMENT.md for how to set up a development environment.

License

This project is released under the MIT License. See the LICENSE file for details.

Citation

If you use DiscoSeqSampler in your research, please cite:

@software{discoss2024,
  title={DiscoSeqSampler: Distributed Coordinated Sequence Sampler},
  author={Li, Feiteng},
  year={2025},
  url={https://github.com/lifeiteng/DiscoSeqSampler}
}

