Distributed Coordinated Sequence Sampler

DiscoSeqSampler

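Distributed Coordinated Sequence Sampler: an efficient distributed sequence sampling framework.

Background

In today's AI landscape, models for both audio/speech and image/video widely use the Transformer architecture. The compute cost of these models depends strongly on sequence length, and in large-scale datasets sample lengths typically span a very wide range. Efficient multi-GPU training therefore requires precise control over the sequence lengths of the training data.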
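
To make the cost of a wide length distribution concrete, the sketch below counts how many token slots are processed when samples are padded to the longest item in each batch. The lengths and the `padded_cost` helper are hypothetical and are not part of DiscoSeqSampler; the point is only that grouping samples of similar length reduces padding.

```python
def padded_cost(lengths, batch_size):
    """Token slots processed when consecutive items are batched
    and every item is padded to the longest one in its batch."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


# Hypothetical sequence lengths (e.g. frames or tokens per sample).
lengths = [10, 200, 15, 180, 12, 190, 20, 210]

real_tokens = sum(lengths)                       # useful work: 837
unsorted_cost = padded_cost(lengths, 2)          # arrival order: 1560
sorted_cost = padded_cost(sorted(lengths), 2)    # similar lengths together: 864

print(real_tokens, unsorted_cost, sorted_cost)
```

In this toy case, batching in arrival order processes almost twice as many slots as there are real tokens, while length-sorted batching stays close to the real token count.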

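DiscoSeqSampler is a distributed sequence sampling framework designed to address this problem. It coordinates and manages sequences of different lengths to keep training efficient and stable.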
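
As a rough illustration of what coordinated, length-aware sampling can look like, here is a minimal sketch in plain Python. It is not the DiscoSeqSampler API: `make_batches`, `shard_batches`, and the `max_tokens` budget are assumed names chosen for this example. Batches are built under a padded-token budget, then dealt out so that every rank receives the same number of batches, which keeps ranks in step during synchronized training.

```python
from typing import List, Sequence


def make_batches(lengths: Sequence[int], max_tokens: int) -> List[List[int]]:
    """Group sample indices so each batch's padded size fits max_tokens.

    A sample longer than max_tokens still gets a batch of its own.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        # Lengths are visited in ascending order, so lengths[idx] is the
        # longest item the batch would contain after adding it.
        if current and lengths[idx] * (len(current) + 1) > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


def shard_batches(batches: List[List[int]], rank: int, world_size: int) -> List[List[int]]:
    """Give each rank the same number of batches, dropping any remainder."""
    usable = len(batches) - len(batches) % world_size
    return batches[rank:usable:world_size]


# Hypothetical lengths and budget.
lengths = [10, 200, 15, 180, 12, 190, 20, 210]
batches = make_batches(lengths, max_tokens=400)
rank0 = shard_batches(batches, rank=0, world_size=2)
rank1 = shard_batches(batches, rank=1, world_size=2)
print(batches, rank0, rank1)
```

Dropping the remainder is one simple way to equalize the number of steps per rank; a real sampler may instead pad, duplicate, or rebalance batches.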

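Features

  • 🚀 High performance: optimized distributed sampling algorithms
  • 🔄 Coordination: sequence coordination and synchronization across workers
  • 📊 Scalable: supports large-scale distributed deployments
  • 🛠️ Easy to use: a concise API
  • 🔧 Configurable: flexible configuration options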

Installation

  • The project is still under development, and its functionality has not yet been fully verified.

Install from PyPI

pip install discoss

Install from source

git clone https://github.com/lifeiteng/DiscoSeqSampler.git
cd DiscoSeqSampler
pip install -e .

Quick start

import discoss

# TODO: add usage examples

Development

See DEVELOPMENT.md for the detailed development guide.

Quick setup

# Clone the repository
git clone https://github.com/lifeiteng/DiscoSeqSampler.git
cd DiscoSeqSampler

# Install development dependencies
pip install -e ".[dev]"

# Set up pre-commit hooks
make setup-dev

Run tests

make test

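Contributing

Contributions are welcome! See DEVELOPMENT.md to learn how to set up a development environment.

License

This project is licensed under the MIT License; see the LICENSE file for details.

Citation

If you use DiscoSeqSampler in your research, please cite: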

@software{discoss2024,
  title={DiscoSeqSampler: Distributed Coordinated Sequence Sampler},
  author={Feiteng Li},
  year={2025},
  url={https://github.com/lifeiteng/DiscoSeqSampler}
}
