AI时代的下载镜像加速器 - 支持Docker/PyPI/HuggingFace的多线程加速下载代理

These details have not been verified by PyPI

Project links

Project description

🚀 aimirror

AI 时代的下载镜像加速器 —— 被慢速网络逼疯的工程师的自救工具

💡 项目背景

作为一名 AI 工程师，每天的工作离不开：

pip install torch —— 几百 MB 的 wheel 包下载到地老天荒
docker pull nvidia/cuda —— 几个 GB 的镜像层反复下载
huggingface-cli download —— 模型文件从 HuggingFace 蜗牛般爬过来

公司内网有代理，但单线程下载大文件依然慢得让人崩溃。重复下载相同的包？不存在的缓存。忍无可忍，于是写了这个工具。

aimirror = 智能路由 + 并行分片下载 + 本地缓存，让下载速度飞起来。

✨ 功能特性

⚡ 并行下载 —— HTTP Range 分片，多线程并发，榨干带宽
💾 智能缓存 —— 基于文件 digest 去重，LRU 自动淘汰
🎯 动态路由 —— 小文件直接代理，大文件自动并行
🔗 多源支持 —— Docker Hub、PyPI、CRAN、HuggingFace 开箱即用
🔌 任意扩展 —— 只要是 HTTP 下载，配置一条规则即可几十倍加速

🏗️ 架构

flowchart LR
    subgraph Client["客户端"]
        PIP[pip install]
        DOCKER[docker pull]
        HF[huggingface-cli]
    end

    subgraph aimirror["aimirror 服务"]
        ROUTER[路由匹配器<br/>router.py]
        PROXY[直接代理<br/>小文件]
        DOWNLOADER[并行下载器<br/>downloader.py]
        CACHE[缓存管理器<br/>cache.py]
    end

    subgraph UpstreamProxy["上游代理 (可选)"]
        COMPANY_PROXY[公司代理<br/>http://proxy.company.com:8080]
    end

    subgraph Upstream["上游服务"]
        PYPI[PyPI]
        DOCKER_HUB[Docker Hub]
        HF_HUB[HuggingFace]
    end

    PIP --> ROUTER
    DOCKER --> ROUTER
    HF --> ROUTER

    ROUTER -->|小文件| PROXY
    ROUTER -->|大文件| DOWNLOADER

    PROXY --> UpstreamProxy
    DOWNLOADER --> CACHE
    CACHE -->|未命中| UpstreamProxy
    CACHE -->|命中| Client

    UpstreamProxy -->|可选| Upstream
    UpstreamProxy -.->|直连| Upstream

    DOCKER_HUB -->|返回文件| CACHE
    PYPI -->|返回文件| CACHE
    HF_HUB -->|返回文件| CACHE

🚀 快速开始

# 安装
pip install -r requirements.txt

# 启动
python main.py

# 使用
curl http://localhost:8081/health

🔧 客户端配置

pip

pip install torch --index-url http://localhost:8081/simple --trusted-host localhost:8081

Docker

{
  "registry-mirrors": ["http://localhost:8081"]
}

HuggingFace (huggingface-cli)

# 设置环境变量
export HF_ENDPOINT=http://localhost:8081

# 下载模型（支持所有文件类型：.gguf, .bin, .safetensors, .json 等）
huggingface-cli download TheBloke/Llama-2-7B-GGUF llama-2-7b.Q4_K_M.gguf

# 下载整个仓库
huggingface-cli download meta-llama/Llama-2-7b-hf --local-dir ./models

或使用 Python:

import os
os.environ["HF_ENDPOINT"] = "http://localhost:8081"

from huggingface_hub import hf_hub_download, snapshot_download

# 下载单个文件
hf_hub_download(repo_id="TheBloke/Llama-2-7B-GGUF", filename="llama-2-7b.Q4_K_M.gguf")

# 下载整个仓库
snapshot_download(repo_id="meta-llama/Llama-2-7b-hf", local_dir="./models")

📖 API

路径	说明
`/*`	代理到对应上游 (Docker/PyPI/CRAN/HuggingFace)
`/health`	健康检查
`/stats`	缓存统计

🐳 Docker 部署

# 使用 GitHub Container Registry
docker pull ghcr.io/livehl/aimirror:latest

# 运行
docker run -d -p 8081:8081 -v $(pwd)/cache:/data/fast_proxy/cache ghcr.io/livehl/aimirror:latest

⚙️ 配置示例

server:
  host: "0.0.0.0"
  port: 8081
  upstream_proxy: "http://proxy.company.com:8080"  # 公司代理（可选）

cache:
  dir: "./cache"
  max_size_gb: 100

rules:
  - name: docker-blob
    pattern: "/v2/.*/blobs/sha256:[a-f0-9]+"
    upstream: "https://registry-1.docker.io"
    strategy: parallel
    min_size: 1048576
    concurrency: 20
    chunk_size: 10485760

  - name: pip-wheel
    pattern: "/packages/.+\.whl$"
    upstream: "https://pypi.org"
    strategy: parallel
    min_size: 1048576
    concurrency: 20
    chunk_size: 5242880

  # HuggingFace 文件下载（支持所有模型文件，临时签名 URL 缓存优化）
  - name: huggingface-files
    pattern: '/.*/(blob|resolve)/main/.+'
    upstream: "https://huggingface.co"
    strategy: parallel
    min_size: 1048576
    concurrency: 20
    chunk_size: 10485760
    cache_key_source: original  # 使用原始 URL 作为缓存 key
    path_rewrite:
      - search: "/blob/main/"
        replace: "/resolve/main/"

  # 示例：扩展任意 HTTP 下载站点
  # - name: my-custom-repo
  #   pattern: '/downloads/.+\.(tar\.gz|zip|bin)$'
  #   upstream: "https://downloads.example.com"
  #   strategy: parallel
  #   min_size: 10485760    # 10MB 以上启用并行
  #   concurrency: 16
  #   chunk_size: 20971520  # 20MB 分片

  - name: default
    pattern: ".*"
    upstream: "https://pypi.org"
    strategy: proxy

配置说明

字段	说明
`server.upstream_proxy`	可选的公司代理，用于连接外网
`rules[].upstream`	上游源 base URL
`rules[].strategy`	`proxy` 直接代理 / `parallel` 并行下载
`rules[].path_rewrite`	路径重写规则（如 HuggingFace blob→resolve）
`rules[].cache_key_source`	`original` 使用原始URL作为缓存key（解决临时签名问题）

🧪 测试

运行测试

# 运行简单测试（无需 pytest）
python test_simple.py

# 运行完整测试套件（需要 pytest）
pytest test_proxy.py -v

手动验证

测试 PyPI 代理

curl -o /dev/null "http://localhost:8081/packages/fb/d7/71b982339efc4fff3c622c6fefecddfd3e0b35b60c5f822872d5b806bb71/torch-1.0.0-cp27-cp27m-manylinux1_x86_64.whl" \
  -w "HTTP: %{http_code}, Size: %{size_download}, Time: %{time_total}s\n"

测试 HuggingFace 代理

export HF_ENDPOINT=http://localhost:8081

# 测试下载 GGUF 模型文件
huggingface-cli download TheBloke/Llama-2-7B-GGUF llama-2-7b.Q4_K_M.gguf

# 测试下载 safetensors 格式模型
huggingface-cli download meta-llama/Llama-2-7b-hf model-00001-of-00002.safetensors

# 测试下载整个仓库
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 --local-dir ./test-model

测试 Docker Registry 代理

# 获取 token
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/nginx:pull" \
  | grep -o '"token":"[^"]*"' | cut -d'"' -f4)

# 下载 blob
curl -o /dev/null "http://localhost:8081/v2/library/nginx/blobs/sha256:abc123" \
  -H "Authorization: Bearer $TOKEN" \
  -w "HTTP: %{http_code}, Size: %{size_download}, Time: %{time_total}s\n"

验证缓存命中

# 第一次下载（并行下载）
time curl -o /tmp/test1.gguf "http://localhost:8081/unsloth/model/resolve/main/file.gguf"

# 第二次下载（缓存命中，应该快很多）
time curl -o /tmp/test2.gguf "http://localhost:8081/unsloth/model/resolve/main/file.gguf"

# 查看缓存统计
curl http://localhost:8081/stats | jq

📄 License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.3.4

Mar 15, 2026

0.3.3

Mar 15, 2026

0.3.2

Mar 15, 2026

0.3.1

Mar 14, 2026

0.3.0

Mar 10, 2026

0.2.9

Mar 10, 2026

0.2.8

Mar 10, 2026

0.2.7

Mar 10, 2026

0.2.6

Mar 7, 2026

0.2.5

Mar 7, 2026

0.2.4

Mar 7, 2026

0.2.3

Mar 7, 2026

0.2.2

Mar 7, 2026

0.2.1

Mar 6, 2026

0.2.0

Mar 6, 2026

0.1.9

Mar 6, 2026

0.1.8

Mar 6, 2026

0.1.7

Mar 6, 2026

0.1.6

Mar 6, 2026

0.1.5

Mar 6, 2026

0.1.4

Mar 6, 2026

0.1.3

Mar 6, 2026

0.1.2

Mar 6, 2026

This version

0.1.1

Mar 6, 2026

0.1.0 yanked

Mar 6, 2026

Reason this release was yanked:

信息错误

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aimirror-0.1.1.tar.gz (13.4 kB view details)

Uploaded Mar 6, 2026 Source

File details

Details for the file aimirror-0.1.1.tar.gz.

File metadata

Download URL: aimirror-0.1.1.tar.gz
Upload date: Mar 6, 2026
Size: 13.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for aimirror-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`8ff0db22dec8f91fbbe71bcaeef6c7d99ed611a978b5b3ccab2a5060c0a779eb`
MD5	`858ab002bd0e358e95e22c0c6fd39658`
BLAKE2b-256	`ad5af2269df41dd088c093dca8c34079c5406916150124029569fb8e5c73ab10`

See more details on using hashes here.

aimirror 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🚀 aimirror

💡 项目背景

✨ 功能特性

🏗️ 架构

🚀 快速开始

🔧 客户端配置

📖 API

🐳 Docker 部署

⚙️ 配置示例

配置说明

🧪 测试

运行测试

手动验证

📄 License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes