
Efficiently download all historical data of E-Hentai/ExHentai galleries

Project description

ex-cd


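  • Avoid a centralized database where possible: data that can live in the gallery folder is stored there.
  • Minimize network requests: whatever can be answered by reading local files is answered that way.
  • Minimize file reads and writes: whatever can be answered from a directory listing alone is answered that way.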

Usage

python -m ex_cd -c .vscode/config.json https://exhentai.org/g/2635845/ecbc9d9681/
python -m ex_cd -c <a json string> https://exhentai.org/g/2635845/ecbc9d9681/

See the example config file: .vscode/config.json

You can also set the EXCD_CONFIG_FILE environment variable to point at a config file; settings passed via -c override the ones in that file:

export EXCD_CONFIG_FILE=".vscode/config.json"
python -m ex_cd -c <a json string> https://exhentai.org/g/2635845/ecbc9d9681/
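The layering described above (EXCD_CONFIG_FILE as the base, -c on top) can be sketched as follows. This is not ex_cd's code: `load_config` is a hypothetical helper, and it assumes a shallow merge (a top-level key from -c replaces the same key from the file) and that a -c value starting with `{` is inline JSON while anything else is a file path.

```python
import json
import os
from pathlib import Path
from typing import Any, Mapping, Optional


def load_config(c_arg: Optional[str], environ: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Hypothetical helper: EXCD_CONFIG_FILE as the base layer, -c on top.

    Assumptions (not ex_cd's documented behaviour):
    - a -c value starting with "{" is inline JSON, anything else is a file path;
    - the merge is shallow: a top-level key from -c replaces the same key from the file.
    """
    config: dict[str, Any] = {}

    # Base layer: the file named by the EXCD_CONFIG_FILE environment variable.
    env_file = environ.get("EXCD_CONFIG_FILE")
    if env_file:
        config.update(json.loads(Path(env_file).read_text(encoding="utf-8")))

    # Override layer: the -c argument, inline JSON or a path to a JSON file.
    if c_arg:
        text = c_arg if c_arg.lstrip().startswith("{") else Path(c_arg).read_text(encoding="utf-8")
        config.update(json.loads(text))

    return config
```

If the real config has nested sections, a deep merge would be the alternative design; which one ex_cd uses is not stated here.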

See the example command lines: .vscode/launch.json

How does it work?

URL update

flowchart TD

UrlCheck1[Input URL] --> UrlCheck2(Extract the target folder path from the URL\ngallery-dl --dump-json '%s' --range 0\n< gallery_path >)
UrlCheck2 --> UrlCheck3(Check whether the content is outdated\ndoes < gallery_path >/metadata/child.url exist?)
UrlCheck3 --> UrlCheck4{child.url exists?}
UrlCheck4 -->|Yes| UrlCheck5(Replace the URL with the newer one from child.url) --> UrlCheck1
UrlCheck4 -->|No| MetaCheck1[End\nreturn the newest URL] --> OldPlacehold[Run in background:\noutdated-metadata placeholders]
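A minimal sketch of the URL-update loop in the flowchart above. Assumptions, not taken from ex_cd: a hypothetical `resolve_path` callable stands in for the URL-to-folder step (ex_cd uses `gallery-dl --dump-json '%s' --range 0` for this), and child.url holds the newer URL either as bare text or as a `URL=` line. Only local files are read, in line with the goal of avoiding requests.

```python
from pathlib import Path
from typing import Callable, Optional


def read_child_url(gallery_path: Path) -> Optional[str]:
    """Return the URL in <gallery_path>/metadata/child.url, or None if absent.

    Assumption: the file contains a bare URL or an Internet-Shortcut style
    "URL=..." line; the real ex_cd format may differ.
    """
    child_file = gallery_path / "metadata" / "child.url"
    if not child_file.is_file():
        return None
    text = child_file.read_text(encoding="utf-8")
    for line in text.splitlines():
        if line.startswith("URL="):
            return line[len("URL="):].strip()
    return text.strip() or None


def update_url(url: str, resolve_path: Callable[[str], Path]) -> str:
    """Follow child.url links until reaching a gallery that has no child.

    resolve_path is a hypothetical URL -> gallery folder function.
    """
    seen = {url}
    while True:
        child = read_child_url(resolve_path(url))
        if child is None:
            return url  # no child.url: this is the newest URL
        if child in seen:
            raise RuntimeError(f"child.url cycle detected at {child}")
        seen.add(child)
        url = child
```

The `seen` set is an addition of this sketch: it turns a child.url cycle into an error instead of an endless loop.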

Outdated-metadata placeholders

flowchart TD

UrlCheck1[Input URL] --> UrlCheck2(Extract the target folder path from the URL\ngallery-dl --dump-json '%s' --range 0\n< gallery_path >) --> MetaCheck1(Check that metadata files exist\n< gallery_path >/metadata/*.json)
MetaCheck1 --> MetaCheck2{Metadata files exist?}
MetaCheck2 -->|Yes| MetaCheck3(Check for a parent\ndoes the metadata contain a parent field?) --> MetaCheck4{parent field exists?} -->|Yes| UrlCheck3(Replace the URL with the outdated URL from the parent field) --> UrlCheck1
UrlCheck3 --> OldPlacehold1(Extract the target folder path from the URL) --> OldPlacehold2[Place child.url in the target folder]
MetaCheck2 -->|No| MetaCheck5(Download one metadata file\ngallery-dl -v '%s' --no-download --range 0)
MetaCheck4 -->|No| MetaCheck5 --> MetaCheck1
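The placeholder pass above can be sketched as walking the `parent` chain and writing child.url into each older gallery's folder. Assumptions: `parent` holds the parent gallery's URL, child.url goes into `metadata/child.url` (where the URL-update flowchart looks for it), and `resolve_path` is a hypothetical URL-to-folder function. Where the flowchart would download missing metadata with gallery-dl, this sketch simply stops.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def read_parent_url(gallery_path: Path) -> Optional[str]:
    """Return the 'parent' field of the first metadata/*.json that has one."""
    for meta_file in sorted((gallery_path / "metadata").glob("*.json")):
        try:
            data = json.loads(meta_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or partial file: try the next one
        parent = data.get("parent") if isinstance(data, dict) else None
        if parent:
            return parent
    return None


def place_child_placeholders(url: str, resolve_path: Callable[[str], Path]) -> list[str]:
    """Walk the parent chain, writing child.url into each outdated gallery folder.

    Returns the outdated (parent) URLs that received a placeholder.
    resolve_path is a hypothetical URL -> gallery folder function.
    """
    placed: list[str] = []
    seen = {url}
    while True:
        parent = read_parent_url(resolve_path(url))
        if parent is None or parent in seen:
            return placed  # no metadata, no parent, or a cycle: stop
        parent_meta = resolve_path(parent) / "metadata"
        parent_meta.mkdir(parents=True, exist_ok=True)
        # Point the outdated gallery at its newer version (assumed plain-text format).
        (parent_meta / "child.url").write_text(url + "\n", encoding="utf-8")
        placed.append(parent)
        seen.add(parent)
        url = parent
```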

Metadata download

flowchart TD

UrlCheck1[Input URL] --> UrlUpdate[URL update] --> UrlCheck2(Extract the target folder path from the URL\ngallery-dl --dump-json '%s' --range 0\n< gallery_path >) --> MetaCheck1(Check that metadata files exist\n< gallery_path >/metadata/*.json)
MetaCheck1 --> MetaCheck2{Metadata files exist?}
MetaCheck2 -->|Yes| MetaCheck4(Check metadata completeness\nevery < gallery_path >/metadata/*.json file parses as JSON\nand its 'filecount' equals the number of < gallery_path >/metadata/*.json files)
MetaCheck4 --> MetaCheck5{Metadata complete?}
MetaCheck5 -->|No| MetaCheck3
MetaCheck2 -->|No| MetaCheck3(Download metadata\ngallery-dl -v '%s' --no-download) --> MetaCheck1
MetaCheck5 -->|Yes| MetaCheck6[End]
MetaCheck3 --> MetaCheck6
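The completeness check in the flowchart above, sketched in Python. `metadata_complete` is a hypothetical name; it lists the directory first and only then opens each file, following the principle of preferring file listings over file reads. Accepting `filecount` as either a number or a numeric string is an assumption.

```python
import json
import os
from pathlib import Path


def metadata_complete(gallery_path: Path) -> bool:
    """True if every metadata/*.json parses and each 'filecount' equals the file count."""
    meta_dir = gallery_path / "metadata"
    # Cheap step first: list the directory without opening any file.
    try:
        names = [n for n in os.listdir(meta_dir) if n.endswith(".json")]
    except OSError:
        return False  # no metadata folder yet
    if not names:
        return False
    for name in names:
        try:
            with open(meta_dir / name, encoding="utf-8") as f:
                data = json.load(f)
            filecount = int(data["filecount"])  # assumption: int or numeric string
        except (OSError, ValueError, KeyError, TypeError):
            return False  # unparsable file or missing/odd filecount
        if filecount != len(names):
            return False
    return True
```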

Image download

!!!!!!!!! TODO: once the URL is confirmed to be the newest, download metadata and images concurrently !!!!!!!!!

flowchart TD
UrlCheck1[Input URL] --> UrlCheck2[URL update] --> ImgCheck1(Check that image files exist:\nthe image file for every < gallery_path >/metadata/*.json exists) --> ImgCheck2{All image files exist?} -->|Yes| ImgCheck3(Check image contents:\nthe SHA1 of each image file for < gallery_path >/metadata/*.json matches its < image_token > field) --> ImgCheck4{All image contents match image_token?} -->|Yes| ImgCheck5[End]
ImgCheck2 -->|No| Download(Download with gallery-dl)
ImgCheck4 -->|No| Download
Download --> ImgCheck5
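A sketch of the two image checks above. Assumptions, not taken from ex_cd: images live directly in `< gallery_path >`, the metadata for image `xxx.jpg` is `< gallery_path >/metadata/xxx.jpg.json`, and `image_token` is the leading part of the image's SHA-1 hex digest (E-Hentai page tokens are commonly the first 10 hex characters of it), so the check is a prefix match rather than full equality. All function names are hypothetical.

```python
import hashlib
import json
import os
from pathlib import Path


def expected_images(gallery_path: Path) -> list[str]:
    """Image file names implied by metadata/*.json.

    Hypothetical layout: metadata for "xxx.jpg" is metadata/xxx.jpg.json.
    """
    names = os.listdir(gallery_path / "metadata")
    return sorted(n[: -len(".json")] for n in names if n.endswith(".json"))


def images_present(gallery_path: Path) -> bool:
    """Existence check using only directory listings, no file reads."""
    on_disk = set(os.listdir(gallery_path))
    return all(name in on_disk for name in expected_images(gallery_path))


def sha1_matches(image_file: Path, image_token: str) -> bool:
    """Assumption: image_token is a prefix of the file's SHA-1 hex digest."""
    h = hashlib.sha1()
    with open(image_file, "rb") as f:
        # Hash in 1 MiB chunks so large images are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return bool(image_token) and h.hexdigest().startswith(image_token.lower())


def images_valid(gallery_path: Path) -> bool:
    """Content check: each image's SHA-1 agrees with its metadata's image_token."""
    for name in expected_images(gallery_path):
        meta_file = gallery_path / "metadata" / f"{name}.json"
        token = json.loads(meta_file.read_text(encoding="utf-8")).get("image_token", "")
        if not sha1_matches(gallery_path / name, token):
            return False
    return True
```

Following the flowchart, `images_present` runs first and `images_valid` only when it returns True (otherwise opening a missing image would raise); either one returning False is the point where gallery-dl is called to download.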


Download files


Source Distribution

ex_cd-1.18.0.tar.gz (13.9 kB)

Uploaded Source

Built Distribution


ex_cd-1.18.0-py3-none-any.whl (18.0 kB)

Uploaded Python 3

File details

Details for the file ex_cd-1.18.0.tar.gz.

File metadata

  • Download URL: ex_cd-1.18.0.tar.gz
  • Upload date:
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for ex_cd-1.18.0.tar.gz
  • SHA256: 4ab84f20f60195e34be1d54e9dedacfa88eef52e81d5a5e0f5959c4ce40a33ae
  • MD5: df03db6ac7ec3ba3a75eb3b692272e39
  • BLAKE2b-256: bffbc94dcd48a556e1e842df6c53c42e969609d5217c5f85720f0922b5ee8401


File details

Details for the file ex_cd-1.18.0-py3-none-any.whl.

File metadata

  • Download URL: ex_cd-1.18.0-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for ex_cd-1.18.0-py3-none-any.whl
  • SHA256: a80837690c3fd0e9808c829e4038ca87ebec44687fc3c084e85d1382fd4180de
  • MD5: 68c6a9f6d6aa97b8852c542cfecad982
  • BLAKE2b-256: 738d16a58ccc4a532cf07f0523e4d31b024246fcfc8cba2290093d3c7670d8e3

