Overview

This repository exposes a custom yt-dlp extractor plugin that handles https://51cg1.com/archives/... pages. The plugin uses Playwright to load the page in a headless Chromium instance, captures the HLS playlist requested by the embedded player, and passes it to yt-dlp together with the article title, the cleaned description body, and every preview image found in the post content.
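
The capture step can be pictured roughly as follows. This is a minimal sketch, not the plugin's actual code: the function names, the rule "first request whose path ends in .m3u8", and the 8-second wait are all assumptions made for illustration. Playwright is imported lazily so the pure helper can be used without a browser installed.

```python
from urllib.parse import urlsplit


def find_hls_playlist(request_urls):
    """Return the first URL whose path ends in .m3u8, or None (assumed rule)."""
    for url in request_urls:
        if urlsplit(url).path.lower().endswith(".m3u8"):
            return url
    return None


def capture_request_urls(page_url, wait_ms=8000):
    """Load page_url in headless Chromium and return every request URL seen.

    Hypothetical helper: wait_ms is an assumed settle time, not a value
    taken from the plugin.
    """
    from playwright.sync_api import sync_playwright  # lazy: needs Playwright

    seen = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Record the URL of every network request the page issues.
            page.on("request", lambda request: seen.append(request.url))
            page.goto(page_url, wait_until="domcontentloaded")
            page.wait_for_timeout(wait_ms)
        finally:
            browser.close()
    return seen
```

With both pieces in place, find_hls_playlist(capture_request_urls(url)) would yield the playlist URL to hand to yt-dlp, or None if the player never requested one.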

Because the extractor lives under the yt_dlp_plugins namespace package, running yt-dlp from this directory (or with this directory added to PYTHONPATH) enables the plugin automatically. No patching of yt-dlp itself is required.

Requirements

  1. uv for dependency management.
  2. Playwright browsers (Chromium is required).

Install everything with:

uv sync
uv run playwright install chromium

Usage

Quick start via the helper script

uv run python main.py https://51cg1.com/archives/234404/ \
  --write-description \
  --write-thumbnail \
  --write-all-thumbnails

The wrapper ensures yt_dlp_plugins is on sys.path, points yt-dlp at the provided URLs, and forwards the options that save the cleaned article text (--write-description) and the preview images (--write-thumbnail). Pass --write-all-thumbnails to save every image that appears in the article; otherwise yt-dlp keeps only the highest-priority thumbnail. Outputs land under downloads/<title>/.
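
A wrapper of this kind could look roughly like the sketch below. It is not the contents of main.py: the function names are hypothetical, and the file name pattern inside downloads/<title>/ is an assumption borrowed from the direct invocation shown later. yt-dlp is imported lazily so the argument-building helpers work without it installed.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(repo_root):
    """Put repo_root on sys.path once, so yt_dlp_plugins is importable."""
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


def build_ytdlp_args(urls, write_all_thumbnails=False, output_dir="downloads"):
    """Translate the wrapper's inputs into a plain yt-dlp argument list."""
    args = [
        "--write-description",
        "--write-thumbnail",
        # Assumed template: one folder per title under output_dir.
        "-o", f"{output_dir}/%(title)s/%(id)s.%(ext)s",
    ]
    if write_all_thumbnails:
        args.append("--write-all-thumbnails")
    return args + list(urls)


def run(urls, **options):
    """Hand the arguments to yt-dlp's command-line entry point."""
    ensure_on_sys_path(Path.cwd())
    import yt_dlp  # lazy: only needed when actually downloading

    yt_dlp.main(build_ytdlp_args(urls, **options))  # exits the process when done
```

Keeping the argument list in its own function makes it easy to print or inspect what would be passed to yt-dlp before anything is downloaded.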

Direct yt-dlp invocation

If you prefer stock yt-dlp flags, just keep the repository root on PYTHONPATH:

PYTHONPATH="$(pwd)" uv run yt-dlp \
  https://51cg1.com/archives/234404/ \
  --write-info-json \
  --write-description \
  --write-thumbnail --write-all-thumbnails \
  -o "%(title)s/%(id)s.%(ext)s"

The plugin exposes:

  • title: taken from the <h1 class="post-title"> element.
  • description: the article body with download widgets, tables, and other noise stripped, trademark boilerplate removed, and the original source URL appended.
  • thumbnails: every <img> inside the article that resolves to a data: URI. Those payloads are decoded and rewritten on the fly, so a plain --write-thumbnail saves them next to the video.
  • http_headers: the correct Origin, Referer, and User-Agent, so that yt-dlp can fetch the captured .m3u8 URL.
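
To make the shape of these fields concrete, here is a minimal sketch of how such a result dictionary might be assembled. The helper names, the placeholder User-Agent, the single hand-built HLS format entry, and the choice of the page URL as the appended source URL are all assumptions; a real extractor would more likely rely on yt-dlp's own HLS helpers.

```python
from urllib.parse import urlsplit

# Placeholder value; the plugin's real User-Agent string is not given here.
ASSUMED_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_http_headers(page_url, user_agent=ASSUMED_USER_AGENT):
    """Derive Origin and Referer from the article URL."""
    parts = urlsplit(page_url)
    return {
        "Origin": f"{parts.scheme}://{parts.netloc}",
        "Referer": page_url,
        "User-Agent": user_agent,
    }


def build_info(video_id, page_url, title, description, playlist_url, thumbnail_urls):
    """Assemble a yt-dlp style info dictionary (hypothetical helper)."""
    headers = build_http_headers(page_url)
    return {
        "id": video_id,
        "title": title,
        # Assumption: the appended source URL is the article URL itself.
        "description": f"{description}\n\n{page_url}",
        "thumbnails": [
            {"id": str(index), "url": url}
            for index, url in enumerate(thumbnail_urls)
        ],
        "formats": [{
            "format_id": "hls",
            "url": playlist_url,
            "protocol": "m3u8_native",
            "ext": "mp4",
            "http_headers": headers,
        }],
        "http_headers": headers,
    }
```

Attaching the headers to the format entry matters because the playlist host may reject requests that lack the expected Origin or Referer.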

Any additional yt-dlp switches (e.g., --embed-metadata, --ppa "AtomicParsley::SetCoverArt") work as usual.

Inline thumbnail handling

Many posts embed preview images as data: URIs instead of pointing at real files. The extractor decodes those payloads, and a small runtime patch extends yt-dlp’s native thumbnail writer so --write-thumbnail (or --write-all-thumbnails) emits the decoded bytes as normal .jpg/.png/.webp assets. Nothing special is required beyond passing the usual yt-dlp flags; the behavior works in both the helper script and direct yt-dlp invocations as long as this plugin is on PYTHONPATH.
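
Decoding a data: URI needs nothing beyond the standard library. The sketch below shows one way it could be done; the function names and the MIME-to-extension table are illustrative assumptions, and the runtime patch to yt-dlp's thumbnail writer is not reproduced here.

```python
import base64
from urllib.parse import unquote_to_bytes

# Assumed mapping for the three image types mentioned above.
EXTENSIONS = {"image/jpeg": "jpg", "image/png": "png", "image/webp": "webp"}


def decode_data_uri(uri):
    """Split a data: URI into (mime_type, raw_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, separator, payload = uri[len("data:"):].partition(",")
    if not separator:
        raise ValueError("data: URI has no comma separator")
    media_type, *params = header.split(";")
    if "base64" in params:
        raw = base64.b64decode(payload)
    else:
        # Non-base64 payloads are percent-encoded text.
        raw = unquote_to_bytes(payload)
    return media_type or "text/plain", raw


def thumbnail_extension(mime_type, default="jpg"):
    """Pick a file extension for the decoded image."""
    return EXTENSIONS.get(mime_type.lower(), default)
```

For example, decode_data_uri("data:image/png;base64,aGVsbG8=") returns ("image/png", b"hello"), and thumbnail_extension("image/png") returns "png".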

During extraction, Playwright scrolls the page and waits (≈8 s overall) for the site’s JavaScript to inline every image before it snapshots the DOM, so even lazily loaded placeholders are converted into actual data: URIs before yt-dlp sees them. Non-inline <img> tags are ignored because the site serves empty placeholders for those; only the final inlined variants are surfaced to yt-dlp.
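
Once the DOM snapshot exists, the filtering rule is simple to express. The following sketch uses the standard library's HTML parser and hypothetical names; it scans the whole snapshot rather than only the article container, and it leaves out the Playwright scrolling and waiting.

```python
from html.parser import HTMLParser


class InlineImageCollector(HTMLParser):
    """Collect the src of every <img> whose source is a data: URI."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # Placeholder images that still point at a normal URL are skipped.
        if src.startswith("data:"):
            self.sources.append(src)


def inline_image_sources(html):
    """Return inline image sources from an HTML snapshot, in document order."""
    collector = InlineImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources
```

Running this on a snapshot taken too early would return fewer images, which is why the wait before the snapshot matters.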

Download files

Source Distribution

yt_dlp_51cg-0.1.0.tar.gz (6.7 kB)

Uploaded Source

Built Distribution

yt_dlp_51cg-0.1.0-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file yt_dlp_51cg-0.1.0.tar.gz.

File metadata

  • Download URL: yt_dlp_51cg-0.1.0.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.5

File hashes

Hashes for yt_dlp_51cg-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       fbfb3bf495d4e03c35e3d524db84372f88f3f91848890f0b3e27e503cee1ed1b
MD5          fe86561c9e7555c265a18ed4005490ed
BLAKE2b-256  d5533a1859d339885802789794455687f46ed725fe8b40fbe48bc158161c01c6

File details

Details for the file yt_dlp_51cg-0.1.0-py3-none-any.whl.

File hashes

Hashes for yt_dlp_51cg-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       28f8936e49590dec8244bd21c47a1a926e8640efd1b598d627bf67002239fbf4
MD5          08b7b9c722ddadcb6af3715a583603ab
BLAKE2b-256  d7cfd2dabada79621a8ad5efacb80bfcd7074756aa1deb0af9b1c9fb38d387a6
