Overview
This repository now exposes a custom yt-dlp extractor plugin that understands https://51cg1.com/archives/... pages. The plugin drives Playwright to load the page in a headless Chromium instance, captures the HLS playlist used by the embedded player, and forwards it to yt-dlp together with the article title, cleaned description body, and every preview image found in the post content.
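A minimal sketch of the capture step follows. It assumes the embedded player requests a URL whose path ends in `.m3u8`; the plugin's actual detection logic is not shown in this README. It uses Playwright's synchronous API to listen for outgoing requests while the page loads. `is_hls_playlist` and `capture_hls_url` are hypothetical names, and the 8000 ms wait is an illustrative value.

```python
from typing import List, Optional
from urllib.parse import urlsplit


def is_hls_playlist(url: str) -> bool:
    """Assumed heuristic: any URL whose path ends in .m3u8 is an HLS playlist."""
    return urlsplit(url).path.lower().endswith(".m3u8")


def capture_hls_url(page_url: str, wait_ms: int = 8000) -> Optional[str]:
    """Load page_url in headless Chromium and return the first .m3u8 URL it requests."""
    # Imported lazily so is_hls_playlist() works even without Playwright installed.
    from playwright.sync_api import sync_playwright

    seen: List[str] = []

    def on_request(request) -> None:
        # Record every outgoing request that looks like an HLS playlist.
        if is_hls_playlist(request.url):
            seen.append(request.url)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.on("request", on_request)
            page.goto(page_url, wait_until="domcontentloaded")
            # Give the player's JavaScript time to fetch its playlist.
            page.wait_for_timeout(wait_ms)
        finally:
            browser.close()
    return seen[0] if seen else None
```

Listening to network requests, rather than parsing the page's scripts, keeps the sketch independent of how the player builds the URL. A real extractor would then hand the URL to yt-dlp, for example through `InfoExtractor._extract_m3u8_formats`.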
Because the extractor lives under the yt_dlp_plugins namespace package, simply running yt-dlp from this directory (or with this directory added to PYTHONPATH) automatically enables the plugin—no patching of yt-dlp itself is required.
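yt-dlp discovers extractor plugins as modules under `yt_dlp_plugins/extractor/` on its plugin search path. It routes a URL to an extractor whose `_VALID_URL` regular expression matches, and a named `id` group supplies the video ID. The pattern below is a hypothetical stand-in modeled on the example URL in this README, not the plugin's actual regex. It is exercised with plain `re`, so it runs without yt-dlp installed.

```python
import re
from typing import Optional

# Hypothetical pattern: the real plugin's _VALID_URL may differ.
VALID_URL = r"https?://(?:www\.)?51cg1\.com/archives/(?P<id>\d+)"


def match_article_id(url: str) -> Optional[str]:
    """Return the numeric article ID, roughly mirroring InfoExtractor._match_id."""
    m = re.match(VALID_URL, url)
    return m.group("id") if m else None
```

In the plugin itself, this string would be the `_VALID_URL` class attribute of an `InfoExtractor` subclass whose name ends in `IE`.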
Requirements
- uv for dependency management.
- Playwright browsers (Chromium is required).
Install everything with:
uv sync
uv run playwright install chromium
Usage
Quick start via the helper script
uv run python main.py https://51cg1.com/archives/234404/ \
--write-description \
--write-thumbnail \
--write-all-thumbnails
The wrapper simply ensures yt_dlp_plugins is on sys.path, points yt-dlp at the provided URLs, and forwards options to save the cleaned article text (--write-description) and preview images (--write-thumbnail). Pass --write-all-thumbnails to persist every image that appeared in the article; otherwise yt-dlp’s default behavior is to keep only the highest-priority thumbnail. Outputs land under downloads/<title>/.
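The helper's behavior can be approximated with yt-dlp's embedded Python API. This is a sketch, not the repository's `main.py`. The argument parser, the `parse_args`/`build_ydl_opts` names, and the `%(id)s.%(ext)s` file name inside `downloads/<title>/` are assumptions. The keys `writedescription`, `writethumbnail`, `write_all_thumbnails`, and `outtmpl` are the `YoutubeDL` options behind the corresponding CLI flags.

```python
import argparse
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Download 51cg1.com articles with yt-dlp")
    parser.add_argument("urls", nargs="+", help="article URLs")
    parser.add_argument("--write-description", action="store_true")
    parser.add_argument("--write-thumbnail", action="store_true")
    parser.add_argument("--write-all-thumbnails", action="store_true")
    return parser.parse_args(argv)


def build_ydl_opts(args: argparse.Namespace) -> Dict[str, Any]:
    """Translate the wrapper's flags into YoutubeDL options."""
    return {
        "writedescription": args.write_description,
        "writethumbnail": args.write_thumbnail,
        # yt-dlp writes every thumbnail when this is set, even without writethumbnail.
        "write_all_thumbnails": args.write_all_thumbnails,
        # Assumed template producing downloads/<title>/<id>.<ext>.
        "outtmpl": "downloads/%(title)s/%(id)s.%(ext)s",
    }


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    # Put the repository root (which contains yt_dlp_plugins/) on sys.path
    # before yt-dlp is imported, so its plugin loader can find the extractor.
    repo_root = str(Path(__file__).resolve().parent)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    import yt_dlp

    with yt_dlp.YoutubeDL(build_ydl_opts(args)) as ydl:
        return ydl.download(args.urls)
```

`YoutubeDL.download` returns yt-dlp's exit code, so `main` can be passed straight to `sys.exit`.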
Direct yt-dlp invocation
If you prefer stock yt-dlp flags, just keep the repository root on PYTHONPATH:
PYTHONPATH=$(pwd) uv run yt-dlp \
https://51cg1.com/archives/234404/ \
--write-info-json \
--write-description \
--write-thumbnail --write-all-thumbnails \
-o "%(title)s/%(id)s.%(ext)s"
The plugin exposes:
- title: taken from the <h1 class="post-title">.
- description: the article body with download widgets, tables, and other noise stripped, trademark boilerplate removed, and the original source URL appended.
- thumbnails: every <img> inside the article that resolves to a data: URI. Those payloads are decoded and rewritten on the fly, so a normal --write-thumbnail automatically saves them next to the video.
- http_headers: correct Origin, Referer, and User-Agent values so yt-dlp can fetch the captured .m3u8 URL.
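Put together, these fields map onto a yt-dlp info dict roughly like the sketch below. `build_info` and its parameters are hypothetical, and the `Source:` line appended to the description is an assumed format. A real extractor would more likely expand the playlist into formats with `InfoExtractor._extract_m3u8_formats`. This sketch hands the `.m3u8` URL over directly with `protocol` set to `m3u8_native`.

```python
from typing import Any, Dict, List
from urllib.parse import urlsplit


def build_info(article_id: str, title: str, body: str, m3u8_url: str,
               thumbnail_uris: List[str], page_url: str, user_agent: str) -> Dict[str, Any]:
    """Assemble a single-format yt-dlp info dict (hypothetical helper)."""
    parts = urlsplit(page_url)
    headers = {
        "Origin": f"{parts.scheme}://{parts.netloc}",
        "Referer": page_url,
        "User-Agent": user_agent,
    }
    return {
        "id": article_id,
        "title": title,
        # Assumed format for appending the original source URL.
        "description": f"{body}\n\nSource: {page_url}",
        "url": m3u8_url,
        "ext": "mp4",
        "protocol": "m3u8_native",
        "thumbnails": [{"id": str(i), "url": uri} for i, uri in enumerate(thumbnail_uris)],
        # Headers yt-dlp should send when fetching the playlist.
        "http_headers": headers,
    }
```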
Any additional yt-dlp switches (e.g., --embed-metadata, --ppa "AtomicParsley::SetCoverArt") work as usual.
Inline thumbnail handling
Many posts embed preview images as data: URIs instead of pointing at real files. The extractor decodes those payloads, and a small runtime patch extends yt-dlp’s native thumbnail writer so --write-thumbnail (or --write-all-thumbnails) emits the decoded bytes as normal .jpg/.png/.webp assets. Nothing special is required beyond passing the usual yt-dlp flags; the behavior works in both the helper script and direct yt-dlp invocations as long as this plugin is on PYTHONPATH.
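The decoding half of that step can be sketched with the standard library alone, following the RFC 2397 layout `data:[<mediatype>][;base64],<data>`. `decode_data_uri` and the MIME-to-extension table are hypothetical. The runtime patch to yt-dlp's thumbnail writer is not reproduced here.

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes

# Assumed mapping; the README only names .jpg/.png/.webp outputs.
MIME_TO_EXT = {"image/jpeg": "jpg", "image/jpg": "jpg", "image/png": "png", "image/webp": "webp"}


def decode_data_uri(uri: str) -> Tuple[bytes, str]:
    """Return (raw bytes, file extension) for an inline image data: URI."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data: URI (missing comma)")
    params = header.split(";")
    mime = params[0].strip().lower() or "text/plain"  # RFC 2397 default media type
    flags = [p.strip().lower() for p in params[1:]]
    # Base64 payloads are decoded directly; others are percent-encoded bytes.
    data = base64.b64decode(payload) if "base64" in flags else unquote_to_bytes(payload)
    ext = MIME_TO_EXT.get(mime)
    if ext is None:
        raise ValueError(f"unsupported image type: {mime}")
    return data, ext
```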
During extraction Playwright scrolls the page and waits (≈8 s overall) for the site’s JavaScript to inline every image before it snapshots the DOM, so even lazily loaded placeholders get converted into actual data URLs before yt-dlp sees them.
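A minimal version of that scroll-then-wait loop is sketched below, using the Playwright `Page` methods `evaluate`, `wait_for_timeout`, and `content`. The step count and delays are hypothetical values, chosen only so they add up to the roughly 8 s budget described above. The function accepts any object with those methods, so it can be tested without a browser.

```python
def scroll_and_snapshot(page, steps: int = 10, pause_ms: int = 500, settle_ms: int = 3000) -> str:
    """Scroll through the page in `steps` increments, wait for lazy images, return the DOM."""
    for i in range(1, steps + 1):
        # Scroll to i/steps of the full document height so lazy-loading scripts fire.
        page.evaluate(
            "fraction => window.scrollTo(0, document.body.scrollHeight * fraction)",
            i / steps,
        )
        page.wait_for_timeout(pause_ms)
    # Final pause for the site's scripts to finish inlining images.
    page.wait_for_timeout(settle_ms)
    return page.content()
```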
Non-inline <img> tags are ignored because the site serves empty placeholders for those; only the final inlined variants are surfaced to yt-dlp.
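Filtering the snapshot down to inlined images can be sketched with Python's built-in `html.parser`. `inline_image_sources` is a hypothetical helper. For brevity, it scans the whole document rather than only the article body the plugin looks at.

```python
from html.parser import HTMLParser
from typing import List


class InlineImageCollector(HTMLParser):
    """Collect src values of <img> tags that are already inlined as data: URIs."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # Non-inline sources (placeholder URLs, missing src) are skipped on purpose.
        if src.startswith("data:"):
            self.sources.append(src)


def inline_image_sources(html: str) -> List[str]:
    parser = InlineImageCollector()
    parser.feed(html)
    parser.close()
    return parser.sources
```

Self-closing `<img ... />` tags are handled too, because `HTMLParser` routes them through `handle_starttag` by default.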
Download files
File details
Details for the file yt_dlp_51cg-0.1.0.tar.gz.
File metadata
- Download URL: yt_dlp_51cg-0.1.0.tar.gz
- Upload date:
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbfb3bf495d4e03c35e3d524db84372f88f3f91848890f0b3e27e503cee1ed1b |
| MD5 | fe86561c9e7555c265a18ed4005490ed |
| BLAKE2b-256 | d5533a1859d339885802789794455687f46ed725fe8b40fbe48bc158161c01c6 |
File details
Details for the file yt_dlp_51cg-0.1.0-py3-none-any.whl.
File metadata
- Download URL: yt_dlp_51cg-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 28f8936e49590dec8244bd21c47a1a926e8640efd1b598d627bf67002239fbf4 |
| MD5 | 08b7b9c722ddadcb6af3715a583603ab |
| BLAKE2b-256 | d7cfd2dabada79621a8ad5efacb80bfcd7074756aa1deb0af9b1c9fb38d387a6 |
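To check a downloaded file against the published SHA256 digests for these two files, hashing it in chunks with `hashlib` is enough. The `EXPECTED_SHA256` mapping copies those digests. `sha256_file` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Optional, Union

# SHA256 digests published for the 0.1.0 release files.
EXPECTED_SHA256 = {
    "yt_dlp_51cg-0.1.0.tar.gz": "fbfb3bf495d4e03c35e3d524db84372f88f3f91848890f0b3e27e503cee1ed1b",
    "yt_dlp_51cg-0.1.0-py3-none-any.whl": "28f8936e49590dec8244bd21c47a1a926e8640efd1b598d627bf67002239fbf4",
}


def sha256_file(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Union[str, Path], expected: Optional[str] = None) -> bool:
    """Compare a file's digest to `expected`, or to the published one for its name."""
    digest = expected if expected is not None else EXPECTED_SHA256[Path(path).name]
    return sha256_file(path) == digest.lower()
```

For example, `verify("yt_dlp_51cg-0.1.0.tar.gz")` returns `True` only if the local copy matches the published SHA256 digest.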