Video content distillation CLI tool
vidistill
A toolset to download, parse and analyse social-media video content.
Install
uv sync
uv run playwright install chromium # one-time: downloads ~150 MB browser binary
playwright install chromium is required by the collect subcommand, which
drives a headless browser against Bilibili pages.
Environment
Copy .env.example → .env and fill in:
- OPENROUTER_API_KEY: API key for the LLM provider used by analyse.
- VOLC_ACCESS_KEY_ID / VOLC_SECRET_ACCESS_KEY / VOLC_TOS_BUCKET: Volcengine TOS + Doubao (豆包) ASR credentials used by the default transcriber.
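A quick way to confirm the credentials are visible before running the tool is to check the process environment. The sketch below is illustrative and not part of vidistill: the helper name `missing_env_vars` is hypothetical, and it assumes the values from .env have already been exported into the environment (it does not parse the .env file itself).

```python
import os

# Variable names come from the list above; the helper itself is hypothetical.
REQUIRED_VARS = (
    "OPENROUTER_API_KEY",
    "VOLC_ACCESS_KEY_ID",
    "VOLC_SECRET_ACCESS_KEY",
    "VOLC_TOS_BUCKET",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All credentials are set.")
```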
CLI
vidistill analyse <input> [--output-dir …] [--model …] [--transcriber …] [--diarize]
vidistill collect <page-path> [--rule …] [--num …] [--sort …] [--max-scrolls …] [--headful]
vidistill analyse
Runs the full pipeline on a single video: download → audio extraction → transcription → LLM analysis → markdown report.
vidistill analyse https://www.bilibili.com/video/BV1xx411c7mu
vidistill collect
Scrapes a Bilibili listing page, filters videos by a rule expression, and writes the top-N matches as JSON. Supported page types:
| Type | URL shape |
|---|---|
| homepage | www.bilibili.com/ (the recommended feed) |
| user | space.bilibili.com/<uid> |
| search | search.bilibili.com/all?keyword=… |
| channel | www.bilibili.com/c/<slug> |
| popular | www.bilibili.com/v/popular/<column> (all, weekly, …) |
| topic | www.bilibili.com/v/topic/detail?topic_id=<id> |
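The URL shapes in the table can be told apart from the host and path alone. The following is a minimal sketch of such a classifier, not vidistill's actual detection logic: the function name `classify_page` is hypothetical, and it assumes that `<uid>` is numeric (as in the examples in this README).

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def classify_page(url: str) -> Optional[str]:
    """Map a Bilibili listing URL to one of the page types in the table."""
    # Accept scheme-less input such as "www.bilibili.com/".
    parts = urlparse(url if "://" in url else "https://" + url)
    host = parts.netloc.lower()
    path = parts.path
    query = parse_qs(parts.query)

    # Assumption: user IDs are numeric.
    if host == "space.bilibili.com" and path.strip("/").split("/")[0].isdigit():
        return "user"
    if host == "search.bilibili.com" and path.startswith("/all") and "keyword" in query:
        return "search"
    if host == "www.bilibili.com":
        if path in ("", "/"):
            return "homepage"
        if path.startswith("/c/"):
            return "channel"
        if path.startswith("/v/popular/"):
            return "popular"
        if path.startswith("/v/topic/detail") and "topic_id" in query:
            return "topic"
    return None  # not a supported listing page
```

A single-video URL such as the one passed to analyse matches none of these shapes, so the sketch returns None for it.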
vidistill collect https://space.bilibili.com/12345 \
--rule='play>10000 and like>=500' --num=20
vidistill collect "https://search.bilibili.com/all?keyword=LLM" \
-r 'play>50000 and duration<=600' -n 10 --sort=play
Rule DSL (--rule) supports comparisons (== != > >= < <=), boolean ops
(and, or, not), parentheses, and the following fields: play, like,
coin, favorite (alias star), share, danmaku, comment (alias
reply), duration (seconds), publish_days_ago, title, author. If a video
lacks a field that the rule references, the whole rule evaluates to False for that video.
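To make the rule semantics concrete, here is a small sketch of an evaluator with the same behaviour. It is an illustration, not vidistill's parser: it assumes the DSL can be read as a subset of Python expression syntax (which holds for the operators listed above) and borrows Python's `ast` module. The names `matches`, `MissingField`, and the `video` dict layout are hypothetical.

```python
import ast
import operator

ALIASES = {"star": "favorite", "reply": "comment"}
COMPARATORS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
}


class MissingField(Exception):
    """Raised when a rule references a field the video does not have."""


def _eval(node, video):
    if isinstance(node, ast.BoolOp):
        # Evaluate every operand (no short-circuit) so that a missing
        # field anywhere in the rule is always detected.
        values = [_eval(v, video) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, video)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, video)
        result = True
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in COMPARATORS:
                raise ValueError(f"unsupported operator: {type(op).__name__}")
            right = _eval(comparator, video)
            result = result and COMPARATORS[type(op)](left, right)
            left = right
        return result
    if isinstance(node, ast.Name):
        key = ALIASES.get(node.id, node.id)
        if key not in video:
            raise MissingField(key)
        return video[key]
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def matches(rule, video):
    """Return True if the video (a dict of fields) satisfies the rule."""
    tree = ast.parse(rule, mode="eval")
    try:
        return bool(_eval(tree.body, video))
    except MissingField:
        return False  # a missing field makes the whole rule False
```

Operands of `and`/`or` are evaluated eagerly on purpose: with short-circuiting, `play>1 or coin>0` would pass for a video that has no `coin` field, which would contradict the missing-field behaviour described above.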
Output lands in <output-dir>/collect_<page-type>_<identifier>_<ts>.json.
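Because each run writes a new timestamped file, a follow-up script usually wants the newest one. The sketch below picks the most recently modified file matching the `collect_*.json` prefix and parses it. The helper names are hypothetical, and since the JSON schema is not documented here, the parsed data is returned as-is.

```python
import json
from pathlib import Path


def latest_collect_file(output_dir):
    """Return the most recently modified collect_*.json in output_dir, or None."""
    candidates = list(Path(output_dir).glob("collect_*.json"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def load_latest(output_dir):
    """Parse the newest collect output, or return None if there is none."""
    path = latest_collect_file(output_dir)
    if path is None:
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```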
Bilibili login & storage state
Bilibili's anti-bot layer rejects guest scraping. The --storage-state flag
uses Playwright's native storage-state (cookies + localStorage) to persist
login across runs. Default: default.storage_state in the current directory.
First run (no login yet):
vidistill collect https://space.bilibili.com/<uid> -n 10
If default.storage_state doesn't exist, a browser window opens. Log in to
Bilibili there. When you close the browser, storage state is saved
automatically and scraping proceeds. Subsequent runs reuse that state.
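The underlying Playwright mechanism can be sketched as follows. This is not vidistill's code: `open_context` and `needs_login` are hypothetical helpers, and the sketch waits for an Enter keypress in the terminal rather than for the browser window to close, which is a simplification of the flow described above. The `new_context(storage_state=...)` and `context.storage_state(path=...)` calls are Playwright's standard Python API.

```python
from pathlib import Path


def needs_login(state_path):
    """True when no saved storage state exists yet."""
    return not Path(state_path).is_file()


def open_context(playwright, state_path="default.storage_state"):
    """Return (browser, context), logging in interactively on first use."""
    if needs_login(state_path):
        # First run: show a real window so the user can log in.
        browser = playwright.chromium.launch(headless=False)
        context = browser.new_context()
        context.new_page().goto("https://www.bilibili.com/")
        input("Log in in the browser window, then press Enter here... ")
        # Persist cookies + localStorage for later runs.
        context.storage_state(path=state_path)
    else:
        # Later runs: reuse the saved session in a headless browser.
        browser = playwright.chromium.launch(headless=True)
        context = browser.new_context(storage_state=state_path)
    return browser, context


if __name__ == "__main__":
    # Imported here so the helpers above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser, context = open_context(p)
        # Scraping with the logged-in context would go here.
        browser.close()
```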
Explicit path:
vidistill collect <url> --storage-state ~/my-session.storage_state -n 10
Development
uv run pytest
File details
Details for the file vidistill-0.3.3-py3-none-any.whl.
File metadata
- Download URL: vidistill-0.3.3-py3-none-any.whl
- Upload date:
- Size: 32.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5cc91ee681d2fe46f86a8d1f192c2263cd3c75579c5777cafd326e222ebb0e88 |
| MD5 | f19b1b613f1591670f459a4d829a534a |
| BLAKE2b-256 | 36aad2ea441dee43d5a449891b914a74b67008793173c2894fd3422499744e1e |