All-in-one CLI tool to download and extract content from URLs
⬇️ abx-dl
A simple all-in-one CLI tool to auto-detect and download everything available from a URL.
pip install abx-dl
abx-dl 'https://example.com/page/to/download'
✨ Ever wish you could yt-dlp, gallery-dl, wget, curl, puppeteer, etc. all in one command?
abx-dl is an all-in-one CLI tool for downloading URLs "by any means necessary".
It's useful for scraping, downloading, OSINT, digital preservation, and more.
abx-dl provides a simpler one-shot CLI interface to the ArchiveBox plugin ecosystem.
What does it save?
abx-dl --plugins=title,favicon,headers,wget,singlefile,screenshot,pdf,dom,readability,git,... 'https://example.com'
abx-dl runs all plugins by default, or you can specify --plugins=... for specific methods:
- HTML, JS, CSS, images, etc. rendered with a headless browser
- title, favicon, headers, outlinks, and other metadata
- audio, video, subtitles, playlists, comments
- snapshot of the page as a PDF, screenshot, and Singlefile HTML
- article text, git source code
- and much more...
🧩 How does it work?
abx-dl uses the ABX Plugin Library (shared with ArchiveBox) to run a collection of downloading and scraping tools.
Plugins are discovered from the plugins/ directory and execute hooks in order:
- Crawl hooks run first (setup/install dependencies like Chrome)
- Snapshot hooks run per-URL to extract content
Each plugin can output:
- Files to its output directory
- JSONL records for status reporting
- Config updates that propagate to subsequent plugins
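The two-phase hook order and the config propagation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not abx-dl's actual executor: the `Plugin` class, the `run_hooks` function, the hook signatures, and the example hooks are all hypothetical, and it is an assumption that each hook returns its config updates as a dict.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical hook signatures, used only for this sketch.
CrawlHook = Callable[[Dict[str, str]], Dict[str, str]]
SnapshotHook = Callable[[str, Dict[str, str]], Dict[str, str]]


@dataclass
class Plugin:
    name: str
    crawl_hooks: List[CrawlHook] = field(default_factory=list)
    snapshot_hooks: List[SnapshotHook] = field(default_factory=list)


def run_hooks(plugins, urls, config):
    """Run crawl hooks once, then snapshot hooks per URL, merging config updates."""
    config = dict(config)  # copy so the caller's dict is not mutated
    # Phase 1: crawl hooks (setup / dependency installation).
    for plugin in plugins:
        for hook in plugin.crawl_hooks:
            config.update(hook(config) or {})
    # Phase 2: snapshot hooks, once per URL, seeing all earlier updates.
    for url in urls:
        for plugin in plugins:
            for hook in plugin.snapshot_hooks:
                config.update(hook(url, config) or {})
    return config


seen = []  # records what the snapshot hook observed


def install_chrome(config):
    # Pretend to install Chrome and report where the binary lives.
    return {"CHROME_BINARY": "/usr/bin/chromium"}


def take_screenshot(url, config):
    # A later hook can rely on config set by an earlier one.
    seen.append((url, config["CHROME_BINARY"]))
    return {}


plugins = [
    Plugin("chrome", crawl_hooks=[install_chrome]),
    Plugin("screenshot", snapshot_hooks=[take_screenshot]),
]
final_config = run_hooks(plugins, ["https://example.com"], {"TIMEOUT": "60"})
print(seen)
```

In this sketch the screenshot hook never looks up Chrome itself; it reads the path that the crawl hook added to the shared config, which is the kind of propagation the list above describes.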
⚙️ Configuration
Configuration is handled via environment variables:
- CHROME_BINARY, WGET_BINARY, etc. - binary paths
- TIMEOUT=60 - default timeout for hooks
- {PLUGIN}_ENABLED=true/false - enable/disable specific plugins
- {PLUGIN}_TIMEOUT=120 - per-plugin timeout overrides
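As a rough illustration of how these variables could be resolved, the sketch below looks up a per-plugin timeout and falls back to the global TIMEOUT. The helper names `plugin_timeout` and `plugin_enabled` are hypothetical, and it is an assumption that the {PLUGIN} prefix is the upper-cased plugin name and that plugins are enabled unless set to false; abx-dl's own config code may behave differently.

```python
import os


def plugin_timeout(plugin, env=None):
    """Return {PLUGIN}_TIMEOUT if set, otherwise TIMEOUT (default 60)."""
    env = os.environ if env is None else env
    default = env.get("TIMEOUT", "60")
    return int(env.get(f"{plugin.upper()}_TIMEOUT", default))


def plugin_enabled(plugin, env=None):
    """Return False only when {PLUGIN}_ENABLED is explicitly 'false'."""
    env = os.environ if env is None else env
    value = env.get(f"{plugin.upper()}_ENABLED", "true")
    return value.strip().lower() != "false"


# Example environment, passed explicitly so the demo does not touch os.environ.
example_env = {"TIMEOUT": "60", "WGET_TIMEOUT": "120", "PDF_ENABLED": "false"}

wget_timeout = plugin_timeout("wget", example_env)    # per-plugin override
title_timeout = plugin_timeout("title", example_env)  # falls back to TIMEOUT
pdf_enabled = plugin_enabled("pdf", example_env)
title_enabled = plugin_enabled("title", example_env)
print(wget_timeout, title_timeout, pdf_enabled, title_enabled)
```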
📦 Install
pip install abx-dl
abx-dl install # optional: install plugin dependencies
Usage
# Basic usage - download URL with all plugins:
abx-dl 'https://example.com'
# Download with specific plugins only:
abx-dl --plugins=title,favicon,screenshot 'https://example.com'
# Specify output directory:
abx-dl --output=./downloads 'https://example.com'
# Set timeout:
abx-dl --timeout=120 'https://example.com'
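When calling abx-dl from a script, the same flags can be assembled programmatically. The sketch below only builds the argument list from the --plugins, --output, and --timeout options shown above; the `build_command` helper is hypothetical, and combining all three flags in one invocation is an assumption, since the examples above show them separately.

```python
import shlex


def build_command(url, plugins=None, output=None, timeout=None):
    """Build an abx-dl argument list from the documented flags."""
    cmd = ["abx-dl"]
    if plugins:
        cmd.append("--plugins=" + ",".join(plugins))
    if output:
        cmd.append(f"--output={output}")
    if timeout is not None:
        cmd.append(f"--timeout={timeout}")
    cmd.append(url)
    return cmd


cmd = build_command(
    "https://example.com",
    plugins=["title", "favicon", "screenshot"],
    output="./downloads",
    timeout=120,
)
# Print a shell-quoted version for inspection.
print(shlex.join(cmd))
```

To actually run the download, the list can be passed to `subprocess.run(cmd, check=True)`, which requires abx-dl to be installed and on the PATH.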
Commands
abx-dl <url> # Download URL (default command)
abx-dl plugins # List available plugins
abx-dl info <plugin> # Show plugin details
abx-dl install [plugins] # Install plugin dependencies
abx-dl check [plugins] # Check dependency status
Output Structure
./
├── index.json            # Snapshot metadata and results
├── title/
│   └── title.txt
├── favicon/
│   └── favicon.ico
├── screenshot/
│   └── screenshot.png
├── pdf/
│   └── output.pdf
├── dom/
│   └── output.html
├── wget/
│   └── example.com/
│       └── index.html
├── singlefile/
│   └── output.html
└── ...
All Outputs
- index.json - snapshot metadata and plugin results
- title/title.txt - page title
- favicon/favicon.ico - site favicon
- screenshot/screenshot.png - full page screenshot (Chrome)
- pdf/output.pdf - page as PDF (Chrome)
- dom/output.html - rendered DOM (Chrome)
- wget/example.com/... - mirrored site files
- singlefile/output.html - single-file HTML snapshot
- ... and more via plugin library ...
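A quick way to see which of these outputs a run produced is to check for the listed paths. The following is a minimal sketch, assuming the file names given above; the `present_outputs` helper and the `EXPECTED_OUTPUTS` mapping are hypothetical, and the wget mirror is left out because its path depends on the site's domain.

```python
import tempfile
from pathlib import Path

# Relative paths taken from the output list above.
EXPECTED_OUTPUTS = {
    "index": "index.json",
    "title": "title/title.txt",
    "favicon": "favicon/favicon.ico",
    "screenshot": "screenshot/screenshot.png",
    "pdf": "pdf/output.pdf",
    "dom": "dom/output.html",
    "singlefile": "singlefile/output.html",
}


def present_outputs(output_dir):
    """Map each known output name to whether its file exists."""
    root = Path(output_dir)
    return {name: (root / rel).is_file() for name, rel in EXPECTED_OUTPUTS.items()}


# Demo against a temporary directory containing only a title file.
with tempfile.TemporaryDirectory() as tmp:
    title_file = Path(tmp) / "title" / "title.txt"
    title_file.parent.mkdir(parents=True)
    title_file.write_text("Example Domain\n")
    result = present_outputs(tmp)

for name, found in result.items():
    print(f"{name}: {'ok' if found else 'missing'}")
```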
Architecture
abx-dl is built on these components:
- abx_dl/plugins.py - Plugin discovery from plugins/ directory
- abx_dl/executor.py - Hook execution engine with config propagation
- abx_dl/config.py - Environment variable configuration
- abx_dl/cli.py - Rich CLI with live progress display
Plugins are symlinked from ArchiveBox's plugin directory.
For more advanced use with collections, parallel downloading, a Web UI + REST API, etc., see ArchiveBox/ArchiveBox.
File details
Details for the file abx_dl-1.0.0.tar.gz.
File metadata
- Download URL: abx_dl-1.0.0.tar.gz
- Upload date:
- Size: 211.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e8cfe6748a7a58d986aa5e196b6e09b56684f32c7df64ea2ee5a8017d68a1292 |
| MD5 | 43d830223b841f80db3aca0959cc301f |
| BLAKE2b-256 | d8fc665704f528386c6f38ec27d3d45a1acd6b36082634fa01af5f84eecb0621 |
Provenance
The following attestation bundles were made for abx_dl-1.0.0.tar.gz:
Publisher: publish.yml on ArchiveBox/abx-dl
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: abx_dl-1.0.0.tar.gz
- Subject digest: e8cfe6748a7a58d986aa5e196b6e09b56684f32c7df64ea2ee5a8017d68a1292
- Sigstore transparency entry: 782415946
- Sigstore integration time:
- Permalink: ArchiveBox/abx-dl@a9b4a6a081b6e0eb8cf3cdab4ed1437a1ddcb916
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/ArchiveBox
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a9b4a6a081b6e0eb8cf3cdab4ed1437a1ddcb916
- Trigger Event: push
File details
Details for the file abx_dl-1.0.0-py3-none-any.whl.
File metadata
- Download URL: abx_dl-1.0.0-py3-none-any.whl
- Upload date:
- Size: 15.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a7c63c4c977181fb34e15f2c4cd5483a642fc061cfd07de3e8b1b83f857a8b64 |
| MD5 | 405c6d17dc45755a0907755e766a8a9f |
| BLAKE2b-256 | ddb9db97d773b6db1652d67a931627e24485aca37803b56028571bdbb8c21345 |
Provenance
The following attestation bundles were made for abx_dl-1.0.0-py3-none-any.whl:
Publisher: publish.yml on ArchiveBox/abx-dl
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: abx_dl-1.0.0-py3-none-any.whl
- Subject digest: a7c63c4c977181fb34e15f2c4cd5483a642fc061cfd07de3e8b1b83f857a8b64
- Sigstore transparency entry: 782415950
- Sigstore integration time:
- Permalink: ArchiveBox/abx-dl@a9b4a6a081b6e0eb8cf3cdab4ed1437a1ddcb916
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/ArchiveBox
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a9b4a6a081b6e0eb8cf3cdab4ed1437a1ddcb916
- Trigger Event: push