All-in-one CLI tool to download and extract content from URLs
⬇️ abx-dl
A simple all-in-one CLI tool to auto-detect and download everything available from a URL.
pip install abx-dl
abx-dl 'https://example.com/page/to/download'
✨ Ever wish you could yt-dlp, gallery-dl, wget, curl, puppeteer, etc. all in one command?
abx-dl is an all-in-one CLI tool for downloading URLs "by any means necessary".
It's useful for scraping, downloading, OSINT, digital preservation, and more.
abx-dl provides a simpler one-shot CLI interface to the ArchiveBox plugin ecosystem.
What does it save?
abx-dl --plugins=title,favicon,headers,wget,singlefile,screenshot,pdf,dom,readability,git,... 'https://example.com'
abx-dl runs all plugins by default, or you can specify --plugins=... for specific methods:
- HTML, JS, CSS, images, etc. rendered with a headless browser
- title, favicon, headers, outlinks, and other metadata
- audio, video, subtitles, playlists, comments
- snapshot of the page as a PDF, screenshot, and Singlefile HTML
- article text, git source code
- and much more...
🧩 How does it work?
abx-dl uses the ABX Plugin Library (shared with ArchiveBox) to run a collection of downloading and scraping tools.
Plugins are discovered from the plugins/ directory and execute hooks in order:
- Crawl hooks run first (setup/install dependencies like Chrome)
- Snapshot hooks run per-URL to extract content
Each plugin can output:
- Files to its output directory
- JSONL records for status reporting
- Config updates that propagate to subsequent plugins
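The ordering and config propagation described above can be sketched roughly as follows. This is an illustrative model only, not abx-dl's actual executor: the `Hook` class, the `run_hooks` function, the hook signature, and the choice to keep snapshot-phase updates scoped to a single URL are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical hook signature: takes (url, config) and returns config updates.
HookFn = Callable[[str, dict], dict]


@dataclass
class Hook:
    plugin: str
    phase: str  # "crawl" (setup) or "snapshot" (per-URL extraction)
    run: HookFn


def run_hooks(hooks: list[Hook], urls: list[str], config: dict) -> list[str]:
    """Run crawl hooks once, then snapshot hooks for each URL.

    Returns one JSONL-style status line per hook invocation.
    """
    shared = dict(config)  # copy so the caller's config is not mutated
    lines = []

    for hook in hooks:
        if hook.phase == "crawl":
            shared.update(hook.run("", shared))  # updates reach later hooks
            lines.append(json.dumps({"plugin": hook.plugin, "phase": "crawl"}))

    for url in urls:
        per_url = dict(shared)  # assumption: snapshot updates stay within one URL
        for hook in hooks:
            if hook.phase == "snapshot":
                per_url.update(hook.run(url, per_url))
                lines.append(
                    json.dumps({"plugin": hook.plugin, "phase": "snapshot", "url": url})
                )
    return lines


def chrome_setup(url: str, config: dict) -> dict:
    # Stand-in for a crawl hook that locates or installs a browser.
    return {"CHROME_BINARY": "chromium"}


seen = []


def screenshot(url: str, config: dict) -> dict:
    # Stand-in for a snapshot hook; records which binary it was handed.
    seen.append((url, config.get("CHROME_BINARY")))
    return {}


# The snapshot hook is listed first on purpose: phase, not list order, decides.
hooks = [Hook("screenshot", "snapshot", screenshot), Hook("chrome", "crawl", chrome_setup)]
status_lines = run_hooks(hooks, ["https://example.com"], {"TIMEOUT": 60})
```

Here the crawl hook runs first even though it is listed second, and the snapshot hook receives the `CHROME_BINARY` value that the crawl hook published.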
Configuration
Configuration is handled via environment variables or a persistent config file (~/.config/abx/config.env):
abx-dl config # show all config (global + per-plugin)
abx-dl config --get WGET_TIMEOUT # get a specific value
abx-dl config --set TIMEOUT=120 # set persistently (resolves aliases)
Output is grouped by section:
# GLOBAL
TIMEOUT=60
USER_AGENT="Mozilla/5.0 ..."
...
# plugins/wget
WGET_BINARY="wget"
WGET_TIMEOUT=60
...
# plugins/chrome
CHROME_BINARY="chromium"
...
Common options:
- TIMEOUT=60 - default timeout for hooks
- USER_AGENT - default user agent string
- {PLUGIN}_BINARY - path to plugin's binary (e.g. WGET_BINARY, CHROME_BINARY)
- {PLUGIN}_ENABLED=true/false - enable/disable specific plugins
- {PLUGIN}_TIMEOUT=120 - per-plugin timeout overrides
Aliases are automatically resolved (e.g. --set USE_WGET=false saves as WGET_ENABLED=false).
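As a rough illustration of how these pieces could fit together, the sketch below parses config.env-style text, resolves aliases, and falls back from a per-plugin timeout to the global one. It is not abx-dl's real config code: the function names are hypothetical, the alias table contains only the single example given above, and the assumption that environment variables take precedence over the config file is not stated by the project.

```python
# Alias table built from the one example in the text; a real table would be larger.
ALIASES = {"USE_WGET": "WGET_ENABLED"}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, # comments, and lines without '='."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def resolve_key(key: str) -> str:
    """Map an alias to its canonical name, leaving other keys unchanged."""
    return ALIASES.get(key, key)


def load_config(file_text: str, environ: dict[str, str]) -> dict[str, str]:
    """Layer environment values over file values (assumed precedence)."""
    config = {resolve_key(k): v for k, v in parse_env_text(file_text).items()}
    for key, value in environ.items():
        config[resolve_key(key)] = value
    return config


def plugin_timeout(config: dict[str, str], plugin: str) -> int:
    """Prefer {PLUGIN}_TIMEOUT, falling back to the global TIMEOUT (default 60)."""
    return int(config.get(f"{plugin.upper()}_TIMEOUT", config.get("TIMEOUT", "60")))


file_text = """
# GLOBAL
TIMEOUT=60
USER_AGENT="Mozilla/5.0 ..."
# plugins/wget
WGET_BINARY="wget"
"""
config = load_config(file_text, {"WGET_TIMEOUT": "120", "USE_WGET": "false"})
```

With these inputs, `USE_WGET` is stored under `WGET_ENABLED`, `plugin_timeout(config, "wget")` uses the override, and `plugin_timeout(config, "chrome")` falls back to the global value.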
📦 Install
pip install abx-dl
abx-dl plugins --install # optional: pre-install plugin dependencies
Usage
# Basic usage - download URL with all plugins:
abx-dl 'https://example.com'
# Download with specific plugins only:
abx-dl --plugins=wget,ytdlp,git,screenshot 'https://example.com'
# Skip auto-installing missing dependencies (emit warnings instead):
abx-dl --no-install 'https://example.com'
# Specify output directory:
abx-dl --output=./downloads 'https://example.com'
# Set timeout:
abx-dl --timeout=120 'https://example.com'
Commands
abx-dl <url> # Download URL (default command)
abx-dl plugins # Check + show info for all plugins
abx-dl plugins wget ytdlp git # Check + show info for specific plugins
abx-dl plugins --install # Install all plugin dependencies
abx-dl plugins --install wget ytdlp git # Install specific plugin dependencies
abx-dl config # Show all config values
abx-dl config --get TIMEOUT # Get a specific config value
abx-dl config --set TIMEOUT=120 # Set a config value persistently
Installing Dependencies
Many plugins require external binaries (e.g., wget, chrome, yt-dlp, single-file).
By default, abx-dl lazily auto-installs missing dependencies as needed when you download a URL.
Use --no-install to skip plugins with missing dependencies instead:
# Auto-installs missing deps on-the-fly (default behavior)
abx-dl 'https://example.com'
# Skip plugins with missing deps, emit warnings instead
abx-dl --no-install 'https://example.com'
# Pre-install all plugin dependencies
abx-dl plugins --install
# Install dependencies for specific plugins only
abx-dl plugins --install wget singlefile ytdlp
# Check which dependencies are available/missing
abx-dl plugins
Dependencies are installed to ~/.config/abx/lib/{arch}/ using the appropriate package manager:
- pip packages → ~/.config/abx/lib/{arch}/pip/venv/
- npm packages → ~/.config/abx/lib/{arch}/npm/
- brew/apt packages → system locations
You can override the install location with LIB_DIR=/path/to/lib abx-dl install.
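To make the --no-install behavior concrete, here is a minimal sketch of splitting plugins into those that can run and those that would be skipped because a binary is missing. The plugin-to-binary mapping and the `split_by_availability` function are hypothetical, and the lookup only consults PATH via `shutil.which`; abx-dl's real check may also look in its own lib directory.

```python
import shutil
from typing import Callable, Optional

# Hypothetical mapping from plugin name to the binary it needs.
REQUIRED_BINARIES = {"wget": "wget", "ytdlp": "yt-dlp", "singlefile": "single-file"}


def split_by_availability(
    plugins: list[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> tuple[list[str], list[str]]:
    """Return (runnable, skipped) plugins based on whether their binary is found."""
    runnable, skipped = [], []
    for plugin in plugins:
        binary = REQUIRED_BINARIES.get(plugin)
        # Plugins with no listed binary are assumed to need nothing external.
        if binary is None or which(binary) is not None:
            runnable.append(plugin)
        else:
            skipped.append(plugin)
    return runnable, skipped
```

Calling `split_by_availability(["title", "wget", "ytdlp"])` on a machine with wget but without yt-dlp on PATH would put `ytdlp` in the skipped list, which is where a warning would be emitted.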
Output Structure
./
├── index.jsonl               # Snapshot metadata and results (JSONL format)
├── title/
│   └── title.txt
├── favicon/
│   └── favicon.ico
├── screenshot/
│   └── screenshot.png
├── pdf/
│   └── output.pdf
├── dom/
│   └── output.html
├── wget/
│   └── example.com/
│       └── index.html
├── singlefile/
│   └── output.html
└── ...
All Outputs
- index.jsonl - snapshot metadata and plugin results (JSONL format, ArchiveBox-compatible)
- title/title.txt - page title
- favicon/favicon.ico - site favicon
- screenshot/screenshot.png - full page screenshot (Chrome)
- pdf/output.pdf - page as PDF (Chrome)
- dom/output.html - rendered DOM (Chrome)
- wget/example.com/... - mirrored site files
- singlefile/output.html - single-file HTML snapshot
- ... and more via plugin library ...
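For post-processing a finished download, a small script along these lines could load index.jsonl and list what each plugin produced. It relies only on the layout shown above (one JSON object per line, one subdirectory per plugin); the record fields inside index.jsonl are not documented here, so the sketch makes no assumptions about them. The helper names `read_jsonl` and `list_outputs` are made up for this example.

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    records = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def list_outputs(output_dir: Path) -> dict[str, list[str]]:
    """Map each plugin subdirectory to the relative paths of the files inside it."""
    outputs = {}
    for child in sorted(output_dir.iterdir()):
        if child.is_dir():
            outputs[child.name] = sorted(
                p.relative_to(child).as_posix() for p in child.rglob("*") if p.is_file()
            )
    return outputs
```

For a run started with --output=./downloads, `read_jsonl(Path("downloads/index.jsonl"))` would return the parsed records and `list_outputs(Path("downloads"))` would return a mapping such as `{"title": ["title.txt"], ...}`.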
Architecture
abx-dl is built on these components:
- abx_dl/plugins.py - Plugin discovery from plugins/ directory
- abx_dl/executor.py - Hook execution engine with config propagation
- abx_dl/config.py - Environment variable configuration
- abx_dl/cli.py - Rich CLI with live progress display
Plugins are symlinked from ArchiveBox's plugin directory.
For more advanced use with collections, parallel downloading, a Web UI + REST API, etc., see ArchiveBox/ArchiveBox.