
All-in-one CLI tool to download and extract content from URLs


⬇️ abx-dl

A simple all-in-one CLI tool to auto-detect and download everything available from a URL.

pip install abx-dl
abx-dl 'https://example.com/page/to/download'

✨ Ever wish you could run yt-dlp, gallery-dl, wget, curl, puppeteer, etc. all in one command?

abx-dl is an all-in-one CLI tool for downloading URLs "by any means necessary".

It's useful for scraping, downloading, OSINT, digital preservation, and more. abx-dl provides a simpler, one-shot command-line interface to the ArchiveBox plugin ecosystem.



What does it save?

abx-dl --plugins=title,favicon,headers,wget,singlefile,screenshot,pdf,dom,readability,git,... 'https://example.com'

abx-dl runs all plugins by default, or you can pass --plugins=... to run only specific methods. Depending on the plugins used, it can save:

  • HTML, JS, CSS, images, etc. rendered with a headless browser
  • title, favicon, headers, outlinks, and other metadata
  • audio, video, subtitles, playlists, comments
  • snapshot of the page as a PDF, screenshot, and Singlefile HTML
  • article text, git source code
  • and much more...

🧩 How does it work?

abx-dl uses the ABX Plugin Library (shared with ArchiveBox) to run a collection of downloading and scraping tools.

Plugins are discovered from the plugins/ directory and execute hooks in order:

  1. Crawl hooks run first (setup/install dependencies like Chrome)
  2. Snapshot hooks run per-URL to extract content
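The two-phase ordering can be sketched in a few lines of Python. This is a minimal illustration, not abx-dl's executor. The `Hook` dataclass, its `phase` strings, and `plan_execution` are hypothetical names. Running crawl hooks once, before any URL, is one reading of "run first."

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hook:
    """Hypothetical hook descriptor: the plugin it belongs to and its phase."""

    plugin: str
    phase: str  # "crawl" (one-time setup) or "snapshot" (per URL)


def plan_execution(hooks: list[Hook], urls: list[str]) -> list[tuple[Hook, str | None]]:
    """Return (hook, url) pairs in run order: crawl hooks once, then snapshot hooks per URL."""
    crawl = [(hook, None) for hook in hooks if hook.phase == "crawl"]
    snapshot = [(hook, url) for url in urls for hook in hooks if hook.phase == "snapshot"]
    return crawl + snapshot


hooks = [Hook("wget", "snapshot"), Hook("chrome", "crawl"), Hook("screenshot", "snapshot")]
for hook, url in plan_execution(hooks, ["https://example.com"]):
    print(hook.phase, hook.plugin, url or "-")
```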

Each plugin can output:

  • Files to its output directory
  • JSONL records for status reporting
  • Config updates that propagate to subsequent plugins
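The sketch below shows roughly how a runner could consume that output and propagate config to later plugins. Several details are assumptions, and the real record schema comes from the ArchiveBox plugin library and may differ:

- hooks print JSONL on stdout;
- non-JSON lines are plain log output;
- config updates arrive as records with a hypothetical `"type": "config"` and a `"values"` object.

```python
from __future__ import annotations

import json


def apply_hook_output(stdout: str, config: dict[str, str]) -> tuple[dict[str, str], list[dict]]:
    """Split one hook's JSONL stdout into config updates and status records.

    Returns (config for later hooks, status records). The "type"/"values"
    keys are hypothetical; the real record schema may differ.
    """
    new_config = dict(config)  # copy so the caller's dict is left unchanged
    statuses: list[dict] = []
    for line in stdout.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assume blank or non-JSON lines are plain log output
        if not isinstance(record, dict):
            continue
        if record.get("type") == "config":
            new_config.update({key: str(value) for key, value in record.get("values", {}).items()})
        else:
            statuses.append(record)
    return new_config, statuses


stdout = "\n".join([
    '{"type": "config", "values": {"CHROME_BINARY": "/usr/bin/chromium"}}',
    "starting chrome...",
    '{"type": "status", "status": "succeeded"}',
])
config, statuses = apply_hook_output(stdout, {"TIMEOUT": "60"})
print(config)
print(statuses)
```

Returning a new dict instead of mutating the input makes it easy to see which hook introduced which value.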

⚙️ Configuration

Configuration is handled via environment variables or a persistent config file (~/.config/abx/config.env):

abx-dl config                        # show all config (global + per-plugin)
abx-dl config --get WGET_TIMEOUT     # get a specific value
abx-dl config --set TIMEOUT=120      # set persistently (resolves aliases)

Output is grouped by section:

# GLOBAL
TIMEOUT=60
USER_AGENT="Mozilla/5.0 ..."
...

# plugins/wget
WGET_BINARY="wget"
WGET_TIMEOUT=60
...

# plugins/chrome
CHROME_BINARY="chromium"
...
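Because the config file uses simple KEY=VALUE lines grouped under `#` section comments, a few lines of Python can read it. This is not abx-dl's own loader. It assumes one assignment per line and optional surrounding double quotes. It also assumes environment variables take precedence over the file; that precedence order is an assumption.

```python
from __future__ import annotations

import os


def parse_config_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, '#' section/comment lines, and other non-assignments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop surrounding double quotes
        values[key.strip()] = value
    return values


def get_option(key: str, file_values: dict[str, str], default: str | None = None) -> str | None:
    """Environment variable first, then config file, then default (assumed precedence)."""
    return os.environ.get(key, file_values.get(key, default))


sample = """\
# GLOBAL
TIMEOUT=60
USER_AGENT="Mozilla/5.0 ..."

# plugins/wget
WGET_BINARY="wget"
WGET_TIMEOUT=60
"""
values = parse_config_env(sample)
print(values)
```

To read the real file, pass `Path("~/.config/abx/config.env").expanduser().read_text()` to `parse_config_env` (with `from pathlib import Path`).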

Common options:

  • TIMEOUT=60 - default timeout for hooks
  • USER_AGENT - default user agent string
  • {PLUGIN}_BINARY - path to plugin's binary (e.g. WGET_BINARY, CHROME_BINARY)
  • {PLUGIN}_ENABLED=true/false - enable/disable specific plugins
  • {PLUGIN}_TIMEOUT=120 - per-plugin timeout overrides
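A small lookup function can illustrate how the global and per-plugin options relate. This is a hypothetical sketch. `plugin_option` and `plugin_enabled` are illustrative names, and treating a plugin as enabled when `{PLUGIN}_ENABLED` is unset is an assumption.

```python
from __future__ import annotations


def plugin_option(config: dict[str, str], plugin: str, option: str, default: str | None = None) -> str | None:
    """Prefer {PLUGIN}_{OPTION} (e.g. WGET_TIMEOUT), else the global OPTION (e.g. TIMEOUT)."""
    return config.get(f"{plugin.upper()}_{option}", config.get(option, default))


def plugin_enabled(config: dict[str, str], plugin: str) -> bool:
    """Interpret {PLUGIN}_ENABLED as a boolean; unset is treated as enabled (assumption)."""
    value = config.get(f"{plugin.upper()}_ENABLED", "true")
    return value.strip().lower() in {"true", "1", "yes", "on"}


config = {"TIMEOUT": "60", "WGET_TIMEOUT": "120", "YTDLP_ENABLED": "false"}
print(plugin_option(config, "wget", "TIMEOUT"))    # → 120 (per-plugin override)
print(plugin_option(config, "chrome", "TIMEOUT"))  # → 60 (falls back to global)
print(plugin_enabled(config, "ytdlp"))             # → False
```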

Aliases are automatically resolved (e.g. --set USE_WGET=false saves as WGET_ENABLED=false).
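Alias resolution amounts to rewriting a key before the value is saved. The sketch below handles only the one alias pattern documented here (`USE_{PLUGIN}` → `{PLUGIN}_ENABLED`). abx-dl's actual alias table is not described here and may contain other mappings.

```python
import re

# Only the documented USE_{PLUGIN} pattern; other aliases would need their own rules.
_USE_ALIAS = re.compile(r"^USE_([A-Z0-9_]+)$")


def resolve_alias(assignment: str) -> str:
    """Rewrite 'USE_X=value' to 'X_ENABLED=value'; leave other assignments unchanged."""
    key, sep, value = assignment.partition("=")
    match = _USE_ALIAS.match(key.strip())
    if match:
        key = f"{match.group(1)}_ENABLED"
    return f"{key}{sep}{value}"


print(resolve_alias("USE_WGET=false"))  # → WGET_ENABLED=false
print(resolve_alias("TIMEOUT=120"))     # → TIMEOUT=120
```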




📦 Install

pip install abx-dl
abx-dl plugins --install   # optional: pre-install plugin dependencies

🔠 Usage

# Basic usage - download URL with all plugins:
abx-dl 'https://example.com'

# Download with specific plugins only:
abx-dl --plugins=wget,ytdlp,git,screenshot 'https://example.com'

# Skip auto-installing missing dependencies (emit warnings instead):
abx-dl --no-install 'https://example.com'

# Specify output directory:
abx-dl --output=./downloads 'https://example.com'

# Set timeout:
abx-dl --timeout=120 'https://example.com'
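To drive abx-dl from a Python script, one option is to assemble the documented flags into an argument list and hand it to `subprocess`. `build_command` and `run_abx_dl` are hypothetical helpers. They use only the flags shown above: `--plugins`, `--output`, `--timeout`, and `--no-install`. Combining several flags in one call is assumed to work as it does for most CLIs.

```python
from __future__ import annotations

import subprocess


def build_command(url: str, plugins: list[str] | None = None, output: str | None = None,
                  timeout: int | None = None, install: bool = True) -> list[str]:
    """Assemble an abx-dl argument list from the documented flags."""
    cmd = ["abx-dl"]
    if plugins:
        cmd.append("--plugins=" + ",".join(plugins))
    if output:
        cmd.append(f"--output={output}")
    if timeout is not None:
        cmd.append(f"--timeout={timeout}")
    if not install:
        cmd.append("--no-install")
    cmd.append(url)  # passed as one argv entry, so no shell quoting is needed
    return cmd


def run_abx_dl(cmd: list[str]) -> None:
    """Run abx-dl (must be on PATH); raises CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


cmd = build_command("https://example.com", plugins=["wget", "screenshot"], output="./downloads", timeout=120)
print(cmd)
```

Passing a list rather than a shell string sidesteps quoting problems with URLs that contain `&` or `?`.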

Commands

abx-dl <url>                              # Download URL (default command)
abx-dl plugins                            # Check + show info for all plugins
abx-dl plugins wget ytdlp git             # Check + show info for specific plugins
abx-dl plugins --install                  # Install all plugin dependencies
abx-dl plugins --install wget ytdlp git   # Install specific plugin dependencies
abx-dl config                             # Show all config values
abx-dl config --get TIMEOUT               # Get a specific config value
abx-dl config --set TIMEOUT=120           # Set a config value persistently

Installing Dependencies

Many plugins require external binaries (e.g., wget, chrome, yt-dlp, single-file).

By default, abx-dl installs missing dependencies lazily, only when a download actually needs them. Use --no-install to skip plugins with missing dependencies instead:

# Auto-installs missing deps on-the-fly (default behavior)
abx-dl 'https://example.com'

# Skip plugins with missing deps, emit warnings instead
abx-dl --no-install 'https://example.com'

# Pre-install all plugin dependencies
abx-dl plugins --install

# Install dependencies for specific plugins only
abx-dl plugins --install wget singlefile ytdlp

# Check which dependencies are available/missing
abx-dl plugins
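For a quick check of whether some of these binaries are already on your PATH, independent of abx-dl, `shutil.which` is enough. This is not what `abx-dl plugins` does internally. The plugin-to-binary mapping is an assumption: Chrome, for example, may be installed as `chromium`, `google-chrome`, or something else. Binaries that abx-dl installed into its own lib directory will not show up on PATH.

```python
import shutil

# Hypothetical plugin -> candidate binary names; adjust for your system.
CANDIDATES = {
    "wget": ["wget"],
    "ytdlp": ["yt-dlp"],
    "singlefile": ["single-file"],
    "chrome": ["chromium", "chromium-browser", "google-chrome"],
    "git": ["git"],
}


def find_binaries(candidates):
    """Map each plugin to the first candidate found on PATH, or None if none is found."""
    found = {}
    for plugin, names in candidates.items():
        found[plugin] = None
        for name in names:
            path = shutil.which(name)
            if path:
                found[plugin] = path
                break
    return found


for plugin, path in find_binaries(CANDIDATES).items():
    print(f"{plugin:12} {path or 'missing'}")
```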

Dependencies are installed to ~/.config/abx/lib/{arch}/ using the appropriate package manager:

  • pip packages → ~/.config/abx/lib/{arch}/pip/venv/
  • npm packages → ~/.config/abx/lib/{arch}/npm/
  • brew/apt packages โ†’ system locations

You can override the install location by setting LIB_DIR, e.g. LIB_DIR=/path/to/lib abx-dl plugins --install.
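The directory layout above can be expressed as a small path helper. Two details are assumptions:

- `platform.machine()` stands in for however abx-dl actually spells `{arch}`;
- `LIB_DIR` is treated as replacing `~/.config/abx/lib`, with `{arch}` still appended beneath it.

`lib_paths` is a hypothetical name.

```python
from __future__ import annotations

import os
import platform
from collections.abc import Mapping
from pathlib import Path


def lib_paths(arch: str | None = None, env: Mapping[str, str] = os.environ) -> dict[str, Path]:
    """Return the assumed pip/npm install locations under the abx lib directory."""
    arch = arch or platform.machine()  # stand-in for abx-dl's {arch} value (assumption)
    root = Path(env["LIB_DIR"]) if "LIB_DIR" in env else Path.home() / ".config" / "abx" / "lib"
    base = root / arch
    return {"pip": base / "pip" / "venv", "npm": base / "npm"}


print(lib_paths())
```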




Output Structure

./
├── index.jsonl             # Snapshot metadata and results (JSONL format)
├── title/
│   └── title.txt
├── favicon/
│   └── favicon.ico
├── screenshot/
│   └── screenshot.png
├── pdf/
│   └── output.pdf
├── dom/
│   └── output.html
├── wget/
│   └── example.com/
│       └── index.html
├── singlefile/
│   └── output.html
└── ...
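To see what a run produced, you can walk the per-plugin directories with `pathlib`. `list_outputs` is a hypothetical helper, not part of abx-dl. It groups files by their top-level directory and ignores files sitting directly in the output root, such as `index.jsonl`.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def list_outputs(root: str | Path) -> dict[str, list[str]]:
    """Group files by top-level (per-plugin) directory; files directly in root are skipped."""
    root = Path(root)
    outputs: dict[str, list[str]] = {}
    for plugin_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        outputs[plugin_dir.name] = sorted(
            f.relative_to(root).as_posix() for f in plugin_dir.rglob("*") if f.is_file()
        )
    return outputs


# Demo on a throwaway directory shaped like part of the tree above.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["index.jsonl", "title/title.txt", "wget/example.com/index.html"]:
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")
    demo = list_outputs(tmp)
print(demo)
```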

All Outputs

  • index.jsonl - snapshot metadata and plugin results (JSONL format, ArchiveBox-compatible)
  • title/title.txt - page title
  • favicon/favicon.ico - site favicon
  • screenshot/screenshot.png - full-page screenshot (Chrome)
  • pdf/output.pdf - page as PDF (Chrome)
  • dom/output.html - rendered DOM (Chrome)
  • wget/example.com/... - mirrored site files
  • singlefile/output.html - single-file HTML snapshot
  • ... and more via plugin library ...
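Since `index.jsonl` holds one JSON object per line, the standard `json` module can load it. Its record fields follow an ArchiveBox-compatible schema that is not described here. The sketch therefore only parses the lines and counts records by a `type` key, which is an assumption; the demo records are made-up sample data.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed object per non-blank line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def count_by_type(records):
    """Count records by their (assumed) 'type' field."""
    return Counter(r.get("type", "unknown") for r in records)


# Made-up sample records, written to a temporary index.jsonl for the demo.
with tempfile.TemporaryDirectory() as tmp:
    index = Path(tmp) / "index.jsonl"
    index.write_text('{"type": "snapshot"}\n\n{"type": "result"}\n{"type": "result"}\n', encoding="utf-8")
    counts = count_by_type(read_jsonl(index))
print(counts)
```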

Architecture

abx-dl is built on these components:

  • abx_dl/plugins.py - Plugin discovery from plugins/ directory
  • abx_dl/executor.py - Hook execution engine with config propagation
  • abx_dl/config.py - Environment variable configuration
  • abx_dl/cli.py - Rich CLI with live progress display

Plugins are symlinked from ArchiveBox's plugin directory.
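Discovery from a directory of plugin folders, some of them symlinks, can be sketched as below. This does not reflect how `abx_dl/plugins.py` actually identifies plugins or hooks. Two conventions are assumptions: every subdirectory of `plugins/` is one plugin, and names starting with `.` or `_` are skipped. `discover_plugins` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def discover_plugins(plugins_dir):
    """Return sorted plugin names: subdirectories (symlinks included), skipping '.'/'_' names."""
    root = Path(plugins_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() and not p.name.startswith((".", "_"))  # is_dir() follows symlinks
    )


# Demo: one real plugin dir, one symlinked from elsewhere, one ignored cache dir.
with tempfile.TemporaryDirectory() as tmp:
    plugins = Path(tmp) / "plugins"
    (plugins / "wget").mkdir(parents=True)
    (plugins / "__pycache__").mkdir()
    external = Path(tmp) / "archivebox_plugins" / "chrome"
    external.mkdir(parents=True)
    (plugins / "chrome").symlink_to(external, target_is_directory=True)
    found = discover_plugins(plugins)
print(found)
```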


For more advanced use (collections, parallel downloading, a Web UI + REST API, etc.), see ArchiveBox/ArchiveBox.

Download files


Source Distribution

abx_dl-1.0.3.tar.gz (215.2 kB)

Uploaded Source

Built Distribution


abx_dl-1.0.3-py3-none-any.whl (19.4 kB)

Uploaded Python 3

File details

Details for the file abx_dl-1.0.3.tar.gz.

File metadata

  • Download URL: abx_dl-1.0.3.tar.gz
  • Upload date:
  • Size: 215.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for abx_dl-1.0.3.tar.gz
Algorithm     Hash digest
SHA256        f505315eebfa5d4a23c70e478ab31988961e0433c63d492f12d6e1b9dd08bf31
MD5           da6c00d3ed0eaf25ef1eb47f671cb7f8
BLAKE2b-256   788b2a983f3c86ae95ed3205d52fe05a741a7d4bb903cc46e2092cdeadb3762f


Provenance

The following attestation bundles were made for abx_dl-1.0.3.tar.gz:

Publisher: publish.yml on ArchiveBox/abx-dl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file abx_dl-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: abx_dl-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 19.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for abx_dl-1.0.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        9c53df8b9863dd93f8d38230bcf9e0d3b8e7bcf7715950db23667e4ec5342c0d
MD5           2b9226ee542990d4bebdffab0fbcc767
BLAKE2b-256   5bd93b9293c8a8f2b19cf5b46b89605dc127e8d3539e8e689d76a2314c8898a5


Provenance

The following attestation bundles were made for abx_dl-1.0.3-py3-none-any.whl:

Publisher: publish.yml on ArchiveBox/abx-dl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
