All-in-one CLI tool to download and extract content from URLs

⬇️ abx-dl

A simple all-in-one CLI tool to auto-detect and download everything available from a URL.

pip install abx-dl
abx-dl 'https://example.com/page/to/download'

✨ Ever wish you could yt-dlp, gallery-dl, wget, curl, puppeteer, etc. all in one command?

abx-dl is an all-in-one CLI tool for downloading URLs "by any means necessary".

It's useful for scraping, downloading, OSINT, digital preservation, and more. abx-dl provides a simpler one-shot CLI interface to the ArchiveBox plugin ecosystem.



What does it save?

abx-dl --plugins=title,favicon,headers,wget,singlefile,screenshot,pdf,dom,readability,git,... 'https://example.com'

abx-dl runs all plugins by default, or you can specify --plugins=... for specific methods:

  • HTML, JS, CSS, images, etc. rendered with a headless browser
  • title, favicon, headers, outlinks, and other metadata
  • audio, video, subtitles, playlists, comments
  • snapshot of the page as a PDF, screenshot, and Singlefile HTML
  • article text, git source code
  • and much more...

🧩 How does it work?

abx-dl uses the ABX Plugin Library (shared with ArchiveBox) to run a collection of downloading and scraping tools.

Plugins are discovered from the plugins/ directory and execute hooks in order:

  1. Crawl hooks run first (setup/install dependencies like Chrome)
  2. Snapshot hooks run per-URL to extract content

Each plugin can output:

  • Files to its output directory
  • JSONL records for status reporting
  • Config updates that propagate to subsequent plugins
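
The two-phase ordering and config propagation described above can be sketched roughly as follows. This is not abx-dl's actual executor: `Hook`, `run_hooks`, the `phase` field, and the shape of the status records are hypothetical names invented for illustration, and the sketch assumes each hook is a plain function that returns a dict of config updates.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical types for illustration only; not abx-dl's real API.
HookFn = Callable[[Optional[str], dict], dict]  # (url, config) -> config updates


@dataclass
class Hook:
    plugin: str
    phase: str  # "crawl" (setup) or "snapshot" (per-URL extraction)
    fn: HookFn


def run_hooks(hooks: list, urls: list, config: dict) -> list:
    """Run crawl hooks once, then snapshot hooks per URL, propagating config."""
    config = dict(config)  # copy so the caller's dict is not mutated
    records = []           # stand-in for the JSONL status records

    # Phase 1: crawl hooks run first (e.g. locating or installing a browser).
    for hook in [h for h in hooks if h.phase == "crawl"]:
        config.update(hook.fn(None, config))
        records.append({"plugin": hook.plugin, "phase": "crawl"})

    # Phase 2: snapshot hooks run for each URL and see all earlier updates.
    for url in urls:
        for hook in [h for h in hooks if h.phase == "snapshot"]:
            config.update(hook.fn(url, config))
            records.append({"plugin": hook.plugin, "phase": "snapshot", "url": url})

    return records
```

In this sketch a crawl hook that returns `{"CHROME_BINARY": "/path/to/chrome"}` makes that value visible to every snapshot hook that runs afterwards, which is the propagation behavior the list above describes.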

⚙️ Configuration

Configuration is handled via environment variables:

  • CHROME_BINARY, WGET_BINARY, etc. - binary paths
  • TIMEOUT=60 - default timeout for hooks
  • {PLUGIN}_ENABLED=true/false - enable/disable specific plugins
  • {PLUGIN}_TIMEOUT=120 - per-plugin timeout overrides
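
As a rough illustration of how these variables could be combined, the sketch below resolves the settings for one plugin from an environment mapping. It is not taken from abx_dl/config.py: `plugin_settings` is a hypothetical helper, and it assumes that the plugin name is upper-cased to form the prefix, that plugins are enabled unless set to false, that a missing per-plugin timeout falls back to TIMEOUT, and that TIMEOUT itself defaults to 60.

```python
import os


def _env_bool(value: str) -> bool:
    # Assumption: these spellings count as true; anything else is false.
    return value.strip().lower() in ("1", "true", "yes", "on")


def plugin_settings(plugin: str, env=None) -> dict:
    """Resolve enabled/timeout/binary for one plugin (hypothetical helper)."""
    env = os.environ if env is None else env
    prefix = plugin.upper()  # assumption: "wget" -> "WGET_..."

    # Global default, overridden by {PLUGIN}_TIMEOUT when present.
    default_timeout = int(env.get("TIMEOUT", "60"))

    return {
        "enabled": _env_bool(env.get(f"{prefix}_ENABLED", "true")),
        "timeout": int(env.get(f"{prefix}_TIMEOUT", default_timeout)),
        "binary": env.get(f"{prefix}_BINARY"),  # None if not configured
    }
```

For example, with `TIMEOUT=30` and `WGET_TIMEOUT=120` set, this helper would report a 120-second timeout for wget and 30 seconds for every other plugin.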



📦 Install

pip install abx-dl
abx-dl install           # optional: install plugin dependencies

🔠 Usage

# Basic usage - download URL with all plugins:
abx-dl 'https://example.com'

# Download with specific plugins only:
abx-dl --plugins=title,favicon,screenshot 'https://example.com'

# Specify output directory:
abx-dl --output=./downloads 'https://example.com'

# Set timeout:
abx-dl --timeout=120 'https://example.com'
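
When calling abx-dl from a script, the flags shown above can be assembled programmatically. The sketch below uses only the documented `--plugins`, `--output`, and `--timeout` options; `build_command` and `download` are hypothetical wrapper names, and it is an assumption that the flags can be combined in a single invocation and that a failed run exits with a non-zero status.

```python
import subprocess


def build_command(url, plugins=None, output=None, timeout=None):
    """Build an abx-dl argument list from the documented CLI flags."""
    cmd = ["abx-dl"]
    if plugins:
        cmd.append("--plugins=" + ",".join(plugins))
    if output:
        cmd.append(f"--output={output}")
    if timeout is not None:
        cmd.append(f"--timeout={int(timeout)}")
    cmd.append(url)  # the URL goes last, as in the examples above
    return cmd


def download(url, **options):
    """Run abx-dl; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(url, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as `&` or `?`.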

Commands

abx-dl <url>                    # Download URL (default command)
abx-dl plugins                  # List available plugins
abx-dl info <plugin>            # Show plugin details
abx-dl install [plugins]        # Install plugin dependencies
abx-dl check [plugins]          # Check dependency status



Output Structure

./
├── index.json              # Snapshot metadata and results
├── title/
│   └── title.txt
├── favicon/
│   └── favicon.ico
├── screenshot/
│   └── screenshot.png
├── pdf/
│   └── output.pdf
├── dom/
│   └── output.html
├── wget/
│   └── example.com/
│       └── index.html
├── singlefile/
│   └── output.html
└── ...

All Outputs

  • index.json - snapshot metadata and plugin results
  • title/title.txt - page title
  • favicon/favicon.ico - site favicon
  • screenshot/screenshot.png - full-page screenshot (Chrome)
  • pdf/output.pdf - page as PDF (Chrome)
  • dom/output.html - rendered DOM (Chrome)
  • wget/example.com/... - mirrored site files
  • singlefile/output.html - single-file HTML snapshot
  • ... and more via plugin library ...
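
After a run, a script may want to check which of these files were actually produced. The sketch below is one way to do that using only the paths listed above; `EXPECTED_OUTPUTS` and `summarize_outputs` are hypothetical names, the wget mirror is left out because its path depends on the site's hostname, and index.json is only checked for existence because its schema is not described here.

```python
from pathlib import Path

# Relative paths copied from the output list above (wget mirror omitted).
EXPECTED_OUTPUTS = {
    "index": "index.json",
    "title": "title/title.txt",
    "favicon": "favicon/favicon.ico",
    "screenshot": "screenshot/screenshot.png",
    "pdf": "pdf/output.pdf",
    "dom": "dom/output.html",
    "singlefile": "singlefile/output.html",
}


def summarize_outputs(output_dir):
    """Map each expected output to True if it exists and is non-empty."""
    root = Path(output_dir)
    summary = {}
    for name, relative_path in EXPECTED_OUTPUTS.items():
        path = root / relative_path
        summary[name] = path.is_file() and path.stat().st_size > 0
    return summary
```

A missing entry does not necessarily mean a failure: the corresponding plugin may have been disabled or not selected with `--plugins`.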

Architecture

abx-dl is built on these components:

  • abx_dl/plugins.py - Plugin discovery from plugins/ directory
  • abx_dl/executor.py - Hook execution engine with config propagation
  • abx_dl/config.py - Environment variable configuration
  • abx_dl/cli.py - Rich CLI with live progress display

Plugins are symlinked from ArchiveBox's plugin directory.


For more advanced use with collections, parallel downloading, a Web UI + REST API, etc., see ArchiveBox/ArchiveBox.

