
View/select the URLs in an email message or file


Urlscan


Contributors

Scott Hansen <tech@firecat53.net> (Author and Maintainer)

Maxime Chatelle <xakz@rxsoft.eu> (Debian Maintainer)

Daniel Burrows <dburrows@debian.org> (Original Author)

Purpose and Requirements

Urlscan is a small program that is designed to integrate with the "mutt" mailreader to allow you to easily launch a Web browser for URLs contained in email messages. It is a replacement for the "urlview" program.

Requires: Python 3.7+ and the python-urwid library

Features

Urlscan parses an email message or file and scans it for URLs and email addresses. It then displays the URLs and their context within the message, and allows you to choose one or more URLs to send to your Web browser. Alternatively, it can send a list of all URLs to stdout.

Relative to urlview, urlscan has the following additional features:

  • Support for emails in quoted-printable and base64 encodings. No more stripping out =40D from URLs by hand!

  • The context of each URL is provided along with the URL. For HTML mails, a crude parser is used to render the HTML into text. Context view can be toggled on/off with c.

  • URLs are shortened by default to fit on one line. Viewing full URL (for one or all) is toggled with s or S.

  • Jump to a URL by typing the number.

  • Incremental case-insensitive search with /.

  • Execute an arbitrary function (for example, copy URL to clipboard) instead of opening URL in a browser.

  • Use l to cycle through opening URLs with the Python webbrowser module (default), with xdg-open (if installed), or with a command passed on the command line via --run or --run-safe.

  • Configure colors and keybindings via ~/.config/urlscan/config.json. Generate a default config file for editing by running urlscan -g. Cycle through available palettes with p. Set display width with --width.

  • Copy URL to clipboard with C or to primary selection with P. Requires xsel or xclip.

  • Run a command with the selected URL as the argument or pipe the selected URL to a command.

  • Show complete help menu with F1. Hide header on startup with --nohelp.

  • Use a custom regular expression with -E for matching URLs or any other pattern. In conjunction with -r, this effectively turns urlscan into a general-purpose CLI selector-type utility.

  • Scan certain email headers for URLs. Currently Link, Archived-At and List-* are scanned when --headers is passed.

  • Queue multiple URLs for opening and open them all at once with a and o.
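The -E/-r combination above can be sketched as follows. The IPv4 pattern, file name, and ssh command here are all hypothetical, and grep is used only to preview what the pattern would match; the actual selection happens interactively inside urlscan:

```shell
# Hypothetical example: select IPv4 addresses instead of URLs.
pattern='([0-9]{1,3}\.){3}[0-9]{1,3}'
printf 'db host: 10.0.0.1\nweb host: 192.168.1.5\n' > /tmp/hosts.txt

# Preview what the pattern matches (the items urlscan would display):
grep -Eo "$pattern" /tmp/hosts.txt

# Interactive selection; the chosen match replaces {} in the -r command:
# urlscan -E "$pattern" -r 'ssh admin@{}' /tmp/hosts.txt
```

Any extended regular expression should work in place of the IPv4 pattern, which is what makes this usable as a general selector.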

Installation and setup

To install urlscan, install from your distribution repositories, from PyPI, or do a local development install with pip install -e:

pipx install urlscan

OR

pip install --user urlscan

OR

cd <path/to/urlscan> && pip install --user -e .

NOTE

The minimum required version of urwid is 1.2.1.

Once urlscan is installed, add the following lines to your .muttrc:

macro index,pager \cb "<pipe-message> urlscan<Enter>" "call urlscan to
extract URLs out of a message"

macro attach,compose \cb "<pipe-entry> urlscan<Enter>" "call urlscan to
extract URLs out of a message"

Once this is done, Control-b while reading mail in mutt will automatically invoke urlscan on the message.

Note for Neomutt users: as of the 2023-05-17 release, true color support is implemented. If you are using true color support with Neomutt, or are encountering the error setupterm: could not find terminfo database, then you should also add TERM=xterm-256color to your macro in .muttrc (see issue #135). For example: macro index,pager \cb "<pipe-message> TERM=xterm-256color urlscan<Enter>" "call urlscan to extract URLs out of a message"

To choose a particular browser, set the environment variable BROWSER. If BROWSER is not set, xdg-open will control which browser is used, if it is available:

export BROWSER=/usr/bin/epiphany

Command Line usage

urlscan OPTIONS <file>

OPTIONS [-c, --compact]
        [-d, --dedupe]
        [-E, --regex <expression>]
        [-f, --run-safe <expression>]
        [-g, --genconf]
        [-H, --nohelp]
        [    --headers]
        [-n, --no-browser]
        [-p, --pipe]
        [-r, --run <expression>]
        [-R, --reverse]
        [-s, --single]
        [-w, --width]
        [-W, --whitespace-off]

Urlscan can extract URLs and email addresses from emails or any text file. Calling with no flags will start the curses browser. Calling with '-n' will just output a list of URLs/email addresses to stdout. The '-c' flag removes the context from around the URLs in the curses browser, and the '-d' flag removes duplicate URLs. The '-R' flag reverses the displayed order of URLs and context. Files can also be piped to urlscan using normal shell pipe mechanisms: cat <something> | urlscan or urlscan < <something>. The '-W' flag condenses the display output by suppressing blank lines and ellipsis lines.

Instead of opening a web browser, the selected URL can be passed as the argument to a command using --run-safe "<command> {}" or --run "<command> {}". Note the use of {} in the command string to denote the selected URL. Alternatively, the URL can be piped to the command using --run-safe <command> --pipe (or --run). Using --run-safe with --pipe is preferred if the command supports it, as it is marginally more secure and tolerant of special characters in the URL.
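A sketch of how the {} placeholder behaves. The substitution happens inside urlscan; sed is only a stand-in for that internal step here, and message.eml, wget, and xclip are placeholder names:

```shell
# The command template as it would be passed to --run "<command> {}":
template='wget -q {}'
url='https://example.com/page'

# Stand-in for urlscan's internal substitution of the selected URL:
printf '%s\n' "$template" | sed "s|{}|$url|"
# -> wget -q https://example.com/page

# Typical real invocations (interactive, so shown as comments):
# urlscan --run 'wget -q {}' message.eml
# urlscan --run-safe 'xclip -selection clipboard' --pipe message.eml
```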

Theming

Run urlscan -g to generate ~/.config/urlscan/config.json with the default color and black & white palettes. This can be edited or added to, as desired. The first palette in the list will be the default. Configure the palettes according to the Urwid display attributes.

Display width can be set with --width.
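As a sketch, a trimmed config.json might look like the following. The entry names and colors here are illustrative, not the actual defaults; each palette entry follows urwid's (name, foreground, background, ...) attribute format, and the exact names should be taken from the file urlscan -g generates:

```json
{
  "palettes": {
    "default": [
      ["header", "white", "dark blue"],
      ["msgtext", "light gray", "black"],
      ["url", "white", "dark blue", "standout"]
    ]
  }
}
```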

Keybindings

Run urlscan -g to generate ~/.config/urlscan/config.json. All of the default keybindings will be listed. Bindings you do not intend to change can either be left in place or deleted.

To unset a binding, set it equal to "". For example: "P": ""
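For example, a hypothetical keybinding section (the surrounding layout should be taken from the generated file, and the specific remappings here are illustrative) that binds x to open the selected URL and unsets P:

```json
{
  "keys": {
    "x": "open_url",
    "q": "quit",
    "P": ""
  }
}
```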

The following actions are supported:

  • add_url -- add a URL to the queue (default: a)
  • all_escape -- toggle unescape all URLs (default: u)
  • all_shorten -- toggle shorten all URLs (default: S)
  • bottom -- move cursor to last item (default: G)
  • clear_screen -- redraw screen (default: Ctrl-l)
  • clipboard -- copy highlighted URL to clipboard using xsel/xclip (default: C)
  • clipboard_pri -- copy highlighted URL to primary selection using xsel/xclip (default: P)
  • context -- show/hide context (default: c)
  • del_url -- delete URL from the queue (default: d)
  • down -- cursor down (default: j)
  • help_menu -- show/hide help menu (default: F1)
  • link_handler -- cycle link handling (webbrowser, xdg-open, --run-safe or --run) (default: l)
  • next -- jump to next URL (default: J)
  • open_queue -- open all URLs in queue (default: o)
  • open_queue_win -- open all URLs in queue in new window (default: O)
  • open_url -- open selected URL (default: space or enter)
  • palette -- cycle through palettes (default: p)
  • previous -- jump to previous URL (default: K)
  • quit -- quit (default: q or Q)
  • reverse -- reverse display order (default: R)
  • shorten -- toggle shorten highlighted URL (default: s)
  • top -- move to first list item (default: g)
  • up -- cursor up (default: k)

Update TLD list (for developers, not users)

wget https://data.iana.org/TLD/tlds-alpha-by-domain.txt

Known bugs and limitations

  • Running urlscan sometimes "messes up" the terminal background. This seems to be an urwid bug, but I haven't tracked down just what's going on.

  • Extraction of context from HTML messages leaves something to be desired. Probably the ideal solution would be to extract context on a word basis rather than on a paragraph basis.

  • The HTML message handling is a bit kludgy in general.

  • multipart/alternative sections are handled by descending into all the sub-parts, rather than just picking one, which may lead to URLs and context appearing twice. (Bypass this by selecting the '--dedupe' option)

Build/development

  • pyproject.toml is configured for hatch for building and submitting to PyPI.
  • flake.nix is available for a development shell or for building/testing the package, if desired: nix develop

Project details


Download files

Download the file for your platform.

Source Distribution

urlscan-1.0.4.tar.gz (35.8 kB)

Built Distribution

urlscan-1.0.4-py2.py3-none-any.whl (49.4 kB)

File details

Details for the file urlscan-1.0.4.tar.gz.

File metadata

  • Download URL: urlscan-1.0.4.tar.gz
  • Size: 35.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for urlscan-1.0.4.tar.gz:

  • SHA256: 622bfa957615633f8c0b931b8e131b6cfcc5b843ccca75f89927468870ee0e8e
  • MD5: cc7ad73956604d6bd5da01f2b20fbbee
  • BLAKE2b-256: 5e110610ca7763c958555688edac109cba92bd5b71bc42af7b8eeacafff0c96e


Provenance

The following attestation bundles were made for urlscan-1.0.4.tar.gz:

Publisher: main.yml on firecat53/urlscan

File details

Details for the file urlscan-1.0.4-py2.py3-none-any.whl.

File metadata

  • Download URL: urlscan-1.0.4-py2.py3-none-any.whl
  • Size: 49.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for urlscan-1.0.4-py2.py3-none-any.whl:

  • SHA256: 7544d68bac034d39e73ed3879b81e14d3a0c41c2070b999230c8661421384597
  • MD5: 44f05f32041fdc298ae755cd28c5990c
  • BLAKE2b-256: 2db3fc7e4724b7affe7928b4f2efd2fc0be9e083a4789c1d547560b7a65a6b13


Provenance

The following attestation bundles were made for urlscan-1.0.4-py2.py3-none-any.whl:

Publisher: main.yml on firecat53/urlscan

