urltitle is a Python 3.7+ package that returns the page title or a header-based description for a given URL. Its primary intended use is to include the returned value in conversations. As a disclaimer, the returned title is not guaranteed to be accurate, for many possible reasons.

  • An in-memory cache is used with a default expiry time of one week. The cache size and expiry time are customizable.
  • Only approximately the fraction of an HTML page required to return a title is read, up to a customizable maximum of 1 MiB.
  • A PDF title metadata extractor is used for PDF files up to a customizable maximum size of 8 MiB.
  • Up to three attempts are made for resiliency, except when the error is unrecoverable, e.g. 400, 401, or 404.
  • For a URL with a missing scheme, https is guessed first and http otherwise.
  • SSL verification for https sites can optionally be disabled.
  • A fallback to Google web cache is used if an HTML page presents a Distil captcha. It is also used for a PDF that is too large or has no title metadata.
  • Diagnostic logging can be optionally enabled for the logger named urltitle at the desired level.
  • Some site-specific customizations are configurable:
    • Multiple regular expression based URL and title substitutions
    • Use of Google web cache
    • User-Agent
    • Additional headers
    • Title selector
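
The scheme guessing and retry behavior listed above can be illustrated with a minimal sketch. This is not urltitle's actual implementation: the names `candidate_urls`, `fetch_with_retries`, `FetchError`, and the exact set of unrecoverable status codes are assumptions made for illustration only.

```python
from typing import Callable, List, Optional

# Assumption: the list above names only 400, 401 and 404 as examples of unrecoverable errors.
UNRECOVERABLE_STATUSES = {400, 401, 404}
MAX_ATTEMPTS = 3


class FetchError(Exception):
    """Hypothetical error carrying an HTTP status code (None for a network failure)."""

    def __init__(self, status: Optional[int] = None) -> None:
        super().__init__(f"fetch failed with status {status}")
        self.status = status


def candidate_urls(url: str) -> List[str]:
    """Return the URL unchanged if it has a scheme, else an https guess followed by an http guess."""
    if "://" in url:
        return [url]
    return [f"https://{url}", f"http://{url}"]


def fetch_with_retries(url: str, fetch: Callable[[str], str]) -> str:
    """Call fetch(url) up to MAX_ATTEMPTS times, giving up at once on an unrecoverable status."""
    last_error: Optional[FetchError] = None
    for _ in range(MAX_ATTEMPTS):
        try:
            return fetch(url)
        except FetchError as exc:
            if exc.status in UNRECOVERABLE_STATUSES:
                raise  # Retrying a 404 and similar would not help.
            last_error = exc
    assert last_error is not None
    raise last_error
```

Here `fetch` stands for any callable that downloads a URL and raises `FetchError` on failure, which keeps the sketch independent of a particular HTTP client.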

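Diagnostic logging, as mentioned in the list above, can be enabled with the standard library alone. The following is a minimal sketch; the log format and the choice of `DEBUG` are assumptions, and only the logger name `urltitle` comes from this document.

```python
import logging

# Attach a stderr handler to the root logger. The root level stays at its default (WARNING),
# so other libraries remain quiet.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

# Lower the threshold only for the logger named "urltitle"; any other level works the same way.
logging.getLogger("urltitle").setLevel(logging.DEBUG)
```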



Python ≥3.7 is required due to a reference to SSLCertVerificationError.

To install the package, run:

pip install urltitle


from urltitle import URLTitleReader

reader = URLTitleReader(verify_ssl=True)

# Titles for HTML content
"Insect numbers in precipitous decline could have 'catastrophic' consequences, warns study - CNN"

'Deep Learning State of the Art (2019) - MIT - YouTube'

# Titles for URLs with a missing scheme
"Army calls base housing hazards 'unconscionable,' details steps to protect families | Reuters"

'Paternal high-fat diet transgenerationally impacts hepatic immunometabolism. - PubMed - NCBI : FoodNerds'

'NeverSSL - helping you get online'

# Titles for non-ASCII URLs
'Amanattō - Wikipedia'

"Wikipédia, l'encyclopédie libre"

# Titles for PDFs having title metadata
'Artificial sweeteners induce glucose intolerance by altering the gut microbiota'

'Detection of Glyphosate in Malformed Piglets'

# Titles for other content showing Content-Type and Content-Length as available:
'(image/jpeg) (54K)'

'(application/rss+xml; charset=UTF-8)'

'(application/octet-stream) (2G)'

# Titles for substituted URLs as per configuration:
'[1902.04704] Neural network models and deep learning - a primer for biologists'

"Features of a successful therapeutic fast of 382 days' duration"

'Nutrition and health. The issue is not food, nor nutrients, so much as processing. - Semantic Scholar'


An error is expected to raise the urltitle.URLTitleError exception.
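
A caller that prefers a fallback value over an exception could wrap the call as sketched below. The helper `title_or_none` is hypothetical and not part of urltitle. It takes the title function and the exception type as arguments so that the sketch runs without the package installed.

```python
from typing import Callable, Optional, Type


def title_or_none(get_title: Callable[[str], str], url: str,
                  error_type: Type[Exception]) -> Optional[str]:
    """Return get_title(url), or None if it raises error_type."""
    try:
        return get_title(url)
    except error_type:
        return None


# Intended use with the real package (requires `pip install urltitle`):
#   from urltitle import URLTitleReader, URLTitleError
#   reader = URLTitleReader(verify_ssl=True)
#   title = title_or_none(reader.title, "https://example.com", URLTitleError)
```

Catching only `URLTitleError` rather than a bare `Exception` lets unrelated programming errors still surface.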


For any site-specific customizations, update (but ideally do not replace) urltitle.config.overrides.NETLOC_OVERRIDES with the relevant sites, using its preexisting entries as examples. The site of a URL is as defined and returned by the URLTitleReader().netloc(url) method.
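
The difference between updating and replacing the overrides can be sketched with a stand-in dictionary. The site names and the option key `placeholder_option` below are placeholders, not real urltitle settings; the actual key names should be taken from the preexisting entries.

```python
# Stand-in for the real dictionary so that the sketch runs without the package. With urltitle
# installed this would be: from urltitle.config.overrides import NETLOC_OVERRIDES
NETLOC_OVERRIDES = {"existing.example": {"placeholder_option": "existing value"}}

# A second reference to the same dictionary, as any module that imported it would hold.
seen_by_library = NETLOC_OVERRIDES

# Update in place: preexisting entries survive, and every reference sees the new site.
NETLOC_OVERRIDES.update({"new.example": {"placeholder_option": "new value"}})

# By contrast, rebinding the name (NETLOC_OVERRIDES = {...}) would leave seen_by_library
# pointing at the old dictionary and would drop the preexisting entries from the new one.
```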

The following examples show various URLs and their corresponding sites for the purpose of entering site-specific customizations:

URL Site

