Get link (URL) preview

linkpreview

Get a link preview in Python.

Gathering data from:

  1. OpenGraph meta tags
  2. TwitterCard meta tags
  3. Microdata meta tags
  4. JSON-LD structured data
  5. HTML Generic tags (h1, p, img)
  6. URL readable parts
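The idea of trying several sources in turn can be sketched with the standard library alone. This is not linkpreview's implementation: the `MetaCollector` class and `pick_title` function are hypothetical names, only two of the sources are covered, and it assumes the numbered order above reflects priority, which the list does not state outright.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> property/name values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # OpenGraph uses property="og:...", TwitterCard uses name="twitter:..."
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                self.meta.setdefault(key.lower(), attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()


def pick_title(html):
    """Return the first title found, trying sources in priority order."""
    collector = MetaCollector()
    collector.feed(html)
    for key in ("og:title", "twitter:title"):
        if key in collector.meta:
            return collector.meta[key]
    # Fall back to the generic <title> tag (None if the page has none).
    return collector.title


html = """
<html><head>
  <meta property="og:title" content="OpenGraph title">
  <title>Generic title</title>
</head></html>
"""
print(pick_title(html))                            # OpenGraph title
print(pick_title("<title>Generic title</title>"))  # Generic title
```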

Install

pip install linkpreview

Usage

Basic

from linkpreview import link_preview

url = "http://localhost"
content = """
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width">
    <!-- ... -->
    <title>a title</title>
  </head>
  <body>
  <!-- ... -->
  </body>
</html>
"""
preview = link_preview(url, content)
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
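Judging by their names, absolute_image and absolute_favicon are the image and favicon references resolved against the page URL. That reading is an assumption, and the sketch below only illustrates the resolution step with `urllib.parse.urljoin`; `to_absolute` is a hypothetical helper, not part of linkpreview.

```python
from urllib.parse import urljoin


def to_absolute(page_url, ref):
    """Resolve a possibly relative reference against the page URL."""
    if not ref:
        return None
    return urljoin(page_url, ref)


page_url = "http://localhost/articles/post.html"

# Root-relative path: resolved against the host.
print(to_absolute(page_url, "/static/cover.png"))
# -> http://localhost/static/cover.png

# Document-relative path: resolved against the page's directory.
print(to_absolute(page_url, "img/cover.png"))
# -> http://localhost/articles/img/cover.png

# Already absolute: returned unchanged.
print(to_absolute(page_url, "https://cdn.example.com/c.png"))
# -> https://cdn.example.com/c.png
```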

Automatically fetch link content

from linkpreview import link_preview

preview = link_preview("http://github.com/")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)

lxml as the parser

Highly recommended for better performance.

Install lxml (pip install lxml) and pass it as the parser:

from linkpreview import link_preview

preview = link_preview("http://github.com/", parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)

Advanced

from linkpreview import Link, LinkPreview, LinkGrabber

url = "http://github.com"
grabber = LinkGrabber(
    initial_timeout=20,
    maxsize=1048576,
    receive_timeout=10,
    chunk_size=1024,
)
content, url = grabber.get_content(url)
link = Link(url, content)
preview = LinkPreview(link, parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
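The maxsize and chunk_size arguments suggest that the body is read in pieces up to a size cap. The sketch below shows that general pattern on an in-memory stream. It is an assumption about what the parameters mean, not LinkGrabber's code; in particular, whether the library truncates or raises when the cap is hit is not stated here, and this hypothetical `read_capped` helper simply truncates.

```python
import io


def read_capped(stream, maxsize=1048576, chunk_size=1024):
    """Read from stream in chunk_size pieces, stopping at maxsize bytes."""
    chunks = []
    total = 0
    while total < maxsize:
        # Never ask for more than the remaining budget.
        chunk = stream.read(min(chunk_size, maxsize - total))
        if not chunk:
            break  # end of stream
        chunks.append(chunk)
        total += len(chunk)
    return b"".join(chunks)


body = io.BytesIO(b"x" * 5000)
print(len(read_capped(body, maxsize=2048, chunk_size=1024)))  # 2048
```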

Extend default headers:

content, url = grabber.get_content(url, headers={'user-agent': 'Twitterbot'})

Ignore default headers:

content, url = grabber.get_content(
  url,
  headers={'user-agent': 'Twitterbot', 'accept': '*/*'},
  replace_headers=True,
)
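The difference between extending and replacing the defaults can be pictured as a dictionary merge. In this sketch, `DEFAULT_HEADERS` and `build_headers` are hypothetical stand-ins: the library's real default values and internal logic are not shown in this document.

```python
# Hypothetical defaults, for illustration only.
DEFAULT_HEADERS = {"user-agent": "example-agent/1.0", "accept": "text/html"}


def build_headers(headers=None, replace_headers=False):
    """Merge caller headers over the defaults, or use them on their own."""
    headers = headers or {}
    if replace_headers:
        # Defaults are ignored entirely.
        return dict(headers)
    # Caller values win on conflicts; untouched defaults are kept.
    return {**DEFAULT_HEADERS, **headers}


print(build_headers({"user-agent": "Twitterbot"}))
# -> {'user-agent': 'Twitterbot', 'accept': 'text/html'}

print(build_headers({"user-agent": "Twitterbot"}, replace_headers=True))
# -> {'user-agent': 'Twitterbot'}
```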

Use preset headers:

content, url = grabber.get_content(url, headers='googlebot')

Available presets: firefox, chrome, googlebot, twitterbot, telegrambot, imessagebot

If you already have a parsed BeautifulSoup object:

from bs4 import BeautifulSoup
from linkpreview import Link, LinkPreview

url = "http://example.com"
content = "<h1>Hello</h1>"
soup = BeautifulSoup(content, "html.parser")
link = Link(url, content)
preview = LinkPreview(link, soup=soup)
print("title:", preview.title)
