Skip to main content

Get link (URL) preview

Project description

linkpreview

Build Status Coverage Status pypi

Get link preview in python

Gathering data from:

  1. OpenGraph meta tags
  2. TwitterCard meta tags
  3. Microdata meta tags
  4. JSON-LD meta tags
  5. HTML Generic tags (h1, p, img)
  6. URL readable parts

Install

pip install linkpreview

Usage

Basic

from linkpreview import link_preview

url = "http://localhost"
content = """
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width">
    <!-- ... --->
    <title>a title</title>
  </head>
  <body>
  <!-- ... --->
  </body>
</html>
"""
preview = link_preview(url, content)
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)

Automatic fetch link content

from linkpreview import link_preview

preview = link_preview("http://github.com/")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)

lxml as XML parser

Very recommended for better performance.

Install the lxml and use it like this:

from linkpreview import link_preview

preview = link_preview("http://github.com/", parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)

Advanced

from linkpreview import Link, LinkPreview, LinkGrabber

url = "http://github.com"
grabber = LinkGrabber(
    initial_timeout=20,
    maxsize=1048576,
    receive_timeout=10,
    chunk_size=1024,
)
content, url = grabber.get_content(url)
link = Link(url, content)
preview = LinkPreview(link, parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)

Extend default headers:

content, url = grabber.get_content(url, headers={'user-agent': 'Twitterbot'})

Ignore default headers:

content, url = grabber.get_content(
  url,
  headers={'user-agent': 'Twitterbot', 'accept': '*/*'},
  replace_headers=True,
)

Use preset headers:

content, url = grabber.get_content( url, headers='googlebot')

Available presets: firefox, chrome, googlebot, twitterbot, telegrambot, imessagebot

If you already have parsed BeautifulSoup object:

from bs4 import BeautifulSoup
from linkpreview import Link, LinkPreview

url = "http://example.com"
content = "<h1>Hello</h1>"
soup = BeautifulSoup(content, "html.parser")
link = Link(url, content)
preview = LinkPreview(link, soup=soup)
print("title:", preview.title)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

linkpreview-0.12.1.tar.gz (15.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

linkpreview-0.12.1-py3-none-any.whl (21.8 kB view details)

Uploaded Python 3

File details

Details for the file linkpreview-0.12.1.tar.gz.

File metadata

  • Download URL: linkpreview-0.12.1.tar.gz
  • Upload date:
  • Size: 15.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for linkpreview-0.12.1.tar.gz
Algorithm Hash digest
SHA256 4f3ec342a848d23dbe5c953daa574c4096a98167b4f8b41e57734a8e85b413e4
MD5 279408f3859afd82d843309c7cc6e5d7
BLAKE2b-256 6654e538987c8ce1e8585a276ecf018ed8eb958eb17649e893d04bde235c7ccf

See more details on using hashes here.

File details

Details for the file linkpreview-0.12.1-py3-none-any.whl.

File metadata

  • Download URL: linkpreview-0.12.1-py3-none-any.whl
  • Upload date:
  • Size: 21.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for linkpreview-0.12.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5343dc7475592fb7a6dd12ef13a8ea7cc4983641b43a465d3472695e6f3dd92e
MD5 223cd100f94ca594ce338747cb8f8b12
BLAKE2b-256 fdbda973d4189aa3561a6abdd4e02971cd6a50b6ddb35eb3eec58f557535b1a6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page