Fast and modern meta tags parser (og, twitter, title, description, etc) with snippet support

Project description

Meta tags parser

Fast, modern, pure Python meta tags parser and snippet creator with full support for type annotations, py.typed in the base package, and structured output. No jelly dicts, only typed structures!
If you want to see what exactly social media snippets are, look at the example:

Requirements

Install

pip install meta-tags-parser

Usage

TL;DR

  1. Parse meta tags from source:
    from meta_tags_parser import parse_meta_tags_from_source, structs
    
    
    desired_result: structs.TagsGroup = parse_meta_tags_from_source("""... html source ...""")
    # desired_result is what you want
    
  2. Parse meta tags from url:
    from meta_tags_parser import parse_tags_from_url, parse_tags_from_url_async, structs
    
    
    desired_result: structs.TagsGroup = parse_tags_from_url("https://xfenix.ru")
    # and async variant
    desired_result: structs.TagsGroup = await parse_tags_from_url_async("https://xfenix.ru")
    # desired_result is what you want in both cases
    
  3. Parse social media snippet from source:
    from meta_tags_parser import parse_snippets_from_source, structs
    
    
    snippet_obj: structs.SnippetGroup = parse_snippets_from_source("""... html source ...""")
    # snippet_obj is what you want
    # access like snippet_obj.open_graph.title, ...
    
  4. Parse social media snippet from url:
    from meta_tags_parser import parse_snippets_from_url, parse_snippets_from_url_async, structs
    
    
    snippet_obj: structs.SnippetGroup = parse_snippets_from_url("https://xfenix.ru")
    # and async variant
    snippet_obj: structs.SnippetGroup = await parse_snippets_from_url_async("https://xfenix.ru")
    # snippet_obj is what you want in both cases
    # access like snippet_obj.open_graph.title, ...
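
The await calls in items 2 and 4 only work inside a coroutine. As a minimal sketch of one way to drive the async variants from a regular script, the helper below runs one parse call per URL concurrently. parse_many is a hypothetical name, not part of the library; it accepts any coroutine function that takes a URL, so parse_snippets_from_url_async or parse_tags_from_url_async could be passed in as parse_one.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, TypeVar

ResultT = TypeVar("ResultT")


async def parse_many(
    parse_one: Callable[[str], Awaitable[ResultT]],
    urls: Iterable[str],
) -> Dict[str, ResultT]:
    """Run parse_one for every URL concurrently and map each URL to its result."""
    url_list = list(urls)
    # gather() returns results in the same order as the awaitables it was given
    results = await asyncio.gather(*(parse_one(one_url) for one_url in url_list))
    return dict(zip(url_list, results))


# Possible usage, assuming meta-tags-parser is installed:
# from meta_tags_parser import parse_snippets_from_url_async
# snippets = asyncio.run(parse_many(parse_snippets_from_url_async, ["https://xfenix.ru"]))
```

If any single call raises, gather() propagates that exception, so error handling still has to be added around it.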
    

Important note: the *_from_url functions exist only for convenience and are very error-prone, so any reconnection or error handling is entirely up to you.
I also don't want to add heavy dependencies just to provide robust connections for every user, because many simply don't expect this from the library. If you really need it, write to me.
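
Since retries are left to the caller, here is a minimal sketch of a generic retry wrapper. call_with_retries and its parameters are hypothetical names, not part of the library. It catches every Exception because the errors the *_from_url functions can raise are not documented here; narrow that down once you know which errors you actually see.

```python
import time
from typing import Callable, TypeVar

ResultT = TypeVar("ResultT")


def call_with_retries(
    action: Callable[[], ResultT],
    attempts: int = 3,
    delay_seconds: float = 0.5,
) -> ResultT:
    """Call action up to `attempts` times and re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt_number in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            # Out of attempts: let the caller see the original error
            if attempt_number == attempts:
                raise
            # Linear backoff: wait a little longer after each failure
            time.sleep(delay_seconds * attempt_number)
    raise AssertionError("unreachable")


# Possible usage, assuming meta-tags-parser is installed:
# from meta_tags_parser import parse_tags_from_url
# tags = call_with_retries(lambda: parse_tags_from_url("https://xfenix.ru"))
```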

Basic snippets parsing

Let's say you want to extract a Twitter snippet from an HTML page:

from meta_tags_parser import parse_snippets_from_source, structs


my_result: structs.SnippetGroup = parse_snippets_from_source("""
    <meta property="og:card" content="summary_large_image">
    <meta property="og:url" content="https://github.com/">
    <meta property="og:title" content="Hello, my friend">
    <meta property="og:description" content="Content here, yehehe">
    <meta property="twitter:card" content="summary_large_image">
    <meta property="twitter:url" content="https://github.com/">
    <meta property="twitter:title" content="Hello, my friend">
    <meta property="twitter:description" content="Content here, yehehe">
""")

print(my_result)
# What will be printed:
"""
SnippetGroup(
    open_graph=SocialMediaSnippet(
        title='Hello, my friend',
        description='Content here, yehehe',
        image='',
        url='https://github.com/'
    ),
    twitter=SocialMediaSnippet(
        title='Hello, my friend',
        description='Content here, yehehe',
        image='',
        url='https://github.com/'
    )
)
"""
# You can access attributes like this
my_result.open_graph.title
my_result.twitter.image
# All fields are always present, even if they contain no data
# So there is no need to worry about attribute existence (but you may need to check values)
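
Because the fields always exist but may be empty strings, a caller usually wants a fallback. The sketch below shows one way to pick the first non-empty value. The two dataclasses are local stand-ins that only mirror the field names printed above, so the example runs without the library, and first_non_empty is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class SocialMediaSnippet:
    """Local stand-in with the four fields shown in the printed output."""
    title: str = ""
    description: str = ""
    image: str = ""
    url: str = ""


@dataclass
class SnippetGroup:
    """Local stand-in holding one snippet per social network."""
    open_graph: SocialMediaSnippet = field(default_factory=SocialMediaSnippet)
    twitter: SocialMediaSnippet = field(default_factory=SocialMediaSnippet)


def first_non_empty(group: SnippetGroup, field_name: str, default: str = "") -> str:
    """Return the Open Graph value if set, else the Twitter one, else default."""
    for snippet in (group.open_graph, group.twitter):
        value = getattr(snippet, field_name)
        if value:
            return value
    return default


example = SnippetGroup(twitter=SocialMediaSnippet(title="Hello, my friend"))
print(first_non_empty(example, "title"))               # Hello, my friend
print(first_non_empty(example, "image", "no-image"))   # no-image
```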

Basic meta tags parsing

The main function is parse_meta_tags_from_source. It can be used like this:

from meta_tags_parser import parse_meta_tags_from_source, structs


my_result: structs.TagsGroup = parse_meta_tags_from_source("""... html source ...""")
print(my_result)

# What will be printed:
"""
structs.TagsGroup(
    title="...",
    twitter=[
        structs.OneMetaTag(
            name="title", value="Hello",
            ...
        )
    ],
    open_graph=[
        structs.OneMetaTag(
            name="title", value="Hello",
            ...
        )
    ],
    basic=[
        structs.OneMetaTag(
            name="title", value="Hello",
            ...
        )
    ],
    other=[
        structs.OneMetaTag(
            name="article:name", value="Hello",
            ...
        )
    ]
)
"""

As you can see from this example, we are not using any jelly dicts, only structured dataclasses. Let's look at another example:

from meta_tags_parser import parse_meta_tags_from_source, structs


my_result: structs.TagsGroup = parse_meta_tags_from_source("""
    <meta property="twitter:card" content="summary_large_image">
    <meta property="twitter:url" content="https://github.com/">
    <meta property="twitter:title" content="Hello, my friend">
    <meta property="twitter:description" content="Content here, yehehe">
""")

print(my_result)
# What will be printed:
"""
TagsGroup(
    title='',
    basic=[],
    open_graph=[],
    twitter=[
        OneMetaTag(name='card', value='summary_large_image'),
        OneMetaTag(name='url', value='https://github.com/'),
        OneMetaTag(name='title', value='Hello, my friend'),
        OneMetaTag(name='description', value='Content here, yehehe')
    ],
    other=[]
)
"""

for one_tag in my_result.twitter:
    if one_tag.name == "title":
        print(one_tag.value)
# What will be printed:
"""
Hello, my friend
"""
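
The loop above can be wrapped in a small lookup helper. This is only a sketch: find_tag_value is a hypothetical name, and OneMetaTag here is a local stand-in with just the name and value fields shown in the printed output, so the example runs without the library.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class OneMetaTag:
    """Local stand-in mirroring the name/value fields printed above."""
    name: str
    value: str


def find_tag_value(tags: Iterable[OneMetaTag], wanted_name: str) -> Optional[str]:
    """Return the value of the first tag named wanted_name, or None if absent."""
    for one_tag in tags:
        if one_tag.name == wanted_name:
            return one_tag.value
    return None


twitter_tags = [
    OneMetaTag(name="card", value="summary_large_image"),
    OneMetaTag(name="title", value="Hello, my friend"),
]
print(find_tag_value(twitter_tags, "title"))  # Hello, my friend
print(find_tag_value(twitter_tags, "image"))  # None
```

Returning None for a missing tag keeps "tag absent" distinct from "tag present with an empty value".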

If you want to improve speed

You can specify what you want to parse:

from meta_tags_parser import parse_meta_tags_from_source, structs


# Assumption: the WhatToParse enum is exposed by the structs module
what = structs.WhatToParse
result: structs.TagsGroup = parse_meta_tags_from_source(
    """... source ...""",
    what_to_parse=(what.TITLE, what.BASIC, what.OPEN_GRAPH, what.TWITTER, what.OTHER),
)

Reducing this tuple to only the groups you need may increase overall parsing speed.

Important notes

  • Any name in a meta tag (name or property attribute) is lowercased.
  • The og: and twitter: prefixes are stripped from the original attributes; the dataclass structures carry this information instead. If the parser meets a meta tag with the property og:name, it will be available in the my_result variable as one element of the list my_result.open_graph.
  • The page title (e.g. <title>Something</title>) is available as the string my_result.title (in this case, Something).
  • "Standard" tags like title and description (see the full list in ./meta_tags_parser/structs.py, in the constant BASIC_META_TAGS) are available as a list in my_result.basic.
  • Other tags are available as a list in the my_result.other attribute; their names are preserved, unlike the og:/twitter: behaviour.
  • If you want structured snippets, use the parse_snippets_from_source function.
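
The naming rules in the notes above can be sketched as a small function. This illustrates the described behaviour only and is not the library's implementation. classify_tag_name is a hypothetical name, and ASSUMED_BASIC_TAGS contains just the two tags mentioned in the notes; the real list lives in BASIC_META_TAGS.

```python
from typing import Tuple

# Assumption: only the two tags named in the notes; the real list is BASIC_META_TAGS
ASSUMED_BASIC_TAGS = frozenset({"title", "description"})
PREFIX_TO_GROUP = {"og:": "open_graph", "twitter:": "twitter"}


def classify_tag_name(raw_name: str) -> Tuple[str, str]:
    """Return (group, stored_name) for a meta tag's name or property attribute."""
    lowered = raw_name.lower()  # every name is lowercased first
    for prefix, group in PREFIX_TO_GROUP.items():
        if lowered.startswith(prefix):
            # The prefix is dropped; the group itself carries that information
            return group, lowered[len(prefix):]
    if lowered in ASSUMED_BASIC_TAGS:
        return "basic", lowered
    # Everything else keeps its full name
    return "other", lowered


print(classify_tag_name("og:Title"))      # ('open_graph', 'title')
print(classify_tag_name("article:name"))  # ('other', 'article:name')
```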

Changelog

See the release page: https://github.com/xfenix/meta-tags-parser/releases/
