Fast and modern meta tag parser (og, twitter, title, description, etc.) with snippet support

Meta tags parser

Fast, modern, pure Python meta tag parser and snippet creator with full support for type annotations. The base package ships with py.typed and provides structured output. No jelly dicts, only typed structures! If you want to see what social media snippets look like, check the example:

Requirements

Install

pip install meta-tags-parser

Usage

TL;DR

  1. Parse meta tags from HTML source:

    from meta_tags_parser import parse_meta_tags_from_source, structs
    
    
    desired_result: structs.TagsGroup = parse_meta_tags_from_source("""... html source ...""")
    # desired_result is what you want
    
  2. Parse meta tags from a URL:

    import asyncio
    
    from meta_tags_parser import parse_tags_from_url, parse_tags_from_url_async, structs
    
    desired_result: structs.TagsGroup = parse_tags_from_url("https://xfenix.ru")
    
    # async variant: it must be awaited inside a coroutine
    async def main() -> structs.TagsGroup:
        return await parse_tags_from_url_async("https://xfenix.ru")
    
    desired_result = asyncio.run(main())
    # desired_result is what you want in both cases
    
  3. Parse a social media snippet from HTML source:

    from meta_tags_parser import parse_snippets_from_source, structs
    
    
    snippet_obj: structs.SnippetGroup = parse_snippets_from_source("""... html source ...""")
    # snippet_obj is what you want
    # access like snippet_obj.open_graph.title, ...
    
  4. Parse a social media snippet from a URL:

    import asyncio
    
    from meta_tags_parser import parse_snippets_from_url, parse_snippets_from_url_async, structs
    
    snippet_obj: structs.SnippetGroup = parse_snippets_from_url("https://xfenix.ru")
    
    # async variant: it must be awaited inside a coroutine
    async def main() -> structs.SnippetGroup:
        return await parse_snippets_from_url_async("https://xfenix.ru")
    
    snippet_obj = asyncio.run(main())
    # snippet_obj is what you want in both cases
    # access like snippet_obj.open_graph.title, ...
    

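Because the *_async variants are coroutines, several pages can be fetched concurrently. The sketch below is a generic helper, not part of meta_tags_parser: fetch_all and max_concurrency are hypothetical names, and the only assumption is that the fetch function is an async callable that takes a URL.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def fetch_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[T]],
    max_concurrency: int = 5,
) -> list[T | BaseException]:
    """Run `fetch` for every URL, keeping at most `max_concurrency` calls in flight.

    Results are returned in the same order as `urls`. Because of
    return_exceptions=True, a failed call appears in the result list as its
    exception object instead of being raised, so one bad URL does not hide
    the other results.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(url: str) -> T:
        # Wait for a free slot before starting this fetch
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(limited(url) for url in urls), return_exceptions=True)
```

With the library installed, this could be called as asyncio.run(fetch_all(urls, parse_tags_from_url_async)); entries that are exceptions can then be logged or retried.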
Huge note: the *_from_url functions are provided only for convenience and are error-prone: they do not retry or recover from network errors, so reconnection and error handling are entirely up to you. I deliberately avoid adding the heavy dependencies that robust connection handling would require, since most users don't expect that from this library. If you really need it, contact me.
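Since retries are left to you, one option is a small wrapper with exponential backoff. This is a generic sketch: with_retries and its parameters are hypothetical and not part of the library. It catches any Exception by default because the exact exception types raised by the *_from_url functions are not documented here; narrow retry_on once you know them.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Call `func` until it succeeds, making at most `attempts` calls in total.

    Sleeps base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    (exponential backoff) and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

For example: with_retries(lambda: parse_tags_from_url("https://xfenix.ru")).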

Basic snippet parsing

Let's say you want to extract a snippet for Twitter from an HTML page:

from meta_tags_parser import parse_snippets_from_source, structs


my_result: structs.SnippetGroup = parse_snippets_from_source("""
    <meta property="og:card" content="summary_large_image">
    <meta property="og:url" content="https://github.com/">
    <meta property="og:title" content="Hello, my friend">
    <meta property="og:description" content="Content here, yehehe">
    <meta property="twitter:card" content="summary_large_image">
    <meta property="twitter:url" content="https://github.com/">
    <meta property="twitter:title" content="Hello, my friend">
    <meta property="twitter:description" content="Content here, yehehe">
""")

print(my_result)
# What will be printed:
"""
SnippetGroup(
    open_graph=SocialMediaSnippet(
        title='Hello, my friend',
        description='Content here, yehehe',
        image='',
        url='https://github.com/'
    ),
    twitter=SocialMediaSnippet(
        title='Hello, my friend',
        description='Content here, yehehe',
        image='',
        url='https://github.com/'
    )
)
"""
# You can access attributes like this
my_result.open_graph.title
my_result.twitter.image
# All fields are required and will always be available, even if they contain no data
# So you don't need to worry about attribute existence (though you may need to check their values)
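Because every snippet field is always present but may be an empty string (note image='' above), a common next step is to fill the gaps in one snippet from the other. A minimal sketch, assuming the snippet objects are dataclasses with string fields, as the printed output suggests; merge_snippets is a hypothetical helper, and the local SocialMediaSnippet class is only a stand-in so the example runs on its own.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class SocialMediaSnippet:
    # Stand-in mirroring the printed output above (title, description, image, url).
    # With real results, pass my_result.twitter / my_result.open_graph instead.
    title: str = ""
    description: str = ""
    image: str = ""
    url: str = ""


def merge_snippets(primary: SocialMediaSnippet, fallback: SocialMediaSnippet) -> SocialMediaSnippet:
    """Take each field from `primary`, or from `fallback` when the primary value is empty."""
    merged = {
        field.name: getattr(primary, field.name) or getattr(fallback, field.name)
        for field in fields(primary)
    }
    # replace() builds a new instance of the same dataclass with the merged values
    return replace(primary, **merged)
```

With real results this would read best = merge_snippets(my_result.twitter, my_result.open_graph), preferring Twitter values and falling back to Open Graph ones.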

Basic meta tag parsing

The main function is parse_meta_tags_from_source. Use it like this:

from meta_tags_parser import parse_meta_tags_from_source, structs


my_result: structs.TagsGroup = parse_meta_tags_from_source("""... html source ...""")
print(my_result)

# What will be printed:
"""
structs.TagsGroup(
    title="...",
    twitter=[
        structs.OneMetaTag(
            name="title", value="Hello",
            ...
        )
    ],
    open_graph=[
        structs.OneMetaTag(
            name="title", value="Hello",
            ...
        )
    ],
    basic=[
        structs.OneMetaTag(
            name="title", value="Hello",
            ...
        )
    ],
    other=[
        structs.OneMetaTag(
            name="article:name", value="Hello",
            ...
        )
    ]
)
"""

As you can see from this example, we don't use any jelly dicts, only structured dataclasses. Let's look at another example:

from meta_tags_parser import parse_meta_tags_from_source, structs


my_result: structs.TagsGroup = parse_meta_tags_from_source("""
    <meta property="twitter:card" content="summary_large_image">
    <meta property="twitter:url" content="https://github.com/">
    <meta property="twitter:title" content="Hello, my friend">
    <meta property="twitter:description" content="Content here, yehehe">
""")

print(my_result)
# What will be printed:
"""
TagsGroup(
    title='',
    basic=[],
    open_graph=[],
    twitter=[
        OneMetaTag(name='card', value='summary_large_image'),
        OneMetaTag(name='url', value='https://github.com/'),
        OneMetaTag(name='title', value='Hello, my friend'),
        OneMetaTag(name='description', value='Content here, yehehe')
    ],
    other=[]
)
"""

for one_tag in my_result.twitter:
    if one_tag.name == "title":
        print(one_tag.value)
# What will be printed:
"""
Hello, my friend
"""

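Scanning the list works, but for repeated lookups it may be handier to index tags by name. A minimal sketch: index_tags is a hypothetical helper, and the local OneMetaTag class is a stand-in mirroring the printed name/value fields. It keeps a list of values per name because a page can repeat a tag, for example several og:image entries, which appear under the name image.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class OneMetaTag:
    # Stand-in mirroring the printed output above (name, value).
    # With real results, the items of my_result.twitter etc. play this role.
    name: str
    value: str


def index_tags(tags: list[OneMetaTag]) -> dict[str, list[str]]:
    """Map each tag name to all of its values, in document order.

    A list per name keeps repeated tags that a plain {name: value} dict
    would silently overwrite.
    """
    index: defaultdict[str, list[str]] = defaultdict(list)
    for tag in tags:
        index[tag.name].append(tag.value)
    return dict(index)
```

With real results: index_tags(my_result.twitter).get("title", [""])[0].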
Improving speed

You can specify exactly what to parse:

from meta_tags_parser import parse_meta_tags_from_source, structs
from meta_tags_parser.structs import WhatToParse


# every available group is listed here; drop the ones you don't need
result: structs.TagsGroup = parse_meta_tags_from_source(
    """... source ...""",
    what_to_parse=(WhatToParse.TITLE, WhatToParse.BASIC, WhatToParse.OPEN_GRAPH, WhatToParse.TWITTER, WhatToParse.OTHER),
)

Passing a shorter tuple, with only the groups you actually need, may speed up parsing.
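Whether a narrower what_to_parse helps depends on your pages, so it is worth measuring. A minimal sketch using the standard timeit module; best_of is a hypothetical helper name, and the commented usage assumes meta_tags_parser is installed and WhatToParse is imported as in the example above.

```python
import timeit
from typing import Callable


def best_of(func: Callable[[], object], number: int = 100, repeat: int = 5) -> float:
    """Return the best average time per call, in seconds, across `repeat` runs.

    Taking the minimum of several runs reduces noise from other processes.
    """
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number


# Hypothetical usage (needs meta_tags_parser and an HTML string `html`):
# full = best_of(lambda: parse_meta_tags_from_source(html))
# narrow = best_of(lambda: parse_meta_tags_from_source(html, what_to_parse=(WhatToParse.OPEN_GRAPH,)))
```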

Important notes

  • Any name in a meta tag (name or property attribute) is lowercased.
  • The og: and twitter: prefixes are stripped from tag names; the dataclass structure records which group a tag came from instead. For example, a meta tag with property og:name appears in the my_result.open_graph list under the name "name".
  • HTML is parsed with selectolax's LexborHTMLParser. It is fast and tolerant but does not emulate a browser, so extremely malformed markup may not be handled, and tags that JavaScript adds at runtime are not visible to it.
  • The page title (e.g., <title>Something</title>) is available as the string my_result.title (you'll receive Something).
  • "Standard" tags like title and description (see the full list in the BASIC_META_TAGS constant in ./meta_tags_parser/structs.py) are available as a list in my_result.basic.
  • Other tags are available as a list in my_result.other, and their names keep any prefix (for example, article:name), unlike og: and twitter: tags.
  • For structured snippets, use the parse_snippets_from_source function.
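To make the grouping rules above concrete, here is a rough from-scratch illustration using the standard library's html.parser instead of selectolax. It is not how meta_tags_parser is implemented: MetaGrouper, Groups and the BASIC_NAMES subset are hypothetical, and tags are stored as plain (name, value) tuples rather than the library's dataclasses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical subset; the library's real list is BASIC_META_TAGS in structs.py
BASIC_NAMES = {"title", "description", "keywords", "robots", "viewport"}


@dataclass
class Groups:
    title: str = ""
    basic: list[tuple[str, str]] = field(default_factory=list)
    open_graph: list[tuple[str, str]] = field(default_factory=list)
    twitter: list[tuple[str, str]] = field(default_factory=list)
    other: list[tuple[str, str]] = field(default_factory=list)


class MetaGrouper(HTMLParser):
    """Illustrates the grouping rules from the notes; not the library's parser."""

    def __init__(self) -> None:
        super().__init__()
        self.groups = Groups()
        self._in_title = False

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            raw_name = attributes.get("property") or attributes.get("name")
            if raw_name:
                # Names are lowercased, as described in the first note
                self._add(raw_name.lower(), attributes.get("content") or "")

    def handle_endtag(self, tag: str) -> None:
        if tag == "title":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self.groups.title += data

    def _add(self, name: str, value: str) -> None:
        # og:/twitter: prefixes are stripped; the target list records the origin
        if name.startswith("og:"):
            self.groups.open_graph.append((name[len("og:"):], value))
        elif name.startswith("twitter:"):
            self.groups.twitter.append((name[len("twitter:"):], value))
        elif name in BASIC_NAMES:
            self.groups.basic.append((name, value))
        else:
            # Other tags keep their full name, e.g. "article:author"
            self.groups.other.append((name, value))
```

Call feed() with the HTML, then close() to flush any buffered text, and read the result from parser.groups.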

Changelog

See the release page at https://github.com/xfenix/meta-tags-parser/releases/.

Download files

Source Distribution

meta_tags_parser-2.0.2.tar.gz (522.9 kB)

Built Distribution

meta_tags_parser-2.0.2-py3-none-any.whl (10.4 kB)

File details

Details for the file meta_tags_parser-2.0.2.tar.gz.

File metadata

  • Download URL: meta_tags_parser-2.0.2.tar.gz
  • Size: 522.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.10

File hashes

Hashes for meta_tags_parser-2.0.2.tar.gz
Algorithm Hash digest
SHA256 bd7c0ee627db0a58f325d923dcb0ba99bfdec175f2e3db6879f5e90099ce5569
MD5 d35bee07fb662e47e80e9920d8ca076a
BLAKE2b-256 9d3366f7723411b61597bfcf0ba72f9471848169ffb1977b24dd9f7a704e8ddd

File details

Details for the file meta_tags_parser-2.0.2-py3-none-any.whl.

File metadata

  • Download URL: meta_tags_parser-2.0.2-py3-none-any.whl
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.10

File hashes

Hashes for meta_tags_parser-2.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 86909642d726c65fe9785193ffc884cf9e77b80f853e442cd22500aa9f9540f8
MD5 1f4712a196b1d90d30c76a5619e77bc9
BLAKE2b-256 09c14d88b8eff427c868b14a6ac375938ccf7b3c366501cc9247332f94e73142
