Skip to main content

Collects meta data from url, or html content.

Project description

Html Meta Data Parse

Code style: black pre-commit isort bandit

About

HtmlMetaDataParse, collects meta data from url, or html content.

Usage

Python Version: 3.8+

Setup

$ make .venv
$ make clean # cleans virtual environment folder

Setup virtual environment

Pre-commit

pre-commit installed automatically via .venv, used for linting best practices.

$ make pre-commit

test

$ make test
from html_meta_data_parse import HtmlMetaDataParse
html_meta_data_parse = HtmlMetaDataParse()
html_meta_data_parse.get_meta_data_by_url(https://example.com/)

import requests
res = requests.get("https://example.com/")
html_meta_data_parse.get_meta_data_by_html(res.text)


html_meta_data_parse = HtmlMetaDataParse(url="https://example.com/", proxy=<proxy_dict>)
html_meta_data_parse.get_meta_data_by_url()

Attributes

Functions

# url is required
html_meta_data_parse.get_meta_data_by_url(url)

# html_text is required
html_meta_data_parse.get_meta_data_by_html(html_text=html_text)
Override Meta Keys

HtmlMetaDataParse uses a predefined set of keys to parse meta data from html content. However it also provides an option to override meta keys of your choice.


html_meta_data_parse.get_meta_data_by_url(
  url,
  override_meta_keys
 )


html_meta_data_parse.get_meta_data_by_html(
  html_text,
  override_meta_keys,
)

#meta_keys_sample
meta_keys = {
        "author": {
            "name": [
                "author"
            ],
            "property": [
                "bt:author",
                "article:publisher",
                "dcterms.creator"
            ],
            "itemprop": [
                "author",
            ]

        },

        "title": {
            "name": [
                "title",
                "dcterms.title",
                "",
                "twitter:title"
            ],
            "property": [
                "og:title"
            ],
            "itemprop": [
                "title",
            ]
        },

        "image": {
            "name": [
                "image",
                "twitter:image",
                "thumbnail"
            ],
            "property": [
                "og:image"
            ],
            "itemprop": [
                "image",
            ]
        },

        "content": {
            "name": [
                "description",
                "twitter:description",
                "twitter:image:alt"
            ],
            "property": [
                "og:description",
                "og:image:alt"
            ],
            "itemprop": [
                "description",
            ]
        }
   }

Deploy

Increment version in setup.py

$ make deploy STAGE=testpypi # test

$ make deploy STAGE=pypi # public

Authors

  • Immanuel George - Initial work

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

html_meta_data_parse-0.0.1.tar.gz (4.2 kB view details)

Uploaded Source

Built Distribution

html_meta_data_parse-0.0.1-py3-none-any.whl (4.3 kB view details)

Uploaded Python 3

File details

Details for the file html_meta_data_parse-0.0.1.tar.gz.

File metadata

  • Download URL: html_meta_data_parse-0.0.1.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for html_meta_data_parse-0.0.1.tar.gz
Algorithm Hash digest
SHA256 15e801410bc64f8de586326567e22455da02757b486d05fc8c86c09cb4db4603
MD5 550cb012559a33ffd6ef8f04ed1632fe
BLAKE2b-256 f962f33d46b0140c422c43e0aa25ba4273f7173cab1806b47ff7aef70fe09eac

See more details on using hashes here.

File details

Details for the file html_meta_data_parse-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: html_meta_data_parse-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 4.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for html_meta_data_parse-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 27e33f188824403b4fd0565e23230b246f1ccc0f7a98fe8e537fd3887fcb7dd3
MD5 2b3e0fa6952744c429f763f38523210a
BLAKE2b-256 d6ecf359670428fe2db61544810106497d62e935920603282f3b0c4f05f9d232

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page