Collects metadata from URL, or HTML content.

Html Meta Data Parse

About

HtmlMetaDataParse collects metadata from a URL or from HTML content.

Usage

Python Version: 3.8+

Setup

Set up the virtual environment:

$ make .venv
$ make clean # removes the virtual environment folder

Pre-commit

pre-commit is installed automatically via .venv and is used to enforce linting best practices.

$ make pre-commit

Test

$ make test

Install

pip install html-meta-data-parse

Example

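import requests
from html_meta_data_parse import HtmlMetaDataParse

# Collect metadata directly from a URL
html_meta_data_parse = HtmlMetaDataParse()
html_meta_data_parse.get_meta_data_by_url("https://example.com/")

# Collect metadata from HTML that has already been fetched
res = requests.get("https://example.com/")
html_meta_data_parse.get_meta_data_by_html(res.text)

# Pass the URL and a proxy to the constructor instead
# (replace <proxy_dict> with your own proxy settings)
html_meta_data_parse = HtmlMetaDataParse(url="https://example.com/", proxy=<proxy_dict>)
html_meta_data_parse.get_meta_data_by_url()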

Attributes

Functions

# url is required
html_meta_data_parse.get_meta_data_by_url(url)

# html_text is required
html_meta_data_parse.get_meta_data_by_html(html_text=html_text)

Override Meta Keys

HtmlMetaDataParse uses a predefined set of keys to parse metadata from HTML content. You can override these keys with a mapping of your own by passing it as the second argument.


html_meta_data_parse.get_meta_data_by_url(
    url,
    override_meta_keys,
)

html_meta_data_parse.get_meta_data_by_html(
    html_text,
    override_meta_keys,
)
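
The README does not say whether an override is merged with the predefined keys or replaces them outright. One way to avoid relying on either behavior is to build the complete mapping yourself before passing it in. The sketch below assumes the mapping has the nested shape shown in the sample (field, then attribute type, then a list of values); `merge_meta_keys`, `base_keys`, and `extra_keys` are hypothetical names for illustration and are not part of html-meta-data-parse.

```python
from copy import deepcopy
from typing import Dict, List

# Assumed shape: {field: {attribute_type: [accepted values]}}
MetaKeys = Dict[str, Dict[str, List[str]]]


def merge_meta_keys(base: MetaKeys, extra: MetaKeys) -> MetaKeys:
    """Return a copy of `base` with the values from `extra` appended.

    Hypothetical helper, not part of html-meta-data-parse.
    """
    merged = deepcopy(base)  # leave the caller's mapping untouched
    for field, attrs in extra.items():
        target = merged.setdefault(field, {})
        for attr, values in attrs.items():
            existing = target.setdefault(attr, [])
            for value in values:
                # Skip values that are already listed, keeping the order stable.
                if value not in existing:
                    existing.append(value)
    return merged


# Illustrative data only; use your own base mapping and additions.
base_keys = {"title": {"name": ["title"], "property": ["og:title"]}}
extra_keys = {
    "title": {"name": ["twitter:title"]},
    "author": {"name": ["author"]},
}
override_meta_keys = merge_meta_keys(base_keys, extra_keys)
print(override_meta_keys)
```

The resulting `override_meta_keys` can then be passed as the second argument to either function above.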

# Sample meta keys
meta_keys = {
        "author": {
            "name": [
                "author"
            ],
            "property": [
                "bt:author",
                "article:publisher",
                "dcterms.creator"
            ],
            "itemprop": [
                "author",
            ]

        },

        "title": {
            "name": [
                "title",
                "dcterms.title",
                "",
                "twitter:title"
            ],
            "property": [
                "og:title"
            ],
            "itemprop": [
                "title",
            ]
        },

        "image": {
            "name": [
                "image",
                "twitter:image",
                "thumbnail"
            ],
            "property": [
                "og:image"
            ],
            "itemprop": [
                "image",
            ]
        },

        "content": {
            "name": [
                "description",
                "twitter:description",
                "twitter:image:alt"
            ],
            "property": [
                "og:description",
                "og:image:alt"
            ],
            "itemprop": [
                "description",
            ]
        }
}
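
To make the shape of this mapping concrete, here is a minimal sketch of how such keys could be matched against `<meta>` tags using only the standard library. It is an illustration under assumptions, not the library's actual implementation: `MetaKeyExtractor` is a hypothetical class, the sample HTML is made up, and the rule "first matching tag in document order wins" is a choice made for this example.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Assumed shape: {field: {attribute_type: [accepted values]}}
MetaKeys = Dict[str, Dict[str, List[str]]]


class MetaKeyExtractor(HTMLParser):
    """Collect the first matching <meta> content for each field.

    Hypothetical class for illustration; not part of html-meta-data-parse.
    """

    def __init__(self, meta_keys: MetaKeys) -> None:
        super().__init__()
        self.meta_keys = meta_keys
        self.result: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content")
        if not content:
            return  # nothing to record for this tag
        for field, selectors in self.meta_keys.items():
            if field in self.result:
                continue  # keep the first match for each field
            for attr_name, accepted in selectors.items():
                # e.g. attr_name="property", accepted=["og:title"]
                if attributes.get(attr_name) in accepted:
                    self.result[field] = content
                    break


# Made-up sample document and a reduced mapping for the demonstration.
sample_html = """
<html><head>
  <meta name="author" content="Jane Doe">
  <meta property="og:title" content="Example Page">
  <meta itemprop="description" content="A short summary.">
</head></html>
"""
sample_keys = {
    "author": {"name": ["author"]},
    "title": {"name": ["title"], "property": ["og:title"]},
    "content": {"itemprop": ["description"]},
}

extractor = MetaKeyExtractor(sample_keys)
extractor.feed(sample_html)
print(extractor.result)
```

Each top-level key names an output field, and each inner key names the `<meta>` attribute to inspect (`name`, `property`, or `itemprop`) along with the values that count as a match.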

Deploy

Increment the version in setup.py, then deploy:

$ make deploy STAGE=testpypi # test

$ make deploy STAGE=pypi # public

Authors

  • Immanuel George - Initial work
