Collects metadata from a URL or from HTML content.

Html Meta Data Parse

About

HtmlMetaDataParse collects metadata from a URL or from HTML content.

Usage

Python Version: 3.8+

Setup

Set up the virtual environment:

$ make .venv
$ make clean # cleans virtual environment folder

Pre-commit

pre-commit is installed automatically when the virtual environment is created (make .venv). It is used to enforce linting best practices.

$ make pre-commit

Test

$ make test

Install

pip install html-meta-data-parse

Example

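import requests

from html_meta_data_parse import HtmlMetaDataParse

# Fetch a page and parse its metadata directly from the URL
html_meta_data_parse = HtmlMetaDataParse()
html_meta_data_parse.get_meta_data_by_url("https://example.com/")

# Parse metadata from HTML you have already fetched
res = requests.get("https://example.com/")
html_meta_data_parse.get_meta_data_by_html(res.text)

# Pass the URL and a proxy when constructing the parser.
# The mapping below is a placeholder; its requests-style format is an assumption.
proxy_dict = {"https": "http://proxy.example.com:8080"}
html_meta_data_parse = HtmlMetaDataParse(url="https://example.com/", proxy=proxy_dict)
html_meta_data_parse.get_meta_data_by_url()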

Attributes

Functions

# url is required (the Example section also shows it being passed to the constructor instead)
html_meta_data_parse.get_meta_data_by_url(url)

# html_text is required
html_meta_data_parse.get_meta_data_by_html(html_text=html_text)

Override Meta Keys

HtmlMetaDataParse uses a predefined set of keys to find metadata in HTML content. You can override these keys by passing your own mapping as override_meta_keys.


html_meta_data_parse.get_meta_data_by_url(
    url,
    override_meta_keys,
)

html_meta_data_parse.get_meta_data_by_html(
    html_text,
    override_meta_keys,
)
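
Before passing an override mapping, it can help to check that it has the same shape as the sample keys in this section: each output field maps to meta-tag attributes (name, property, itemprop), and each attribute maps to a list of strings. The sketch below is not part of HtmlMetaDataParse; check_meta_keys and ALLOWED_ATTRIBUTES are hypothetical names, and the set of allowed attributes is an assumption taken from the sample.

```python
from typing import Dict, List

# Attribute names taken from the sample mapping in this README (assumption:
# the library does not accept others).
ALLOWED_ATTRIBUTES = {"name", "property", "itemprop"}

MetaKeys = Dict[str, Dict[str, List[str]]]


def check_meta_keys(meta_keys: MetaKeys) -> List[str]:
    """Return a list of problems found in a meta-keys mapping (empty if none)."""
    problems = []
    for field, attributes in meta_keys.items():
        if not isinstance(attributes, dict):
            problems.append(f"{field}: expected a dict of attribute lists")
            continue
        for attribute, values in attributes.items():
            if attribute not in ALLOWED_ATTRIBUTES:
                problems.append(f"{field}.{attribute}: unknown attribute")
            if not isinstance(values, list) or not all(
                isinstance(value, str) for value in values
            ):
                problems.append(f"{field}.{attribute}: expected a list of strings")
    return problems


# Hypothetical override that only looks at Twitter and Open Graph titles.
override_meta_keys = {
    "title": {"name": ["twitter:title"], "property": ["og:title"]},
}
print(check_meta_keys(override_meta_keys))  # prints []
```

An empty list means the mapping matches the expected shape. Whether an override replaces the predefined keys entirely or is merged with them is not stated here, so check the package source if that matters for your use.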

# Sample meta keys
meta_keys = {
    "author": {
        "name": ["author"],
        "property": ["bt:author", "article:publisher", "dcterms.creator"],
        "itemprop": ["author"],
    },
    "title": {
        "name": ["title", "dcterms.title", "", "twitter:title"],
        "property": ["og:title"],
        "itemprop": ["title"],
    },
    "image": {
        "name": ["image", "twitter:image", "thumbnail"],
        "property": ["og:image"],
        "itemprop": ["image"],
    },
    "content": {
        "name": ["description", "twitter:description", "twitter:image:alt"],
        "property": ["og:description", "og:image:alt"],
        "itemprop": ["description"],
    },
}
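
To illustrate how a mapping of this shape can drive extraction, here is a minimal sketch that uses only the standard library's html.parser. It is not HtmlMetaDataParse's implementation: MetaKeyParser and extract_meta are hypothetical names, and the rule that the first matching tag wins for each field is an assumption.

```python
from html.parser import HTMLParser


class MetaKeyParser(HTMLParser):
    """Collect the content of the first matching <meta> tag for each field."""

    def __init__(self, meta_keys):
        super().__init__()
        self.meta_keys = meta_keys
        self.result = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content")
        if not content:
            return
        for field, lookups in self.meta_keys.items():
            if field in self.result:
                continue  # first match wins (assumption)
            for attribute, accepted in lookups.items():
                # e.g. attribute="property", accepted=["og:title"]
                if attributes.get(attribute) in accepted:
                    self.result[field] = content
                    break


def extract_meta(html_text, meta_keys):
    """Return a dict of field -> content for the given meta-keys mapping."""
    parser = MetaKeyParser(meta_keys)
    parser.feed(html_text)
    return parser.result


# Reduced mapping and sample page, both made up for this illustration.
sample_keys = {
    "title": {"name": ["title", "twitter:title"], "property": ["og:title"]},
    "content": {"name": ["description"], "property": ["og:description"]},
}
sample_html = """
<html><head>
<meta property="og:title" content="Example Domain">
<meta name="description" content="An example page.">
<meta name="twitter:title" content="Ignored duplicate">
</head></html>
"""
print(extract_meta(sample_html, sample_keys))
# {'title': 'Example Domain', 'content': 'An example page.'}
```

The output format of the real package may differ; the sketch only shows how the field, attribute, and value levels of the mapping relate to meta tags in a page.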

Deploy

Increment version in setup.py

$ make deploy STAGE=testpypi # test

$ make deploy STAGE=pypi # public

Authors

  • Immanuel George - Initial work
