Collects meta data from url, or html content.

Html Meta Data Parse

Code style: black pre-commit isort bandit

About

HtmlMetaDataParse collects meta data from a URL or from HTML content.

Usage

Python Version: 3.8+

Setup

Set up the virtual environment:

$ make .venv
$ make clean # removes the virtual environment folder

Pre-commit

pre-commit is installed automatically when the virtual environment is created with make .venv. It is used to enforce linting best practices.

$ make pre-commit

Test

$ make test

Install

pip install html-meta-data-parse

Example

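from html_meta_data_parse import HtmlMetaDataParse

# Fetch a page by URL and parse its meta data
html_meta_data_parse = HtmlMetaDataParse()
html_meta_data_parse.get_meta_data_by_url("https://example.com/")

# Parse HTML that you have already fetched yourself
import requests
res = requests.get("https://example.com/")
html_meta_data_parse.get_meta_data_by_html(res.text)

# Pass the URL and a proxy to the constructor instead
# (replace <proxy_dict> with your own proxy settings)
html_meta_data_parse = HtmlMetaDataParse(url="https://example.com/", proxy=<proxy_dict>)
html_meta_data_parse.get_meta_data_by_url()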

Attributes

Functions

# url is required unless it was already passed to the constructor
html_meta_data_parse.get_meta_data_by_url(url)

# html_text is required
html_meta_data_parse.get_meta_data_by_html(html_text=html_text)

Override Meta Keys

HtmlMetaDataParse uses a predefined set of keys to parse meta data from HTML content. You can override these meta keys with a set of your own by passing override_meta_keys to either function.


html_meta_data_parse.get_meta_data_by_url(
    url,
    override_meta_keys,
)

html_meta_data_parse.get_meta_data_by_html(
    html_text,
    override_meta_keys,
)

# Sample meta keys
meta_keys = {
        "author": {
            "name": [
                "author"
            ],
            "property": [
                "bt:author",
                "article:publisher",
                "dcterms.creator"
            ],
            "itemprop": [
                "author",
            ]

        },

        "title": {
            "name": [
                "title",
                "dcterms.title",
                "",
                "twitter:title"
            ],
            "property": [
                "og:title"
            ],
            "itemprop": [
                "title",
            ]
        },

        "image": {
            "name": [
                "image",
                "twitter:image",
                "thumbnail"
            ],
            "property": [
                "og:image"
            ],
            "itemprop": [
                "image",
            ]
        },

        "content": {
            "name": [
                "description",
                "twitter:description",
                "twitter:image:alt"
            ],
            "property": [
                "og:description",
                "og:image:alt"
            ],
            "itemprop": [
                "description",
            ]
        }
}
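
To make the shape of the mapping concrete, here is a minimal sketch of how a mapping like the sample above could drive extraction from meta tags. It is not the library's implementation: the names MetaTagCollector, extract_meta, and LOOKUP_ATTRIBUTES are hypothetical, the "first match wins" rule is an assumption, and the actual return format of HtmlMetaDataParse is not documented here. It uses only the standard library.

```python
from html.parser import HTMLParser

# Attributes of <meta> that the sample mapping keys on (assumed from the sample).
LOOKUP_ATTRIBUTES = ("name", "property", "itemprop")


class MetaTagCollector(HTMLParser):
    """Hypothetical helper: collect the attributes of every <meta> tag in order."""

    def __init__(self):
        super().__init__()
        self.meta_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            # attrs is a list of (name, value) pairs; value may be None.
            self.meta_tags.append({k: v for k, v in attrs if v is not None})


def extract_meta(html_text, meta_keys):
    """Return {field: content}, taking the first matching <meta> tag per field."""
    collector = MetaTagCollector()
    collector.feed(html_text)

    result = {}
    for field, lookups in meta_keys.items():
        result[field] = None  # stays None when no tag matches
        for tag in collector.meta_tags:
            matched = any(
                tag.get(attribute) in lookups.get(attribute, [])
                for attribute in LOOKUP_ATTRIBUTES
                if tag.get(attribute)  # skip missing or empty attributes
            )
            if matched and tag.get("content"):
                result[field] = tag["content"]
                break
    return result


# A trimmed-down mapping in the same shape as the sample.
sample_keys = {
    "title": {"name": ["title", "twitter:title"], "property": ["og:title"]},
    "content": {"name": ["description"], "property": ["og:description"]},
    "author": {"name": ["author"]},
}
sample_html = """
<html><head>
<meta property="og:title" content="Example Domain">
<meta name="description" content="An illustrative page.">
</head></html>
"""
found = extract_meta(sample_html, sample_keys)
print(found)
# {'title': 'Example Domain', 'content': 'An illustrative page.', 'author': None}
```

Each top-level key names an output field, and each inner list holds the values of the name, property, or itemprop attribute that count as a match for that field. Adding a value to one of those lists is how a custom mapping would pick up a site-specific tag.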

Deploy

Increment the version in setup.py before deploying.

$ make deploy STAGE=testpypi # test

$ make deploy STAGE=pypi # public

Authors

  • Immanuel George - Initial work
