Html Meta Data Parse
About
HtmlMetaDataParse collects metadata from a URL or from HTML content.
Usage
Python Version: 3.8+
Setup
$ make .venv
$ make clean # cleans virtual environment folder
The make .venv target sets up the virtual environment.
Pre-commit
pre-commit is installed automatically as part of the .venv setup and is used to enforce linting best practices.
$ make pre-commit
Test
$ make test
Install
pip install html-meta-data-parse
Example
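import requests
from html_meta_data_parse import HtmlMetaDataParse

# Fetch and parse metadata directly from a URL
html_meta_data_parse = HtmlMetaDataParse()
html_meta_data_parse.get_meta_data_by_url("https://example.com/")

# Parse metadata from HTML that has already been downloaded
res = requests.get("https://example.com/")
html_meta_data_parse.get_meta_data_by_html(res.text)

# Pass the URL and a requests-style proxies dict to the constructor
proxy_dict = {"https": "http://10.10.1.10:1080"}  # placeholder address
html_meta_data_parse = HtmlMetaDataParse(url="https://example.com/", proxy=proxy_dict)
html_meta_data_parse.get_meta_data_by_url()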
Attributes
- url
- html_text
- override_meta_keys
- proxy (http://2.python-requests.org/en/master/user/advanced/?highlight=proxies#proxies)
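The proxy attribute is described only by the link above. As a rough illustration, a requests-style proxies mapping is a plain dict keyed by URL scheme. The sketch below assumes that shape; the addresses are placeholders, and check_proxy_dict is a hypothetical helper written for this example, not part of html-meta-data-parse.

```python
from urllib.parse import urlparse

# Placeholder proxy addresses (assumption); replace with your own.
proxy_dict = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}


def check_proxy_dict(proxies):
    """Return True if every value parses as a URL with a scheme and a host."""
    return all(
        urlparse(address).scheme and urlparse(address).hostname
        for address in proxies.values()
    )
```

A dict that passes this check can then be handed to the proxy argument shown in the Example section.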
Functions
# url is required unless it was already passed to the constructor
html_meta_data_parse.get_meta_data_by_url(url)
# html_text is required
html_meta_data_parse.get_meta_data_by_html(html_text=html_text)
Override Meta Keys
HtmlMetaDataParse uses a predefined set of keys to parse metadata from HTML content. To use different keys, pass your own override_meta_keys mapping.
html_meta_data_parse.get_meta_data_by_url(
url,
override_meta_keys
)
html_meta_data_parse.get_meta_data_by_html(
html_text,
override_meta_keys,
)
# meta_keys sample
meta_keys = {
    "author": {
        "name": [
            "author"
        ],
        "property": [
            "bt:author",
            "article:publisher",
            "dcterms.creator"
        ],
        "itemprop": [
            "author",
        ]
    },
    "title": {
        "name": [
            "title",
            "dcterms.title",
            "",
            "twitter:title"
        ],
        "property": [
            "og:title"
        ],
        "itemprop": [
            "title",
        ]
    },
    "image": {
        "name": [
            "image",
            "twitter:image",
            "thumbnail"
        ],
        "property": [
            "og:image"
        ],
        "itemprop": [
            "image",
        ]
    },
    "content": {
        "name": [
            "description",
            "twitter:description",
            "twitter:image:alt"
        ],
        "property": [
            "og:description",
            "og:image:alt"
        ],
        "itemprop": [
            "description",
        ]
    }
}
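To make the shape of this mapping concrete, here is a minimal sketch of how a key map like the one above could be matched against meta tags. It uses only the standard library and is not the package's actual implementation; MetaCollector, sample_keys, and sample_html are names invented for this illustration. Each output field lists the meta attributes (name, property, itemprop) and the accepted values for each attribute.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> content values whose attributes match a key map."""

    def __init__(self, meta_keys):
        super().__init__()
        self.meta_keys = meta_keys
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content")
        if not content:
            return
        for field, selectors in self.meta_keys.items():
            for attribute_name, accepted in selectors.items():
                if attributes.get(attribute_name) in accepted:
                    # Keep the first match found for each field.
                    self.found.setdefault(field, content)


# Hypothetical, trimmed-down key map and HTML used only for this sketch.
sample_keys = {
    "title": {"name": ["twitter:title"], "property": ["og:title"]},
    "author": {"name": ["author"]},
}
sample_html = (
    '<head><meta property="og:title" content="Example Domain">'
    '<meta name="author" content="Jane Doe"></head>'
)

collector = MetaCollector(sample_keys)
collector.feed(sample_html)
print(collector.found)
```

With this input, the collector ends up with "Example Domain" under title and "Jane Doe" under author. Adding or removing entries in the key map changes which meta tags feed each field, which is the idea behind override_meta_keys.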
Deploy
Increment the version in setup.py, then run:
$ make deploy STAGE=testpypi # test
$ make deploy STAGE=pypi # public
Authors
- Immanuel George - Initial work