Extract metadata from HTML pages using Open Graph metadata, HTML metadata, and a series of fallbacks.

Project description

HTMLmetadata

Inspired by https://metascraper.js.org

Install

pip install htmlmetadata

Use

Run the module from the command line, passing the URL of the page as the argument. The result is printed as JSON.

python -m htmlmetadata http://schema.org/docs/about.html
{
  "request": {
    "url": "http://schema.org/docs/about.html"
  },
  "summary": {
    "description": "Schema.org is a set of extensible schemas that enables webmasters to embed\n    structured data on their web pages for use by search engines and other applications.",
    "title": "about page - schema.org",
    "language": "en"
  }
}
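If you want to consume that output from another script, one option is to capture it and parse it as JSON. The following is a minimal sketch, not part of htmlmetadata: `parse_cli_output` and `run_cli` are hypothetical helper names, and it assumes the command prints nothing but the JSON document to standard output, as in the example above. `run_cli` needs the package installed and network access, so the demonstration below parses an abbreviated copy of the sample output instead.

```python
import json
import subprocess
import sys


def parse_cli_output(stdout_text):
    """Parse the JSON text printed by the command into a dict."""
    return json.loads(stdout_text)


def run_cli(url):
    """Run `python -m htmlmetadata URL` and return the parsed result.

    Hypothetical helper: assumes htmlmetadata is installed, the network is
    reachable, and standard output contains only the JSON document.
    """
    completed = subprocess.run(
        [sys.executable, "-m", "htmlmetadata", url],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_cli_output(completed.stdout)


# Abbreviated copy of the output shown above, so this part runs offline.
captured = """
{
  "request": {"url": "http://schema.org/docs/about.html"},
  "summary": {"title": "about page - schema.org", "language": "en"}
}
"""
parsed = parse_cli_output(captured)
print(parsed["summary"]["title"])
```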

Or call it from your own code.

from htmlmetadata import extract_metadata

data = extract_metadata("http://schema.org/docs/about.html")
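The return value of `extract_metadata` is not documented here. The sketch below assumes it is a dict with the same "request" and "summary" keys as the command-line output, which may not hold for every version. `summary_fields` is a hypothetical helper, not part of the package. It reads the fields defensively and collapses the line break and indentation that appear inside the sample description.

```python
def summary_fields(data):
    """Return title, description and language from a result dict.

    Assumes `data` is shaped like the command-line output: a dict with an
    optional "summary" dict. Missing fields come back as None.
    """
    summary = data.get("summary") or {}
    description = summary.get("description")
    if description is not None:
        # Collapse newlines and runs of spaces into single spaces.
        description = " ".join(description.split())
    return {
        "title": summary.get("title"),
        "description": description,
        "language": summary.get("language"),
    }


# Sample copied from the command-line output, so no network is needed.
sample = {
    "request": {"url": "http://schema.org/docs/about.html"},
    "summary": {
        "description": "Schema.org is a set of extensible schemas that enables webmasters to embed\n    structured data on their web pages for use by search engines and other applications.",
        "title": "about page - schema.org",
        "language": "en",
    },
}
fields = summary_fields(sample)
print(fields["description"])
```

With the package installed, `summary_fields(extract_metadata(url))` would be the equivalent call, provided the assumption about the return shape holds.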

Download files

Files for htmlmetadata, version 1.1
Filename (size)                                  File type   Python version
htmlmetadata-1.1-py2.py3-none-any.whl (5.4 kB)   Wheel       py2.py3
htmlmetadata-1.1.zip (8.4 kB)                    Source      None
