Skip to main content

Tool for extracting basic information from web pages

Project description

pageinfo
====

pageinfo is a simple module for extracting information from web pages. Currently, pageinfo will return the following from a url, where available:

* Canonical
* Title
* Description
* Favicon
* Twitter card data
* Facebook Open Graph data


##installation

`pip install pageinfo`

##usage

import pageinfo

pageinfo.get_meta('http://www.myurl.com')

The above code will return a dict with the available page information. Here's a sample response for `http://www.nytimes.com/pages/technology/index.html`:

{
"canonical": "http://bits.blogs.nytimes.com/2013/11/20/a-gift-from-steve-jobs-returns-home/"
"twitter": {
"twitter:title": "A Gift From Steve Jobs Returns Home",
"twitter:image": "http://graphics8.nytimes.com/images/2013/11/18/technology/bits-brilliant-jobs/bits-brilliant-jobs-thumbLarge.jpg",
"twitter:description": "An Apple II that spent the last 33 years in Katmandu, Nepal, most of it packed away in a hospital basement there, was a rare symbol of the charity of Steven P. Jobs.",
"twitter:url": "http://bits.blogs.nytimes.com/2013/11/20/a-gift-from-steve-jobs-returns-home/"
},

"favicon": "http://bits.blogs.nytimes.com/favicon.ico",

"facebook": {
"og:url": "http://bits.blogs.nytimes.com/2013/11/20/a-gift-from-steve-jobs-returns-home/",
"og:site_name": "Bits Blog",
"og:type": "article",
"og:description": "An Apple II that spent the last 33 years in Katmandu, Nepal, most of it packed away in a hospital basement there, was a rare symbol of the charity of Steven P. Jobs.",
"og:title": "A Gift From Steve Jobs Returns Home",
"og:image": "http://graphics8.nytimes.com/images/2013/11/18/technology/bits-brilliant-jobs/bits-brilliant-jobs-videoSixteenByNine600.jpg"
},

"description": "An Apple II that spent the last 33 years in Katmandu, Nepal, most of it packed away in a hospital basement there, was a rare symbol of the charity of Steven P. Jobs.",

"title": "A Gift From Steve Jobs Returns Home - NYTimes.com"
}

Alternately, if you just need page titles and want a minimal response, use:

import pageinfo

pageinfo.get_title('http://www.myurl.com')

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pageinfo-0.40.tar.gz (2.6 kB view details)

Uploaded Source

File details

Details for the file pageinfo-0.40.tar.gz.

File metadata

  • Download URL: pageinfo-0.40.tar.gz
  • Upload date:
  • Size: 2.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pageinfo-0.40.tar.gz
Algorithm Hash digest
SHA256 0499ebf8fd9691dac78926672cd2e6982d2275164b8a8b372121c92df7b74631
MD5 1538be96ad18886513aa0d08af9d0ef4
BLAKE2b-256 f425703dcbd04a46e24d5883013ef9d87433f2593106db8f98d440d2dca7007f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page