
Tool for extracting basic information from web pages


pageinfo
====

pageinfo is a simple module for extracting information from web pages. Given a URL, pageinfo currently returns the following, where available:

* Canonical
* Title
* Description
* Favicon
* Twitter card data
* Facebook Open Graph data


## installation

`pip install pageinfo`

## usage

    import pageinfo

    pageinfo.get_meta('http://www.myurl.com')

The above code will return a dict with the available page information. Here's a sample response for `http://www.nytimes.com/pages/technology/index.html`:

    {
        "canonical": "http://bits.blogs.nytimes.com/2013/11/20/a-gift-from-steve-jobs-returns-home/",
        "twitter": {
            "twitter:title": "A Gift From Steve Jobs Returns Home",
            "twitter:image": "http://graphics8.nytimes.com/images/2013/11/18/technology/bits-brilliant-jobs/bits-brilliant-jobs-thumbLarge.jpg",
            "twitter:description": "An Apple II that spent the last 33 years in Katmandu, Nepal, most of it packed away in a hospital basement there, was a rare symbol of the charity of Steven P. Jobs.",
            "twitter:url": "http://bits.blogs.nytimes.com/2013/11/20/a-gift-from-steve-jobs-returns-home/"
        },
        "favicon": "http://bits.blogs.nytimes.com/favicon.ico",
        "facebook": {
            "og:url": "http://bits.blogs.nytimes.com/2013/11/20/a-gift-from-steve-jobs-returns-home/",
            "og:site_name": "Bits Blog",
            "og:type": "article",
            "og:description": "An Apple II that spent the last 33 years in Katmandu, Nepal, most of it packed away in a hospital basement there, was a rare symbol of the charity of Steven P. Jobs.",
            "og:title": "A Gift From Steve Jobs Returns Home",
            "og:image": "http://graphics8.nytimes.com/images/2013/11/18/technology/bits-brilliant-jobs/bits-brilliant-jobs-videoSixteenByNine600.jpg"
        },
        "description": "An Apple II that spent the last 33 years in Katmandu, Nepal, most of it packed away in a hospital basement there, was a rare symbol of the charity of Steven P. Jobs.",
        "title": "A Gift From Steve Jobs Returns Home - NYTimes.com"
    }
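
Because fields are only returned where available, code that consumes this dict is safer if it does not assume every key is present. The following is a minimal sketch, assuming the response has the shape of the sample above and that unavailable fields are either absent or empty (this README does not say which). `best_title` is a hypothetical helper for illustration, not part of pageinfo:

```python
def best_title(meta):
    """Pick a display title from a get_meta()-style dict.

    Hypothetical helper, not part of pageinfo. Assumes the dict shape shown
    in the sample response; missing or empty fields are skipped.
    """
    # Treat a missing or None sub-dict as empty so the lookups below are safe.
    facebook = meta.get("facebook") or {}
    twitter = meta.get("twitter") or {}

    # Prefer the Open Graph title, then the Twitter card title, then <title>.
    candidates = [
        facebook.get("og:title"),
        twitter.get("twitter:title"),
        meta.get("title"),
    ]
    for candidate in candidates:
        if candidate:
            return candidate
    return None


# Trimmed copy of the sample response, used here instead of a live request.
sample = {
    "twitter": {"twitter:title": "A Gift From Steve Jobs Returns Home"},
    "facebook": {"og:title": "A Gift From Steve Jobs Returns Home"},
    "title": "A Gift From Steve Jobs Returns Home - NYTimes.com",
}
print(best_title(sample))
```

The preference order is a design choice for this sketch only; the social-card titles in the sample omit the site-name suffix that the plain `title` carries.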

Alternatively, if you only need the page title and want a minimal response, use:

    import pageinfo

    pageinfo.get_title('http://www.myurl.com')
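
When titles are needed for several URLs, a small wrapper can keep one failing page from aborting the whole run. This is a rough sketch under stated assumptions: `collect_titles` is a hypothetical helper, the title function is passed in as a parameter, and because this README does not document which exceptions `get_title` may raise, the sketch catches `Exception` broadly and records `None`:

```python
def collect_titles(urls, fetch_title):
    """Map each URL to its title, or to None if fetching failed.

    Hypothetical helper, not part of pageinfo. `fetch_title` is any callable
    that takes a URL and returns a title, such as pageinfo.get_title.
    """
    titles = {}
    for url in urls:
        try:
            titles[url] = fetch_title(url)
        except Exception:
            # The exceptions pageinfo raises are not documented here,
            # so this sketch treats any failure as "no title".
            titles[url] = None
    return titles
```

With pageinfo installed, a call would look like `collect_titles(['http://www.myurl.com'], pageinfo.get_title)`. Passing the function in rather than importing it inside the helper also makes it easy to substitute a stub when testing without network access.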


Files for pageinfo, version 0.40: `pageinfo-0.40.tar.gz` (2.6 kB, source distribution)
