Tool for extracting basic information from web pages

Project Description

pageinfo
====

pageinfo is a simple module for extracting information from web pages. Given a URL, pageinfo currently returns the following, where available:

* Canonical
* Title
* Description
* Favicon
* Twitter card data
* Facebook Open Graph data
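
To make the list above concrete, here is a minimal sketch of how these fields can be read out of an HTML document using only the standard library. It is not pageinfo's implementation, which this README does not describe; the names `MetaSketch` and `extract` are hypothetical, and the sketch works on an HTML string rather than fetching a URL. The output keys mirror the sample response shown under usage.

```python
from html.parser import HTMLParser


class MetaSketch(HTMLParser):
    """Illustrative only: collects the fields listed above from an HTML string."""

    def __init__(self):
        super().__init__()
        self.info = {"twitter": {}, "facebook": {}}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "link":
            rels = a.get("rel", "").lower().split()
            if "canonical" in rels:
                self.info["canonical"] = a.get("href", "")
            elif "icon" in rels:  # also matches rel="shortcut icon"
                self.info["favicon"] = a.get("href", "")
        elif tag == "meta":
            # Twitter cards use name="twitter:*"; Open Graph uses property="og:*".
            key = a.get("name") or a.get("property") or ""
            content = a.get("content", "")
            if key.startswith("twitter:"):
                self.info["twitter"][key] = content
            elif key.startswith("og:"):
                self.info["facebook"][key] = content
            elif key.lower() == "description":
                self.info["description"] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.info["title"] = self.info.get("title", "") + data


def extract(html):
    """Parse an HTML string and return the collected fields as a dict."""
    parser = MetaSketch()
    parser.feed(html)
    parser.close()
    return parser.info
```

The sketch does not resolve relative URLs (for example a favicon given as `/favicon.ico`) and does no network access; a real tool would need both.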


## installation

`pip install pageinfo`

## usage

    import pageinfo

    pageinfo.get_meta('http://www.myurl.com')

The above code will return a dict with the available page information. Here's a sample response for `http://www.nytimes.com/pages/technology/index.html`:

    {
        "canonical": "http://bits.blogs.nytimes.com/2013/11/20/a-gift-from-steve-jobs-returns-home/",
        "twitter": {
            "twitter:title": "A Gift From Steve Jobs Returns Home",
            "twitter:image": "http://graphics8.nytimes.com/images/2013/11/18/technology/bits-brilliant-jobs/bits-brilliant-jobs-thumbLarge.jpg",
            "twitter:description": "An Apple II that spent the last 33 years in Katmandu, Nepal, most of it packed away in a hospital basement there, was a rare symbol of the charity of Steven P. Jobs.",
            "twitter:url": "http://bits.blogs.nytimes.com/2013/11/20/a-gift-from-steve-jobs-returns-home/"
        },
        "favicon": "http://bits.blogs.nytimes.com/favicon.ico",
        "facebook": {
            "og:url": "http://bits.blogs.nytimes.com/2013/11/20/a-gift-from-steve-jobs-returns-home/",
            "og:site_name": "Bits Blog",
            "og:type": "article",
            "og:description": "An Apple II that spent the last 33 years in Katmandu, Nepal, most of it packed away in a hospital basement there, was a rare symbol of the charity of Steven P. Jobs.",
            "og:title": "A Gift From Steve Jobs Returns Home",
            "og:image": "http://graphics8.nytimes.com/images/2013/11/18/technology/bits-brilliant-jobs/bits-brilliant-jobs-videoSixteenByNine600.jpg"
        },
        "description": "An Apple II that spent the last 33 years in Katmandu, Nepal, most of it packed away in a hospital basement there, was a rare symbol of the charity of Steven P. Jobs.",
        "title": "A Gift From Steve Jobs Returns Home - NYTimes.com"
    }
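
Because fields are only present "where available", code that consumes the response may want fallbacks. The sketch below picks one title, description, image, and URL from a dict shaped like the sample above. The helper name `summarize` and the preference order are illustrative choices, not part of pageinfo, and it is an assumption that an unavailable field is either missing from the dict or empty.

```python
def summarize(meta):
    """Pick a single title, description, image and URL from a get_meta-style dict.

    Hypothetical helper: prefers Open Graph, then Twitter card data, then the
    top-level fields. Missing or empty values fall through to the next source.
    """
    twitter = meta.get("twitter") or {}
    facebook = meta.get("facebook") or {}
    return {
        "title": facebook.get("og:title")
        or twitter.get("twitter:title")
        or meta.get("title"),
        "description": facebook.get("og:description")
        or twitter.get("twitter:description")
        or meta.get("description"),
        "image": facebook.get("og:image") or twitter.get("twitter:image"),
        "url": meta.get("canonical")
        or facebook.get("og:url")
        or twitter.get("twitter:url"),
    }
```

With the sample response above, `summarize` would return the Open Graph title "A Gift From Steve Jobs Returns Home" rather than the longer top-level `title`.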

Alternatively, if you only need page titles and want a minimal response, use:

    import pageinfo

    pageinfo.get_title('http://www.myurl.com')
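
When titles are needed for several pages, a small loop around a title lookup may be convenient. This README does not say what happens when a page cannot be fetched or has no title, so the sketch below assumes a failure may raise an exception and records `None` in that case. The helper `collect_titles` is hypothetical; the lookup function is passed in as a parameter so that `pageinfo.get_title` (or a stand-in for testing) can be supplied.

```python
def collect_titles(urls, fetch_title):
    """Map each URL to its title, or to None if the lookup raises.

    fetch_title is any callable taking a URL and returning a title,
    for example pageinfo.get_title.
    """
    titles = {}
    for url in urls:
        try:
            titles[url] = fetch_title(url)
        except Exception:
            # Failure behaviour is undocumented here, so catch broadly.
            titles[url] = None
    return titles
```

A call might look like `collect_titles(['http://www.myurl.com'], pageinfo.get_title)`.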

Release history

* 0.40 (this version)
* 0.35
* 0.3
* 0.2
* 0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

| Filename, size | File type | Python version | Upload date |
| --- | --- | --- | --- |
| pageinfo-0.40.tar.gz (2.6 kB) | Source | None | Jun 20, 2014 |
