custom rendering of beautifulsoup object in ipython notebook and qtconsole

IPython-BeautifulSoup

IPython-BeautifulSoup is an IPython extension for displaying BeautifulSoup HTML/XML objects as prettified and syntax highlighted HTML blocks in IPython notebook and qtconsole.

Syntax highlighting is accomplished with Pygments.
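
As a rough illustration of how an object can supply its own HTML rendering in IPython notebook, the sketch below relies on IPython's `_repr_html_` display hook. The `PrettyMarkup` class is hypothetical and is not part of this extension; how the extension itself registers its renderer is not shown in this description, and the real output is highlighted with Pygments rather than simply escaped.

```python
import html


class PrettyMarkup:
    """Hypothetical wrapper used only for illustration (not part of the extension)."""

    def __init__(self, markup):
        self.markup = markup

    def _repr_html_(self):
        # IPython's rich display calls this method when the object is the
        # result of a cell. Escaping makes the notebook show the source text
        # instead of rendering the markup.
        return "<pre>{}</pre>".format(html.escape(self.markup))
```

Evaluating `PrettyMarkup("<b>hi</b>")` as the last expression of a notebook cell would display the escaped source inside a `<pre>` block.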

Install

Simply run:

pip install "ipython-beautifulSoup[bs4]"

For BeautifulSoup 3 instead of BeautifulSoup 4, change bs4 to bs3.

Installing IPython Notebook

See http://ipython.org/ipython-doc/stable/install/index.html

To install IPython notebook or qtconsole as well, add notebook and/or qtconsole to the extras specifier after “bs4”, separated by commas:

pip install "ipython-beautifulSoup[bs4,notebook,qtconsole]"

On Ubuntu LTS, install these packages first if you want IPython notebook:

sudo apt-get install python-dev g++

For the qtconsole, if you are installing into a virtualenv, run the following (WARNING: it’s slow):

sudo apt-get install make cmake qt4-qmake libqt4-dev
pip install pyside

Usage

In IPython notebook or qtconsole, do:

%load_ext soup

This will push a series of callables into your current context, as well as a monkey-patched BeautifulSoup and requests.

You can now use BeautifulSoup as if it were imported from the corresponding module.

You will most likely want to configure the output with configure_ipython_beautifulsoup, for example like this (just after the %load_ext):

configure_ipython_beautifulsoup(show_html=True, show_css=True, show_js=False)

To see the configure_ipython_beautifulsoup documentation, run this in any IPython interface:

configure_ipython_beautifulsoup?

This also loads a shortcut function called p (for parse), defined as follows:

def p(url):
    # Prefer requests when it is available; otherwise fall back to urlopen
    if requests is not None:
        return BeautifulSoup(requests.get(url).content)
    return BeautifulSoup(urlopen(url).read())
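
The shortcut depends on the names requests and urlopen already being in scope. One way such a setup could look is sketched below, assuming Python 3 (where urlopen lives in urllib.request) and treating requests as an optional dependency. The helper names fetch, fetch_with_requests and fetch_with_urllib are hypothetical and do not come from the extension.

```python
from urllib.request import urlopen

try:
    import requests  # optional third-party dependency
except ImportError:
    requests = None  # signals that the urllib fallback should be used


def fetch_with_requests(url):
    # Response.content holds the raw response body as bytes
    return requests.get(url).content


def fetch_with_urllib(url):
    # Standard-library fallback; read() also returns bytes
    return urlopen(url).read()


def fetch(url):
    """Return the raw bytes at url, preferring requests when it is installed."""
    if requests is not None:
        return fetch_with_requests(url)
    return fetch_with_urllib(url)
```

The bytes returned by `fetch(url)` can then be handed to BeautifulSoup, as p does.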

A note on security

Warning

Because it includes external HTML, JS, and CSS, this extension is inherently unsafe if you choose to render the HTML by setting show_html to True when calling configure_ipython_beautifulsoup.

By default, <script>, <link>, and <style> tags are removed, but this is not a 100% guarantee of safety if you choose to render the HTML. Use at your own risk.

The safest option is to set all options of configure_ipython_beautifulsoup to False (the default).
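
To make the idea of tag removal concrete, here is a minimal sketch that drops <script>, <link> and <style> elements from a string of HTML. It uses the standard-library html.parser module rather than BeautifulSoup, so it is not the extension's own code; the names BLOCKED_TAGS, TagStripper and strip_blocked_tags are hypothetical. It also shows why removal alone is not a guarantee: inline event handlers such as onclick attributes pass through untouched.

```python
from html.parser import HTMLParser

BLOCKED_TAGS = {"script", "link", "style"}


class TagStripper(HTMLParser):
    """Re-emits HTML while dropping blocked elements and their contents."""

    def __init__(self):
        # Keep entity references as written instead of decoding them
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_TAGS:
            if tag != "link":  # <link> has no content and no closing tag
                self.skip_depth += 1
            return
        if self.skip_depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>
        if tag not in BLOCKED_TAGS and self.skip_depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCKED_TAGS:
            if tag != "link" and self.skip_depth > 0:
                self.skip_depth -= 1
            return
        if self.skip_depth == 0:
            self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.parts.append("&{};".format(name))

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.parts.append("&#{};".format(name))


def strip_blocked_tags(markup):
    """Return markup without blocked elements (comments and doctype are dropped too)."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)
```

For example, `strip_blocked_tags('<p onclick="x()">Hi<script>alert(1)</script></p>')` removes the script element but leaves the onclick attribute in place.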

Screenshots

IPython Notebook

.find: [screenshot 1]

.findAll: [screenshot 2]

Contributors

In chronological order:

Don’t hesitate to add yourself.

Release history

0.3 (this version)
0.2
0.1.1
0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename                                     Size    File type  Python version  Upload date
ipython_beautifulsoup-0.2-py2.7.egg          4.0 kB  Egg        2.7             Jan 6, 2014
ipython_beautifulsoup-0.2-py27-none-any.whl  6.4 kB  Wheel      2.7             Jan 6, 2014
ipython-beautifulsoup-0.2.tar.gz             4.1 kB  Source     None            Jan 6, 2014
