RoboBrowser: Your friendly neighborhood web scraper
===================================================

RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser
can fetch a page, click on links and buttons, and fill out and submit forms. If you need to interact with web services
that don't have APIs, RoboBrowser can help.

.. code-block:: python

    import re
    from robobrowser import RoboBrowser

    # Browse to Genius
    browser = RoboBrowser(history=True)
    browser.open('')

    # Search for Porcupine Tree
    form = browser.get_form(action='/search')
    form                # <RoboForm q=>
    form['q'].value = 'porcupine tree'
    browser.submit_form(form)

    # Look up the first song
    songs = browser.select('.song_link')
    browser.follow_link(songs[0])
    lyrics = browser.select('.lyrics')
    lyrics[0].text      # \nHear the sound of music ...

    # Back to results page
    browser.back()

    # Look up my favorite song
    song_link = browser.get_link('trains')
    browser.follow_link(song_link)

    # Can also search HTML using regex patterns
    lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
    lyrics.text         # \nTrain set and match spied under the blind...

RoboBrowser combines the best of two excellent Python libraries:
`Requests <>`_ and
`BeautifulSoup <>`_.
RoboBrowser represents browser sessions using Requests and HTML responses
using BeautifulSoup, transparently exposing methods of both libraries:

.. code-block:: python

    import re
    from robobrowser import RoboBrowser

    browser = RoboBrowser(user_agent='a python robot')
    browser.open('')

    # Inspect the browser session
    browser.session.cookies['_gh_sess']         # BAh7Bzo...
    browser.session.headers['User-Agent']       # a python robot

    # Search the parsed HTML
    browser.select('div.teaser-icon')
    # [<div class="teaser-icon">
    # <span class="mega-octicon octicon-checklist"></span>
    # </div>,
    # ...
    browser.find(class_=re.compile(r'column', re.I))
    # <div class="one-third column">
    # <div class="teaser-icon">
    # <span class="mega-octicon octicon-checklist"></span>
    # ...
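
The BeautifulSoup half of this can be tried without a network connection. The following is only an illustrative sketch: it parses a made-up HTML snippet directly with `bs4` (assumed to be installed) and calls the same `select` and `find` methods that the example above reaches through the browser object. The variable names and the markup are assumptions, not part of RoboBrowser.

.. code-block:: python

    import re
    from bs4 import BeautifulSoup

    # Made-up markup standing in for a fetched page
    html = """
    <div class="one-third column">
      <div class="teaser-icon"><span class="mega-octicon"></span></div>
    </div>
    """
    soup = BeautifulSoup(html, 'html.parser')

    # CSS selector search returns a list of matching tags
    icons = soup.select('div.teaser-icon')

    # Regex search on the class attribute returns the first matching tag
    column = soup.find(class_=re.compile(r'column', re.I))
    column_classes = column['class']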

You can also pass a custom `Session` instance for lower-level configuration:

.. code-block:: python

    from requests import Session
    from robobrowser import RoboBrowser

    session = Session()
    session.verify = False              # Skip SSL verification
    session.proxies = {'http': ''}      # Set default proxies
    browser = RoboBrowser(session=session)

RoboBrowser also includes tools for working with forms, inspired by
`WebTest <>`_ and `Mechanize <>`_.

.. code-block:: python

    from robobrowser import RoboBrowser

    browser = RoboBrowser()
    browser.open('')

    # Get the signup form
    signup_form = browser.get_form(class_='signup')
    signup_form         # <RoboForm user[name]=, user[email]=, ...

    # Inspect its values
    signup_form['authenticity_token'].value     # 6d03597 ...

    # Fill it out
    signup_form['user[name]'].value = 'python-robot'
    signup_form['user[user_password]'].value = 'secret'

    # Submit the form
    browser.submit_form(signup_form)

Checkboxes:

.. code-block:: python

    from robobrowser import RoboBrowser

    # Browse to a page with checkbox inputs
    browser = RoboBrowser()
    browser.open('')

    # Find the form
    form = browser.get_forms()[3]
    form                            # <RoboForm vehicle=[]>
    form['vehicle']                 # <robobrowser.forms.fields.Checkbox...>

    # Checked values can be read and set like lists
    form['vehicle'].options         # [u'Bike', u'Car']
    form['vehicle'].value           # []
    form['vehicle'].value = ['Bike']
    form['vehicle'].value = ['Bike', 'Car']

    # Values can also be set using input labels
    form['vehicle'].labels          # [u'I have a bike', u'I have a car \r\n']
    form['vehicle'].value = ['I have a bike']
    form['vehicle'].value           # [u'Bike']

    # Only values that correspond to checkbox values or labels can be set;
    # this will raise a `ValueError`
    form['vehicle'].value = ['Hot Dogs']
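
The value-or-label rule described in the comments above can be modeled in a few lines of plain Python. This is a rough sketch of the behavior only, not RoboBrowser's implementation: the `CheckboxSketch` class and its attributes are hypothetical names chosen for illustration, and stripping whitespace from labels is an assumption.

.. code-block:: python

    class CheckboxSketch(object):
        """Hypothetical stand-in for a checkbox group; not RoboBrowser's class."""

        def __init__(self, options, labels):
            self.options = list(options)
            # Map each label (whitespace stripped) to the option it describes
            self._by_label = dict(
                zip((label.strip() for label in labels), self.options)
            )
            self._value = []

        @property
        def value(self):
            return list(self._value)

        @value.setter
        def value(self, items):
            resolved = []
            for item in items:
                if item in self.options:
                    resolved.append(item)
                elif item.strip() in self._by_label:
                    resolved.append(self._by_label[item.strip()])
                else:
                    raise ValueError('Option {0!r} not found'.format(item))
            # Only replace the stored value once every item has resolved
            self._value = resolved

    vehicle = CheckboxSketch(['Bike', 'Car'],
                             ['I have a bike', 'I have a car \r\n'])
    vehicle.value = ['I have a bike']
    by_label = vehicle.value

    try:
        vehicle.value = ['Hot Dogs']
        rejected = False
    except ValueError:
        rejected = True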

Uploading files:

.. code-block:: python

    from robobrowser import RoboBrowser

    # Browse to a page with an upload form
    browser = RoboBrowser()
    browser.open('')

    # Find the form
    upload_form = browser.get_form()
    upload_form                     # <RoboForm upfile=, note=>

    # Choose a file to upload
    upload_form['upfile']           # <robobrowser.forms.fields.FileInput...>
    upload_form['upfile'].value = open('path/to/file.txt', 'r')

    # Submit
    browser.submit_form(upload_form)

By default, creating a browser instantiates a new requests `Session`.


Requirements
------------

- Python >= 2.6 or >= 3.3


License
-------

MIT licensed. See the bundled `LICENSE <>`_ file for more details.

Release History
---------------

