Your friendly neighborhood web scraper

Project Description

Homepage: http://pyrobot.readthedocs.org/

import re
from pyrobot import RoboBrowser

# Browse to Rap Genius
browser = RoboBrowser(history=True)
browser.open('http://rapgenius.com/')

# Search for Queen
form = browser.get_form(action=re.compile(r'search'))
form['q'].value = 'queen'
browser.submit_form(form)

# Look up the first song
songs = browser.select('.song_name')
browser.follow_link(songs[0])
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text     # \n[Intro]\nIs this the real life...

# Back to results page
browser.back()

# Look up my favorite song
browser.follow_link('death on two legs')
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text     # \n[Verse 1]\nYou suck my blood like a leech...
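
The get_form(action=re.compile(r'search')) call above selects a form by matching its action attribute against a regular expression. As a rough illustration of that idea only, and not PyRobot's own code, here is a minimal sketch using just the standard library. FormFinder and SAMPLE_HTML are hypothetical names invented for this example, and the sample markup is made up.

```python
import re
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect the attributes of every <form> whose action matches a pattern.

    Hypothetical helper for illustration; not part of PyRobot.
    """

    def __init__(self, action_pattern):
        super().__init__()
        self.action_pattern = action_pattern
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "form":
            return
        attributes = dict(attrs)
        # A form without an action attribute is treated as an empty string.
        action = attributes.get("action") or ""
        if self.action_pattern.search(action):
            self.matches.append(attributes)


# Hypothetical page source, invented for this illustration.
SAMPLE_HTML = """
<form action="/login" method="post"><input name="user"></form>
<form action="/search" method="get"><input name="q"></form>
"""

finder = FormFinder(re.compile(r"search"))
finder.feed(SAMPLE_HTML)
first_match = finder.matches[0]
print(first_match["action"])  # /search
```

Only the second form is kept, because the pattern is searched within each action value rather than compared for equality.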

PyRobot combines two excellent Python libraries: Requests and BeautifulSoup. It represents browser sessions with Requests and HTML responses with BeautifulSoup, and transparently exposes the methods of both:

import re
from pyrobot import RoboBrowser

browser = RoboBrowser(user_agent='a python robot')
browser.open('https://github.com/')

# Inspect the browser session
browser.session.cookies['_gh_sess']         # BAh7Bzo...
browser.session.headers['User-Agent']       # a python robot

# Search the parsed HTML
browser.select('div.teaser-icon')       # [<div class="teaser-icon">
                                        # <span class="mega-octicon octicon-checklist"></span>
                                        # </div>,
                                        # ...
browser.find(class_=re.compile(r'column', re.I))    # <div class="one-third column">
                                                    # <div class="teaser-icon">
                                                    # <span class="mega-octicon octicon-checklist"></span>
                                                    # ...
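
One way to picture the "transparent" exposure shown above is plain attribute delegation: the browser keeps the session as an attribute and forwards any other lookup to the parsed document. The sketch below is an assumption about the general pattern, not a description of how PyRobot is actually implemented. MiniBrowser, StubSession, and StubDocument are hypothetical stand-ins so the example runs without Requests or BeautifulSoup installed.

```python
class StubSession:
    """Hypothetical stand-in for an HTTP session object."""

    def __init__(self, user_agent):
        self.headers = {"User-Agent": user_agent}


class StubDocument:
    """Hypothetical stand-in for a parsed HTML document."""

    def __init__(self, class_names):
        self.class_names = class_names

    def find(self, class_name):
        # Return the first stored class name equal to the query, else None.
        for candidate in self.class_names:
            if candidate == class_name:
                return candidate
        return None


class MiniBrowser:
    """Hypothetical wrapper: exposes the session, forwards the rest."""

    def __init__(self, session, parsed):
        self.session = session  # reachable as browser.session
        self.parsed = parsed    # target of forwarded lookups

    def __getattr__(self, name):
        # Called only when normal lookup fails. The guard avoids infinite
        # recursion if `session` or `parsed` has not been set yet.
        if name in ("session", "parsed"):
            raise AttributeError(name)
        return getattr(self.parsed, name)


browser = MiniBrowser(
    StubSession("a python robot"),
    StubDocument(["teaser-icon", "column"]),
)
print(browser.session.headers["User-Agent"])  # a python robot
print(browser.find("column"))                 # column
```

Here browser.find is never defined on MiniBrowser itself; the call reaches StubDocument.find through __getattr__, which mirrors how browser.find and browser.select sit alongside browser.session in the example above.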

Release History

0.1.0

Download Files

File Name                         File Type    Upload Date
pyrobot-0.1.0.tar.gz (10.9 kB)    Source       Feb 7, 2014
