Programmatic web browsing module with AJAX support for Python

Intro

Spynner is a stateful programmatic web browser module for Python. It is based on PyQt and WebKit, so it supports JavaScript, AJAX, and every other technology that WebKit is able to handle (Flash, SVG, …). Spynner takes advantage of jQuery, a powerful JavaScript library that makes interaction with pages and event simulation really easy.

Using Spynner you can simulate a web browser with no GUI (though a browsing window can be opened for debugging purposes), so it can be used to implement crawlers or acceptance testing tools.

Credits

Companies

makinacom

Authors

Contributors

Dependencies

Feedback

Open an Issue to report a bug or request a new feature. Other comments and suggestions can be directly emailed to the authors.

Install

  • Through regular easy_install / buildout:

    easy_install spynner
  • The bleeding edge version is hosted on github:

    git clone https://github.com/makinacorpus/spynner.git
    cd spynner
    python setup.py install

API

http://tokland.freehostia.com/googlecode/spynner/api/

You can generate the API locally (will create docs/api directory):

python setup.py gen_doc

Usage

A basic example:

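import spynner

browser = spynner.Browser()
browser.load("http://www.wordreference.com")
# Run arbitrary JavaScript in the context of the loaded page
browser.runjs("console.log('I can run Javascript')")
# The jQuery copy injected by Spynner is exposed as _jQuery
browser.runjs("console.log('I can run jQuery: ' + _jQuery('a:first').attr('href'))")
# Interact with the form controls, then submit
browser.select("#esen")
browser.wk_fill("input[name=enit]", "hola")
browser.click("input[name=b]")
# click() does not wait for the new page, so wait explicitly
browser.wait_load()
print browser.url, browser.html
browser.close()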

Sometimes you’ll want to see what is going on:

browser = spynner.Browser()
browser.debug_level = spynner.DEBUG
browser.create_webview()
browser.show()

See more examples in the repository: https://github.com/kiorky/spynner/tree/master/examples

Interact with the controls

  • See the implementation docstrings or the examples!

  • You have three levels of control:

    • WebKit methods (wk_fill_*, wk_click_*), which are the recommended ones to use and are jQuery based

    • classical methods (fill, click_*), which are jQuery based

    • low-level methods using raw Qt events, which do not work well at the moment. At least you can move the mouse
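
One way to act on that ordering is to prefer the WebKit-level call and fall back to the classical one. The sketch below is duck-typed and not part of Spynner: it assumes the browser object exposes wk_fill(selector, value), as in the basic example, and a fill method taking the same arguments (an assumption; check the docstrings of your version). The names fill_with_fallback and FakeBrowser are hypothetical and exist only so the sketch runs without Qt.

```python
def fill_with_fallback(browser, selector, value):
    """Fill a form field, preferring the recommended wk_* method.

    Assumes browser.wk_fill and browser.fill both accept (selector, value).
    Returns the name of the method that succeeded.
    """
    for name in ("wk_fill", "fill"):
        method = getattr(browser, name, None)
        if method is None:
            continue  # this level of control is not available
        try:
            method(selector, value)
            return name
        except Exception:
            continue  # try the next level of control
    raise RuntimeError("could not fill %r" % selector)


class FakeBrowser(object):
    """Hypothetical stand-in for spynner.Browser, used only for this demo."""

    def __init__(self):
        self.calls = []

    def wk_fill(self, selector, value):
        raise ValueError("simulated WebKit-level failure")

    def fill(self, selector, value):
        self.calls.append((selector, value))


fake = FakeBrowser()
used = fill_with_fallback(fake, "input[name=enit]", "hola")
```

With a real browser you would pass the spynner.Browser instance instead of the stand-in.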

Running Javascript

Spynner uses jQuery to make the JavaScript interface easier. By default, two modules are injected into every loaded page:

  • jQuery core: Amongst other things, it adds the powerful jQuery selectors, which are used internally by some Spynner methods. Of course you can also use jQuery when you inject your own code into a page.

  • Simulate jQuery plugin: Makes it possible to simulate mouse and keyboard events (for now Spynner uses it only in the click action). Look up the library code to see which kinds of events you can fire.

Note that you must use _jQuery(…) instead of jQuery(…) or the common shortcut $(…). That prevents name clashes with the jQuery library used by the page.
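
To avoid typing the wrong name by habit, the JavaScript snippet can be built from Python. This is a minimal sketch, not part of Spynner; jquery_attr_js is a hypothetical helper that only produces a string:

```python
import json


def jquery_attr_js(selector, attr):
    """Build a JavaScript expression that reads an attribute via _jQuery."""
    # json.dumps produces a safely quoted JavaScript string literal.
    return "_jQuery(%s).attr(%s)" % (json.dumps(selector), json.dumps(attr))


snippet = jquery_attr_js("a:first", "href")
```

The resulting string can then be passed to browser.runjs(...) as in the basic example.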

Cook your soup: parsing the HTML

You can parse the HTML of a webpage with your favorite parsing library (BeautifulSoup, lxml, ...). Since we are already using jQuery for JavaScript, it feels natural to work with pyquery, its Python counterpart:

import spynner
import pyquery

browser = spynner.Browser()
# ...
d = pyquery.PyQuery(browser.html)
# Resolve relative links against the URL of the current page
d.make_links_absolute(browser.url)
href = d("#somelink").attr("href")
# Open the output file in binary mode so the download is written unchanged
browser.download(href, open("/path/outputfile", "wb"))
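
If pyquery is not available, the same idea (find a link by id and make it absolute) can be sketched with the standard library alone. The example parses a hard-coded string so it runs without a browser; in real use the HTML and base URL would come from browser.html and browser.url. LinkCollector and the example.com URLs are made up for illustration.

```python
try:
    from html.parser import HTMLParser  # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser  # Python 2
    from urlparse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href values of <a> tags, keyed by element id."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links[attrs.get("id")] = urljoin(self.base_url, attrs["href"])


# Stand-ins for browser.html and browser.url.
html = '<p><a id="somelink" href="/files/report.pdf">Report</a></p>'
collector = LinkCollector("http://example.com/docs/index.html")
collector.feed(html)
href = collector.links["somelink"]
```

The resolved href can then be handed to browser.download(...) as shown above.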

Running Spynner without X11

CHANGELOG

1.11 (2012-08-04)

  • proper release

1.10 (2011-06-07)

  • add wk_check/_uncheck methods

1.9 (2011-05-29)

  • Rework javascript load [kiorky]

  • Some try in native events [kiorky]

  • Fix directory issue [kiorky]

  • add Samples [kiorky]

  • Fix download cookiesjar free problem [kiorky <kiorky@cryptelium.net>]

  • Allow download to be tracked for further reuse [kiorky <kiorky@cryptelium.net>]

  • Generate filenames by looking for their filename in response objects. [kiorky <kiorky@cryptelium.net>]

  • Add api methods to:

    • send raw keyboard keys

    • send qt raw mouse clicks

    • use qtwebkit native JS click element & fill values

    • some helpers to wait for content

    [kiorky]

  • Add download files tracker [kiorky]

0.0.3 (2009-08-01)

  • Click does not wait for page load

  • Use QtNetwork infrastructure to download files

  • Expose webkit objects in Browser class

  • Change jQuery to _jQuery

  • HTTP authentication

  • Callbacks for Javascript confirm and prompts

  • Properties: url, html, soup

  • Better docstrings (using epydoc)

  • Implement image snapshots

  • Implement URL filters

  • Implement cookies setting [tokland <tokland@gmail.com>]

0.0.2 (2009-07-27)

  • Use browser.html instead of browser.get_html

  • Fix setup.py to make it compatible with Win32

  • Add a URL filter mechanism (with a callback)

  • Use class-methods instead of burdening Browser.__init__

  • Instance variable to ignore SSL certificate errors

  • Start using epydoc format for API documentation

  • Add create_webview/destroy_webview for GUI debugging [tokland <tokland@gmail.com>]

0.0.1 (2009-07-25)

