Skip to main content

A jquery-like library for python

Project description

pyquery allows you to make jquery queries on xml documents. The API is as much as possible the similar to jquery. pyquery uses lxml for fast xml and html manipulation.

This is not (or at least not yet) a library to produce or interact with javascript code. I just liked the jquery API and I missed it in python so I told myself “Hey let’s make jquery in python”. This is the result.

It can be used for many purposes, one idea that I might try in the future is to use it for templating with pure http templates that you modify using pyquery. I can also be used for web scrapping or for theming applications with Deliverance.

The project is being actively developped on a mercurial repository on Bitbucket. I have the policy of giving push access to anyone who wants it and then to review what he does. So if you want to contribute just email me.

The Sphinx documentation is available on pyquery.org.

Usage

You can use the PyQuery class to load an xml document from a string, a lxml document, from a file or from an url:

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url='http://google.com/')
>>> d = pq(filename=path_to_html_file)

Now d is like the $ in jquery:

>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> p.html()
'Hello world !'
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> p.html()
'you know <a href="http://python.org/">Python</a> rocks'
>>> p.text()
'you know Python rocks'

You can use some of the pseudo classes that are available in jQuery but that are not standard in css such as :first :last :even :odd :eq :lt :gt :checked :selected :file:

>>> d('p:first')
[<p#hello.hello>]

Attributes

You can play with the attributes with the jquery API:

>>> p.attr("id")
'hello'
>>> p.attr("id", "plop")
[<p#plop.hello>]
>>> p.attr("id", "hello")
[<p#hello.hello>]

Or in a more pythonic way:

>>> p.attr.id = "plop"
>>> p.attr.id
'plop'
>>> p.attr["id"] = "ola"
>>> p.attr["id"]
'ola'
>>> p.attr(id='hello', class_='hello2')
[<p#hello.hello2>]
>>> p.attr.class_
'hello2'
>>> p.attr.class_ = 'hello'

CSS

You can also play with css classes:

>>> p.addClass("toto")
[<p#hello.toto.hello>]
>>> p.toggleClass("titi toto")
[<p#hello.titi.hello>]
>>> p.removeClass("titi")
[<p#hello.hello>]

Or the css style:

>>> p.css("font-size", "15px")
[<p#hello.hello>]
>>> p.attr("style")
'font-size: 15px'
>>> p.css({"font-size": "17px"})
[<p#hello.hello>]
>>> p.attr("style")
'font-size: 17px'

Same thing the pythonic way (‘_’ characters are translated to ‘-‘):

>>> p.css.font_size = "16px"
>>> p.attr.style
'font-size: 16px'
>>> p.css['font-size'] = "15px"
>>> p.attr.style
'font-size: 15px'
>>> p.css(font_size="16px")
[<p#hello.hello>]
>>> p.attr.style
'font-size: 16px'
>>> p.css = {"font-size": "17px"}
>>> p.attr.style
'font-size: 17px'

Traversing

Some jQuery traversal methods are supported. Here are a few examples.

You can filter the selection list using a string selector:

>>> d('p').filter('.hello')
[<p#hello.hello>]

It is possible to select a single element with eq:

>>> d('p').eq(0)
[<p#hello.hello>]

You can find nested elements:

>>> d('p').find('a')
[<a>, <a>]
>>> d('p').eq(1).find('a')
[<a>]

Breaking out of a level of traversal is also supported using end:

>>> d('p').find('a').end()
[<p#hello.hello>, <p#test>]
>>> d('p').eq(0).end()
[<p#hello.hello>, <p#test>]
>>> d('p').filter(lambda i: i == 1).end()
[<p#hello.hello>, <p#test>]

Manipulating

You can also add content to the end of tags:

>>> d('p').append('check out <a href="http://reddit.com/r/python"><span>reddit</span></a>')
[<p#hello.hello>, <p#test>]
>>> print d
<html>
...
<p class="hello" id="hello" style="font-size: 17px">you know <a href="http://python.org/">Python</a> rockscheck out <a href="http://reddit.com/r/python"><span>reddit</span></a></p><p id="test">
hello <a href="http://python.org">python</a> !
check out <a href="http://python.org/">Python</a> rockscheck out <a href="http://reddit.com/r/python"><span>reddit</span></a></p>
...

Or to the beginning:

>>> p.prepend('check out <a href="http://reddit.com/r/python">reddit</a>')
[<p#hello.hello>]
>>> p.html()
'check out <a href="http://reddit.com/r/python">reddit</a>you know ...'

Prepend or append an element into an other:

>>> p.prependTo(d('#test'))
[<p#hello.hello>]
>>> d('#test').html()
'<p class="hello" ...</p>...hello...python...'

Insert an element after another:

>>> p.insertAfter(d('#test'))
[<p#hello.hello>]
>>> d('#test').html()
'<a href="http://python.org">python</a> !...'

Or before:

>>> p.insertBefore(d('#test'))
[<p#hello.hello>]
>>> d('body').html()
'\n<p class="hello" id="hello" style="font-size: 17px">...'

Doing something for each elements:

>>> p.each(lambda e: e.addClass('hello2'))
[<p#hello.hello2.hello>]

Remove an element:

>>> d.remove('p#id')
[<html>]
>>> d('p#id')
[]

Replace an element by another:

>>> p.replaceWith('<p>testing</p>')
[<p#hello.hello2.hello>]
>>> d('p')
[<p>, <p#test>]

Or the other way around:

>>> d('<h1>arya stark</h1>').replaceAll('p')
[<h1>]
>>> d('p')
[]
>>> d('h1')
[<h1>, <h1>]

Remove what’s inside the selection:

>>> d('h1').empty()
[<h1>, <h1>]

And you can get back the modified html:

>>> print d
<html>
<body>
<h1/><h1/></body>
</html>

You can generate html stuff:

>>> from pyquery import PyQuery as pq
>>> print pq('<div>Yeah !</div>').addClass('myclass') + pq('<b>cool</b>')
<div class="myclass">Yeah !</div><b>cool</b>

AJAX

You can query some wsgi app if WebOb is installed (it’s not a pyquery dependencie). IN this example the test app returns a simple input at / and a submit button at /submit:

>>> d = pq('<form></form>', app=input_app)
>>> d.append(d.get('/'))
[<form>]
>>> print d
<form><input name="youyou" type="text" value=""/></form>

The app is also available in new nodes:

>>> d.get('/').app is d.app is d('form').app
True

You can also request another path:

>>> d.append(d.get('/submit'))
[<form>]
>>> print d
<form><input name="youyou" type="text" value=""/><input type="submit" value="OK"/></form>

If Paste is installed, you are able to get url directly with a Proxy app:

>>> a = d.get('http://pyquery.org/')
>>> a
[<html>]

You can retrieve the app response:

>>> print a.response.status
200 OK

The response attribute is a WebOb Response

Using different parsers

By default pyquery uses the lxml xml parser and then if it doesn’t work goes on to try the html parser from lxml.html. The xml parser can sometimes be problematic when parsing xhtml pages because the parser will not raise an error but give an unusable tree (on w3c.org for example).

You can also choose which parser to use explicitly:

>>> pq('<html><body><p>toto</p></body></html>', parser='xml')
[<html>]
>>> pq('<html><body><p>toto</p></body></html>', parser='html')
[<html>]
>>> pq('<html><body><p>toto</p></body></html>', parser='html_fragments')
[<p>]

The html and html_fragments parser are the ones from lxml.html.

Testing

If you want to run the tests that you can see above you should do:

$ hg clone https://bitbucket.org/olauzanne/pyquery/
$ cd pyquery
$ python bootstrap.py
$ bin/buildout
$ bin/test

You can build the Sphinx documentation by doing:

$ cd docs
$ make html

If you don’t already have lxml installed use this line:

$ STATIC_DEPS=true bin/buildout

More documentation

First there is the Sphinx documentation here. Then for more documentation about the API you can use the jquery website. The reference I’m now using for the API is … the color cheat sheet. Then you can always look at the code.

TODO

  • SELECTORS: still missing some jQuery pseudo classes (:radio, :password, …)

  • ATTRIBUTES: done

  • CSS: done

  • HTML: done

  • MANIPULATING: missing the wrapAll and wrapInner methods

  • TRAVERSING: about half done

  • EVENTS: nothing to do with server side might be used later for automatic ajax

  • CORE UI EFFECTS: did hide and show the rest doesn’t really makes sense on server side

  • AJAX: some with wsgi app

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyquery-0.3.1.tar.gz (17.5 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page