
Parsel is a library to extract data from HTML and XML using XPath and CSS selectors

Project description



Features

  • Extract text using CSS or XPath selectors
  • Regular expression helper methods

Example:

>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
        <body>
            <h1>Hello, Parsel!</h1>
            <ul>
                <li><a href="http://example.com">Link 1</a></li>
                <li><a href="http://scrapy.org">Link 2</a></li>
            </ul>
        </body>
        </html>""")
>>>
>>> sel.css('h1::text').extract_first()
u'Hello, Parsel!'
>>>
>>> sel.css('h1::text').re(r'\w+')
[u'Hello', u'Parsel']
>>>
>>> for e in sel.css('ul > li'):
        print(e.xpath('.//a/@href').extract_first())
http://example.com
http://scrapy.org

History

0.9.3 (2015-08-07)

  • Add base_url argument

0.9.2 (2015-08-07)

  • Rename module unified -> selector and promote root attribute
  • Add create_root_node function

0.9.1 (2015-08-04)

  • Setup Sphinx build and docs structure
  • Build universal wheels
  • Rename some leftovers from package extraction

0.9.0 (2015-07-30)

  • First release on PyPI.

Download files


Files for parsel, version 0.9.3
Filename                           Size     File type  Python version
parsel-0.9.3-py2.py3-none-any.whl  8.1 kB   Wheel      2.7
parsel-0.9.3.tar.gz                27.3 kB  Source     None
