Parsel is a library to extract data from HTML and XML using XPath and CSS selectors

Features

  • Extract text using CSS or XPath selectors

  • Regular expression helper methods

Example:

>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
        <body>
            <h1>Hello, Parsel!</h1>
            <ul>
                <li><a href="http://example.com">Link 1</a></li>
                <li><a href="http://scrapy.org">Link 2</a></li>
            </ul>
        </body>
        </html>""")
>>>
>>> sel.css('h1::text').extract_first()
u'Hello, Parsel!'
>>>
>>> sel.css('h1::text').re(r'\w+')
[u'Hello', u'Parsel']
>>>
>>> for e in sel.css('ul > li'):
        print(e.xpath('.//a/@href').extract_first())
http://example.com
http://scrapy.org

History

1.2.0 (2017-05-17)

  • Add SelectorList.get and SelectorList.getall methods as aliases for SelectorList.extract_first and SelectorList.extract respectively

  • Add default value parameter to SelectorList.re_first method

  • Add Selector.re_first method

  • Bug fix: detect a None result from lxml parsing and fall back to an empty document

  • Rearrange XML/HTML examples in the selectors usage docs

  • Travis CI:

    • Test against Python 3.6

    • Test against PyPy using “Portable PyPy for Linux” distribution

1.1.0 (2016-11-22)

  • Change default HTML parser to lxml.html.HTMLParser, which makes it easier to use some HTML-specific features

  • Add css2xpath function to translate CSS to XPath

  • Add support for ad-hoc namespace declarations

  • Add support for XPath variables

  • Documentation improvements and updates

1.0.3 (2016-07-29)

  • Add BSD-3-Clause license file

  • Re-enable PyPy tests

  • Integrate py.test runs with setuptools (needed for Debian packaging)

  • Changelog is now called NEWS

1.0.2 (2016-04-26)

  • Fix bug in exception handling causing original traceback to be lost

  • Added docstrings and other doc fixes

1.0.1 (2015-08-24)

  • Updated PyPI classifiers

  • Added docstrings for csstranslator module and other doc fixes

1.0.0 (2015-08-22)

  • Documentation fixes

0.9.6 (2015-08-14)

  • Updated documentation

  • Extended test coverage

0.9.5 (2015-08-11)

  • Support for extending SelectorList

0.9.4 (2015-08-10)

  • Try workaround for travis-ci/dpl#253

0.9.3 (2015-08-07)

  • Add base_url argument

0.9.2 (2015-08-07)

  • Rename module unified -> selector and promote root attribute

  • Add create_root_node function

0.9.1 (2015-08-04)

  • Setup Sphinx build and docs structure

  • Build universal wheels

  • Rename some leftovers from package extraction

0.9.0 (2015-07-30)

  • First release on PyPI.
