Fork of requests-html, powered by playwright

Project description

Requests-HTMLC: Fork of Requests-HTML, using PlayWright

This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.

When using this library you automatically get:

Full JavaScript support! (Using Chromium, thanks to playwright)
CSS Selectors (a.k.a jQuery-style, thanks to PyQuery).
XPath Selectors, for the faint of heart.
Mocked user-agent (like a real web browser).
Automatic following of redirects.
Connection–pooling and cookie persistence.
The Requests experience you know and love, with magical parsing abilities.
Async Support

Other nice features include:

Markdown export of pages and elements.

Tutorial & Usage

Below is a summary of usage for this package. For detailed api documentation, see here

Make a GET request to 'python.org', using Requests:

    >>> from requests_html import HTMLSession
    >>> session = HTMLSession()
    >>> r = session.get('https://python.org/')

Try async and get some sites at the same time:

    >>> from requests_html import AsyncHTMLSession
    >>> asession = AsyncHTMLSession()
    >>> async def get_pythonorg():
    ...     r = await asession.get('https://python.org/')
    ...     return r
    ...
    >>> async def get_reddit():
    ...    r = await asession.get('https://reddit.com/')
    ...    return r
    ...
    >>> async def get_google():
    ...    r = await asession.get('https://google.com/')
    ...    return r
    ...
    >>> results = asession.run(get_pythonorg, get_reddit, get_google)
    >>> results # check the requests all returned a 200 (success) code
    [<Response [200]>, <Response [200]>, <Response [200]>]
    >>> # Each item in the results list is a response object and can be interacted with as such
    >>> for result in results: 
    ...     print(result.html.url)
    ... 
    https://www.python.org/
    https://www.google.com/
    https://www.reddit.com/

Note that the order of the objects in the results list represents the order they were returned in, not the order that the coroutines are passed to the run method, which is shown in the example by the order being different.

Grab a list of all links on the page, as–is (anchors excluded):

    >>> r.html.links
    {'//docs.python.org/3/tutorial/', '/about/apps/', 'https://github.com/python/pythondotorg/issues', '/accounts/login/', '/dev/peps/', '/about/legal/', '//docs.python.org/3/tutorial/introduction.html#lists', '/download/alternatives', 'http://feedproxy.google.com/~r/PythonInsider/~3/kihd2DW98YY/python-370a4-is-available-for-testing.html', '/download/other/', '/downloads/windows/', 'https://mail.python.org/mailman/listinfo/python-dev', '/doc/av', 'https://devguide.python.org/', '/about/success/#engineering', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', 'https://www.openstack.org', '/about/gettingstarted/', 'http://feedproxy.google.com/~r/PythonInsider/~3/AMoBel8b8Mc/python-3.html', '/success-stories/industrial-light-magic-runs-python/', 'http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator', '/', 'http://pyfound.blogspot.com/', '/events/python-events/past/', '/downloads/release/python-2714/', 'https://wiki.python.org/moin/PythonBooks', 'http://plus.google.com/+Python', 'https://wiki.python.org/moin/', 'https://status.python.org/', '/community/workshops/', '/community/lists/', 'http://buildbot.net/', '/community/awards', 'http://twitter.com/ThePSF', 'https://docs.python.org/3/license.html', '/psf/donations/', 'http://wiki.python.org/moin/Languages', '/dev/', '/events/python-user-group/', 'https://wiki.qt.io/PySide', '/community/sigs/', 'https://wiki.gnome.org/Projects/PyGObject', 'http://www.ansible.com', 'http://www.saltstack.com', 'http://planetpython.org/', '/events/python-events', '/about/help/', '/events/python-user-group/past/', '/about/success/', '/psf-landing/', '/about/apps', '/about/', 'http://www.wxpython.org/', '/events/python-user-group/665/', 'https://www.python.org/psf/codeofconduct/', '/dev/peps/peps.rss', '/downloads/source/', '/psf/sponsorship/sponsors/', 'http://bottlepy.org', 'http://roundup.sourceforge.net/', 'http://pandas.pydata.org/', 'http://brochure.getpython.info/', 'https://bugs.python.org/', '/community/merchandise/', 'http://tornadoweb.org', '/events/python-user-group/650/', 'http://flask.pocoo.org/', '/downloads/release/python-364/', '/events/python-user-group/660/', '/events/python-user-group/638/', '/psf/', '/doc/', 'http://blog.python.org', '/events/python-events/604/', '/about/success/#government', 'http://python.org/dev/peps/', 'https://docs.python.org', 'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', '/users/membership/', '/about/success/#arts', 'https://wiki.python.org/moin/Python2orPython3', '/downloads/', '/jobs/', 'http://trac.edgewall.org/', 'http://feedproxy.google.com/~r/PythonInsider/~3/wh73_1A-N7Q/python-355rc1-and-python-348rc1-are-now.html', '/privacy/', 'https://pypi.python.org/', 'http://www.riverbankcomputing.co.uk/software/pyqt/intro', 'http://www.scipy.org', '/community/forums/', '/about/success/#scientific', '/about/success/#software-development', '/shell/', '/accounts/signup/', 'http://www.facebook.com/pythonlang?fref=ts', '/community/', 'https://kivy.org/', '/about/quotes/', 'http://www.web2py.com/', '/community/logos/', '/community/diversity/', '/events/calendars/', 'https://wiki.python.org/moin/BeginnersGuide', '/success-stories/', '/doc/essays/', '/dev/core-mentorship/', 'http://ipython.org', '/events/', '//docs.python.org/3/tutorial/controlflow.html', '/about/success/#education', '/blogs/', '/community/irc/', 'http://pycon.blogspot.com/', '//jobs.python.org', 'http://www.pylonsproject.org/', 'http://www.djangoproject.com/', '/downloads/mac-osx/', '/about/success/#business', 'http://feedproxy.google.com/~r/PythonInsider/~3/x_c9D0S-4C4/python-370b1-is-now-available-for.html', 'http://wiki.python.org/moin/TkInter', 'https://docs.python.org/faq/', '//docs.python.org/3/tutorial/controlflow.html#defining-functions'}

Grab a list of all links on the page, in absolute form (anchors excluded):

    >>> r.html.absolute_links
    {'https://github.com/python/pythondotorg/issues', 'https://docs.python.org/3/tutorial/', 'https://www.python.org/about/success/', 'http://feedproxy.google.com/~r/PythonInsider/~3/kihd2DW98YY/python-370a4-is-available-for-testing.html', 'https://www.python.org/dev/peps/', 'https://mail.python.org/mailman/listinfo/python-dev', 'https://www.python.org/doc/', 'https://www.python.org/', 'https://www.python.org/about/', 'https://www.python.org/events/python-events/past/', 'https://devguide.python.org/', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', 'https://www.openstack.org', 'http://feedproxy.google.com/~r/PythonInsider/~3/AMoBel8b8Mc/python-3.html', 'https://docs.python.org/3/tutorial/introduction.html#lists', 'http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator', 'http://pyfound.blogspot.com/', 'https://wiki.python.org/moin/PythonBooks', 'http://plus.google.com/+Python', 'https://wiki.python.org/moin/', 'https://www.python.org/events/python-events', 'https://status.python.org/', 'https://www.python.org/about/apps', 'https://www.python.org/downloads/release/python-2714/', 'https://www.python.org/psf/donations/', 'http://buildbot.net/', 'http://twitter.com/ThePSF', 'https://docs.python.org/3/license.html', 'http://wiki.python.org/moin/Languages', 'https://docs.python.org/faq/', 'https://jobs.python.org', 'https://www.python.org/about/success/#software-development', 'https://www.python.org/about/success/#education', 'https://www.python.org/community/logos/', 'https://www.python.org/doc/av', 'https://wiki.qt.io/PySide', 'https://www.python.org/events/python-user-group/660/', 'https://wiki.gnome.org/Projects/PyGObject', 'http://www.ansible.com', 'http://www.saltstack.com', 'https://www.python.org/dev/peps/peps.rss', 'http://planetpython.org/', 'https://www.python.org/events/python-user-group/past/', 'https://docs.python.org/3/tutorial/controlflow.html#defining-functions', 'https://www.python.org/community/diversity/', 'https://docs.python.org/3/tutorial/controlflow.html', 'https://www.python.org/community/awards', 'https://www.python.org/events/python-user-group/638/', 'https://www.python.org/about/legal/', 'https://www.python.org/dev/', 'https://www.python.org/download/alternatives', 'https://www.python.org/downloads/', 'https://www.python.org/community/lists/', 'http://www.wxpython.org/', 'https://www.python.org/about/success/#government', 'https://www.python.org/psf/', 'https://www.python.org/psf/codeofconduct/', 'http://bottlepy.org', 'http://roundup.sourceforge.net/', 'http://pandas.pydata.org/', 'http://brochure.getpython.info/', 'https://www.python.org/downloads/source/', 'https://bugs.python.org/', 'https://www.python.org/downloads/mac-osx/', 'https://www.python.org/about/help/', 'http://tornadoweb.org', 'http://flask.pocoo.org/', 'https://www.python.org/users/membership/', 'http://blog.python.org', 'https://www.python.org/privacy/', 'https://www.python.org/about/gettingstarted/', 'http://python.org/dev/peps/', 'https://www.python.org/about/apps/', 'https://docs.python.org', 'https://www.python.org/success-stories/', 'https://www.python.org/community/forums/', 'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', 'https://www.python.org/community/merchandise/', 'https://www.python.org/about/success/#arts', 'https://wiki.python.org/moin/Python2orPython3', 'http://trac.edgewall.org/', 'http://feedproxy.google.com/~r/PythonInsider/~3/wh73_1A-N7Q/python-355rc1-and-python-348rc1-are-now.html', 'https://pypi.python.org/', 'https://www.python.org/events/python-user-group/650/', 'http://www.riverbankcomputing.co.uk/software/pyqt/intro', 'https://www.python.org/about/quotes/', 'https://www.python.org/downloads/windows/', 'https://www.python.org/events/calendars/', 'http://www.scipy.org', 'https://www.python.org/community/workshops/', 'https://www.python.org/blogs/', 'https://www.python.org/accounts/signup/', 'https://www.python.org/events/', 'https://kivy.org/', 'http://www.facebook.com/pythonlang?fref=ts', 'http://www.web2py.com/', 'https://www.python.org/psf/sponsorship/sponsors/', 'https://www.python.org/community/', 'https://www.python.org/download/other/', 'https://www.python.org/psf-landing/', 'https://www.python.org/events/python-user-group/665/', 'https://wiki.python.org/moin/BeginnersGuide', 'https://www.python.org/accounts/login/', 'https://www.python.org/downloads/release/python-364/', 'https://www.python.org/dev/core-mentorship/', 'https://www.python.org/about/success/#business', 'https://www.python.org/community/sigs/', 'https://www.python.org/events/python-user-group/', 'http://ipython.org', 'https://www.python.org/shell/', 'https://www.python.org/community/irc/', 'https://www.python.org/about/success/#engineering', 'http://www.pylonsproject.org/', 'http://pycon.blogspot.com/', 'https://www.python.org/about/success/#scientific', 'https://www.python.org/doc/essays/', 'http://www.djangoproject.com/', 'https://www.python.org/success-stories/industrial-light-magic-runs-python/', 'http://feedproxy.google.com/~r/PythonInsider/~3/x_c9D0S-4C4/python-370b1-is-now-available-for.html', 'http://wiki.python.org/moin/TkInter', 'https://www.python.org/jobs/', 'https://www.python.org/events/python-events/604/'}

Select an element with a CSS Selector:

    >>> about = r.html.find('#about', first=True)

Grab an element's text contents:

    >>> print(about.text)
    About
    Applications
    Quotes
    Getting Started
    Help
    Python Brochure

Introspect an Element's attributes:

    >>> about.attrs
    {'id': 'about', 'class': ('tier-1', 'element-1'), 'aria-haspopup': 'true'}

Render out an Element's HTML:

    >>> about.html
    '<li aria-haspopup="true" class="tier-1 element-1 " id="about">\n<a class="" href="/about/" title="">About</a>\n<ul aria-hidden="true" class="subnav menu" role="menu">\n<li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>\n<li class="tier-2 element-2" role="treeitem"><a href="/about/quotes/" title="">Quotes</a></li>\n<li class="tier-2 element-3" role="treeitem"><a href="/about/gettingstarted/" title="">Getting Started</a></li>\n<li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>\n<li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>\n</ul>\n</li>'

Select Elements within Elements:

    >>> about.find('a')
    [<Element 'a' href='/about/' title='' class=''>, <Element 'a' href='/about/apps/' title=''>, <Element 'a' href='/about/quotes/' title=''>, <Element 'a' href='/about/gettingstarted/' title=''>, <Element 'a' href='/about/help/' title=''>, <Element 'a' href='http://brochure.getpython.info/' title=''>]

Search for links within an element:

    >>> about.absolute_links
    {'http://brochure.getpython.info/', 'https://www.python.org/about/gettingstarted/', 'https://www.python.org/about/', 'https://www.python.org/about/quotes/', 'https://www.python.org/about/help/', 'https://www.python.org/about/apps/'}

Search for text on the page:

    >>> r.html.search('Python is a {} language')[0]
    programming

More complex CSS Selector example (copied from Chrome dev tools):

    >>> r = session.get('https://github.com/')
    >>> sel = 'body > div.application-main > div.jumbotron.jumbotron-codelines > div > div > div.col-md-7.text-center.text-md-left > p'
    >>> print(r.html.find(sel, first=True).text)
    GitHub is a development platform inspired by the way you work. From open source to business, you can host and review code, manage projects, and build software alongside millions of other developers.

XPath is also supported:

   >>> r.html.xpath('/html/body/div[1]/a')
   [<Element 'a' class=('px-2', 'py-4', 'show-on-focus', 'js-skip-to-content') href='#start-of-content' tabindex='1'>]

JavaScript Support

Let's grab some text that's rendered by JavaScript. Until 2020, the Python 2.7 countdown clock (https://pythonclock.org) will serve as a good test page:

    >>> r = session.get('https://pythonclock.org')

Let's try and see the dynamically rendered code (The countdown clock). To do that quickly at first, we'll search between the last text we see before it ('Python 2.7 will retire in...') and the first text we see after it ('Enable Guido Mode').

	>>> r.html.search('Python 2.7 will retire in...{}Enable Guido Mode')[0]
	'</h1>\n        </div>\n        <div class="python-27-clock"></div>\n        <div class="center">\n            <div class="guido-button-block">\n                <button class="js-guido-mode guido-button">'

Notice the clock is missing. The render() method takes the response and renders the dynamic content just like a web browser would.

    >>> r.html.render()
    >>> r.html.search('Python 2.7 will retire in...{}Enable Guido Mode')[0]
    '</h1>\n        </div>\n        <div class="python-27-clock is-countdown"><span class="countdown-row countdown-show6"><span class="countdown-section"><span class="countdown-amount">1</span><span class="countdown-period">Year</span></span><span class="countdown-section"><span class="countdown-amount">2</span><span class="countdown-period">Months</span></span><span class="countdown-section"><span class="countdown-amount">28</span><span class="countdown-period">Days</span></span><span class="countdown-section"><span class="countdown-amount">16</span><span class="countdown-period">Hours</span></span><span class="countdown-section"><span class="countdown-amount">52</span><span class="countdown-period">Minutes</span></span><span class="countdown-section"><span class="countdown-amount">46</span><span class="countdown-period">Seconds</span></span></span></div>\n        <div class="center">\n            <div class="guido-button-block">\n                <button class="js-guido-mode guido-button">'

Let's clean it up a bit. This step is not needed, it just makes it a bit easier to visualize the returned html to see what we need to target to extract our required information.

	>>> from pprint import pprint
	>>> pprint(r.html.search('Python 2.7 will retire in...{}Enable')[0])
	('</h1>\n'
 '        </div>\n'
 '        <div class="python-27-clock is-countdown"><span class="countdown-row '
 'countdown-show6"><span class="countdown-section"><span '
 'class="countdown-amount">1</span><span '
 'class="countdown-period">Year</span></span><span '
 'class="countdown-section"><span class="countdown-amount">2</span><span '
 'class="countdown-period">Months</span></span><span '
 'class="countdown-section"><span class="countdown-amount">28</span><span '
 'class="countdown-period">Days</span></span><span '
 'class="countdown-section"><span class="countdown-amount">16</span><span '
 'class="countdown-period">Hours</span></span><span '
 'class="countdown-section"><span class="countdown-amount">52</span><span '
 'class="countdown-period">Minutes</span></span><span '
 'class="countdown-section"><span class="countdown-amount">46</span><span '
 'class="countdown-period">Seconds</span></span></span></div>\n'
 '        <div class="center">\n'
 '            <div class="guido-button-block">\n'
 '                <button class="js-guido-mode guido-button">')

The rendered html has all the same methods and attributes as above. Let's extract just the data that we want out of the clock into something easy to use elsewhere and introspect like a dictionary.

	>>> periods = [element.text for element in r.html.find('.countdown-period')]
	>>> amounts = [element.text for element in r.html.find('.countdown-amount')]
	>>> countdown_data = dict(zip(periods, amounts))
	>>> countdown_data
	{'Year': '1', 'Months': '2', 'Days': '5', 'Hours': '23', 'Minutes': '34', 'Seconds': '37'}

Or you can do this async also:

    >>> async def get_pyclock():
    ...     r = await asession.get('https://pythonclock.org/')
    ...     await r.html.arender()
    ...     return r
    ...
    >>> results = asession.run(get_pyclock, get_pyclock, get_pyclock)

The rest of the code operates the same way as the synchronous version except that results is a list containing multiple response objects however the same basic processes can be applied as above to extract the data you want.

Using without Requests

You can also use this library without Requests:

    >>> from requests_html import HTML
    >>> doc = """<a href='https://httpbin.org'>"""
    >>> html = HTML(html=doc)
    >>> html.links
    {'https://httpbin.org'}

Installation

    $ pip install requests-htmlc
    ✨🍰✨

Only Python 3.11 and above is supported.

Ensure you have playwright installed: (the below command works because playwright is installed via pip during the Installation of requests-htmlc)

    $ playwright install

Project details

Release history Release notifications | RSS feed

This version

0.0.9

Apr 24, 2026

0.0.8

Dec 19, 2024

0.0.7

Apr 18, 2024

0.0.6

Apr 17, 2024

0.0.5

Apr 17, 2024

0.0.4

Apr 16, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

requests_htmlc-0.0.9.tar.gz (2.2 MB view details)

Uploaded Apr 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

requests_htmlc-0.0.9-py3-none-any.whl (2.2 MB view details)

Uploaded Apr 24, 2026 Python 3

File details

Details for the file requests_htmlc-0.0.9.tar.gz.

File metadata

Download URL: requests_htmlc-0.0.9.tar.gz
Upload date: Apr 24, 2026
Size: 2.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for requests_htmlc-0.0.9.tar.gz
Algorithm	Hash digest
SHA256	`e955d449e6fbe598a24513bbbca2c79708ca7dc11d2ff73f7aad514c6cd8d9d1`
MD5	`0d109ec936735dc4d5cefbeb6baffef0`
BLAKE2b-256	`c5685386c3387870c6db587883ec5cc28f697691aff5ffe056333865ee6d0039`

See more details on using hashes here.

Provenance

The following attestation bundles were made for requests_htmlc-0.0.9.tar.gz:

Publisher: pypi-publish.yaml on cboin1996/requests-html

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: requests_htmlc-0.0.9.tar.gz
- Subject digest: e955d449e6fbe598a24513bbbca2c79708ca7dc11d2ff73f7aad514c6cd8d9d1
- Sigstore transparency entry: 1367623969
- Sigstore integration time: Apr 24, 2026
Source repository:
- Permalink: cboin1996/requests-html@e69bf3973e1da630902ff9b00c9391f8cdc87034
- Branch / Tag: refs/tags/v0.0.9
- Owner: https://github.com/cboin1996
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yaml@e69bf3973e1da630902ff9b00c9391f8cdc87034
- Trigger Event: push

File details

Details for the file requests_htmlc-0.0.9-py3-none-any.whl.

File metadata

Download URL: requests_htmlc-0.0.9-py3-none-any.whl
Upload date: Apr 24, 2026
Size: 2.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for requests_htmlc-0.0.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fd4733cf3c45958cfef25e8bd7cd92d95e394ce4bbd226d574bf5ec1f356db36`
MD5	`3d4f13176081abb86bfb5395e472717b`
BLAKE2b-256	`9f8bc1445c28646f71bc43e248cf2e1a6aadfb38bb23c8d2a84c74f15144561e`

See more details on using hashes here.

Provenance

The following attestation bundles were made for requests_htmlc-0.0.9-py3-none-any.whl:

Publisher: pypi-publish.yaml on cboin1996/requests-html

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: requests_htmlc-0.0.9-py3-none-any.whl
- Subject digest: fd4733cf3c45958cfef25e8bd7cd92d95e394ce4bbd226d574bf5ec1f356db36
- Sigstore transparency entry: 1367623985
- Sigstore integration time: Apr 24, 2026
Source repository:
- Permalink: cboin1996/requests-html@e69bf3973e1da630902ff9b00c9391f8cdc87034
- Branch / Tag: refs/tags/v0.0.9
- Owner: https://github.com/cboin1996
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yaml@e69bf3973e1da630902ff9b00c9391f8cdc87034
- Trigger Event: push

requests-htmlc 0.0.9

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

Requests-HTMLC: Fork of Requests-HTML, using PlayWright

Tutorial & Usage

JavaScript Support

Using without Requests

Installation

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance