
Extract and modify HTML/CSS URLs, translate HTML documents <-> list data structures.

Project description

The htmldata module translates HTML documents to and from list data structures, so programs can read and write HTML documents flexibly.
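To make the idea of a list representation concrete, here is a minimal sketch built only on Python's standard library html.parser. It is not htmldata's own API, and the list layout (plain strings for text, (name, attributes) tuples for tags, a leading "/" for closing tags) is an assumption chosen for illustration. The names ListBuilder, to_list, and join_items are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class ListBuilder(HTMLParser):
    """Collect a document as a flat list of text strings and tag tuples."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Opening tag: (name, {attribute: value})
        self.items.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Closing tag: name prefixed with "/" and no attributes
        self.items.append(("/" + tag, {}))

    def handle_data(self, data):
        # Text between tags is stored as a plain string
        self.items.append(data)


def to_list(html_text):
    """Translate an HTML string into a list of strings and tag tuples."""
    builder = ListBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.items


def join_items(items):
    """Translate the list form back into an HTML string."""
    parts = []
    for item in items:
        if isinstance(item, str):
            parts.append(escape(item, quote=False))
            continue
        name, attrs = item
        attr_text = ""
        for key, value in attrs.items():
            # Valueless attributes (value None) are written as key=""
            attr_text += ' %s="%s"' % (key, escape(value or "", quote=True))
        parts.append("<%s%s>" % (name, attr_text))
    return "".join(parts)


doc = '<p class="x">Hi <b>there</b></p>'
items = to_list(doc)
rebuilt = join_items(items)
```

Once the document is a list, editing it is ordinary list manipulation: change an attribute dictionary or a text string in place, then join the list back into HTML.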

Functions are also available for extracting and/or modifying all URLs present in the HTML or stylesheets of a document.
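As a rough illustration of URL extraction and rewriting, the sketch below again uses only the standard library rather than htmldata itself. It looks at href and src attributes only and ignores stylesheets. The names UrlCollector, extract_urls, and wrap_in_proxy, and the proxy.example address, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urljoin

# Attributes treated as URL-bearing in this sketch (an assumption, not exhaustive)
URL_ATTRS = {"href", "src"}


class UrlCollector(HTMLParser):
    """Collect absolute URLs found in href/src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Resolve relative links against the page's own URL
                self.urls.append(urljoin(self.base_url, value))


def extract_urls(html_text, base_url):
    """Return every href/src URL in the document, made absolute."""
    collector = UrlCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.urls


def wrap_in_proxy(url, proxy_prefix="https://proxy.example/fetch?u="):
    """Rewrite a URL so that it points at a (hypothetical) proxy script."""
    return proxy_prefix + quote(url, safe="")


page = '<a href="/docs">Docs</a><img src="logo.png">'
urls = extract_urls(page, "http://example.com/a/")
proxied = [wrap_in_proxy(u) for u in urls]
```

A mirroring or proxying tool would go one step further and write the rewritten URLs back into the document in place of the originals.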

I have found this library useful for writing robots, for “wrapping” every URL on a website inside my own proxy CGI script, for filtering HTML, and for flexible wget-like mirroring.

It keeps things as simple as possible, so it should be easy to learn.

Supports XHTML, too.
