CLI tool for stripping hidden form values from an HTML document
strip-hidden-form-values
Why would you need this? Imagine you're running a Git scraper against a website that includes hidden form fields (such as __VIEWSTATE fields) whose values change on every request. You can pipe the HTML through this tool to strip those hidden form values, so that a change is only recorded when the rest of the page is modified in some way.
scrape-ca-wildlife-rules is an example of a repository that uses this tool for that purpose; see the scrape.yml workflow there for details.
Installation
Install this tool using pip:
pip install strip-hidden-form-values
Usage
You can pipe HTML into this tool:
curl http://... | strip-hidden-form-values > output.html
Or pass it a filename:
strip-hidden-form-values input.html > output.html
The tool replaces the contents of the value= attribute of every hidden form field with an empty string,
so the following:
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="p8nVm4PgVPA" />
Will be replaced with:
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="" />
All other HTML will remain unchanged.
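To make the behavior above concrete, here is a minimal Python sketch of the same transformation. It is an illustration only, not the tool's actual implementation: the function name strip_hidden_values and the three regular expressions are hypothetical, and the regex approach assumes that no attribute value inside an input tag contains a ">" character.

```python
import re

# Hypothetical patterns for illustration; not taken from the tool's source.
# Matches a whole <input ...> tag (assumes no ">" inside attribute values).
INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)
# Matches type=hidden, quoted or unquoted, but not e.g. data-type=hidden.
HIDDEN_TYPE = re.compile(r"""(?<![\w-])type\s*=\s*["']?hidden\b["']?""", re.IGNORECASE)
# Matches a quoted value="..." attribute, but not e.g. data-value="...".
VALUE_ATTR = re.compile(r"""((?<![\w-])value\s*=\s*)(?:"[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_hidden_values(html: str) -> str:
    """Blank the value attribute of hidden inputs; leave all other text as-is."""

    def blank(match: re.Match) -> str:
        tag = match.group(0)
        if not HIDDEN_TYPE.search(tag):
            return tag  # not a hidden field, keep the tag unchanged
        return VALUE_ATTR.sub(r'\1""', tag)

    return INPUT_TAG.sub(blank, html)


if __name__ == "__main__":
    before = '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="p8nVm4PgVPA" />'
    print(strip_hidden_values(before))
```

Because only the matched value attribute is rewritten, two fetches of a page that differ solely in their hidden field values produce identical output, which is what keeps a Git scraper from recording a spurious change.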
Development
To contribute to this tool, first check out the code. Then create a new virtual environment:
cd strip-hidden-form-values
python -m venv venv
source venv/bin/activate
Or if you are using pipenv:
pipenv shell
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
pytest