# weakscraper

HTML scraper with templates
## Description
Most HTML pages are generated using templates. Why not use templates for scraping HTML pages too? The template language is HTML plus a few keywords. The workflow with weakscraper is the following:

* Get the source of an HTML page you want to scrape.
* Using a few keywords, edit the HTML to select which information is of interest and which parts to discard.
* If complicated processing is required, write additional callbacks.
* Run weakscraper on the template and on the HTML.
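
To make the idea concrete, here is a minimal sketch of template-driven scraping written with the Python standard library only. It is not weakscraper's API: the `x-capture` and `x-ignore` keywords, the `scrape` function, and the `callbacks` argument are all hypothetical names chosen for illustration. The sketch also assumes well-formed HTML in which every tag is explicitly closed.

```python
from html.parser import HTMLParser


class Node:
    """A tiny element tree node: tag name, attributes, children, and text."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every opened tag is explicitly closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def scrape(template, page, callbacks=None, results=None):
    """Walk template and page in parallel and collect the marked text."""
    callbacks = callbacks or {}
    results = {} if results is None else results
    if template.tag != page.tag:
        raise ValueError(f"expected <{template.tag}>, found <{page.tag}>")
    if "x-ignore" in template.attrs:  # hypothetical keyword: discard subtree
        return results
    key = template.attrs.get("x-capture")  # hypothetical keyword: keep text
    if key is not None:
        convert = callbacks.get(key, lambda text: text)
        results[key] = convert(page.text)
        return results
    if len(template.children) != len(page.children):
        raise ValueError(f"child count mismatch in <{template.tag}>")
    for t_child, p_child in zip(template.children, page.children):
        scrape(t_child, p_child, callbacks, results)
    return results


# The template is the page source, edited to mark what to keep or discard.
TEMPLATE = (
    "<html><body>"
    '<h1 x-capture="title"></h1>'
    "<div x-ignore></div>"
    '<span x-capture="temp"></span>'
    "</body></html>"
)
PAGE = (
    "<html><body>"
    "<h1>Weather</h1>"
    "<div><p>Advertisement</p></div>"
    "<span>21</span>"
    "</body></html>"
)

# A callback handles processing the template cannot express (here: int parsing).
result = scrape(parse(TEMPLATE), parse(PAGE), callbacks={"temp": int})
print(result)
```

The template stays declarative: it only says what to keep and what to skip, while the traversal and the mismatch errors live in the matching code.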
## Pros
* Observes the [rule of least power](https://en.wikipedia.org/wiki/Rule_of_least_power). A declarative language helps to focus on what to keep. How the information is scraped is the job of the library.
## Cons
## Examples
## How it works
## License