Scrape HTML tables from a Wikipedia page into CSV format.
wiki-table-scrape
Why?
... TODO: fill this out ...
Read more about the initial project in the blog post.
Installation
As a user-level package
python3 -m pip install --user wikitablescrape
wikitablescrape --help
In a virtual environment
python3 -m venv venv
. venv/bin/activate
# From pip
pip install wikitablescrape
wikitablescrape --help
# From source
python setup.py install
wikitablescrape --help
Usage
# Find a single HTML table and write as CSV to stdout
python -m wikitablescrape --url="https://en.wikipedia.org/wiki/List_of_mountains_by_elevation" --header="8000 metres" | head -5
# "Mountain","Metres","Feet","Range","Location and Notes"
# "Mount Everest","8,848","29,029","Himalayas","Nepal/China"
# "K2","8,611","28,251","Karakoram","Pakistan/China"
# "Kangchenjunga","8,586","28,169","Himalayas","Nepal/India – Highest in India"
# "Lhotse","8,516","27,940","Himalayas","Nepal/China – Climbers ascend Lhotse Face in climbing Everest"
# Download an entire page of CSV files into a folder
python -m wikitablescrape --url="https://en.wikipedia.org/wiki/List_of_mountains_by_elevation" --output-folder="/tmp/scrape"
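The CSV written by the commands above quotes every field and keeps thousands separators in numeric columns (for example "8,848"), so a little cleanup is usually needed before doing arithmetic on the values. The following is a minimal sketch of one way to read that output with Python's standard `csv` module. The helper names `parse_number` and `read_mountains` are hypothetical and not part of wikitablescrape, and the column names are assumed from the sample output shown above; they may differ if the Wikipedia table changes.

```python
import csv
import io

# Sample rows copied from the usage output above (assumed, may drift over time).
SAMPLE = '''"Mountain","Metres","Feet","Range","Location and Notes"
"Mount Everest","8,848","29,029","Himalayas","Nepal/China"
"K2","8,611","28,251","Karakoram","Pakistan/China"
'''


def parse_number(text):
    """Convert a string such as "8,848" to an int by dropping thousands separators."""
    return int(text.replace(",", ""))


def read_mountains(stream):
    """Yield (name, metres, feet) tuples from CSV text that has the header shown above."""
    for row in csv.DictReader(stream):
        yield row["Mountain"], parse_number(row["Metres"]), parse_number(row["Feet"])


if __name__ == "__main__":
    # Hypothetical usage: parse the embedded sample instead of live scraper output.
    for name, metres, feet in read_mountains(io.StringIO(SAMPLE)):
        print(f"{name}: {metres} m ({feet} ft)")
```

To apply the same idea to live output, one option is to pipe the scraper's stdout into a script that passes `sys.stdin` to `read_mountains` in place of the `io.StringIO` sample.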
Testing
# Run unit tests and code coverage checks
coverage run --source wikitablescrape -m unittest discover && coverage report --fail-under=80
# (Optionally) See coverage data
coverage html && open htmlcov/index.html
Sample Articles for Scraping
Contributing
If you would like to contribute to this module, please open an issue or pull request.
More Information
If you'd like to read more about this module, please check out my blog post.