wiki-table-scrape

Scrape HTML tables from a Wikipedia page into CSV format.
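The core idea can be sketched with only the standard library. Walk the page's markup, collect the text of each `<th>`/`<td>` cell row by row, and hand the rows to the `csv` module. This is a minimal sketch, not the package's own implementation. `TableParser` and `tables_to_csv` are hypothetical names. The sketch assumes well-formed HTML with explicit closing tags and no nested tables. Fetching the page is left out so the example runs offline.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect <th>/<td> cell text, grouped into rows and tables.

    Hypothetical helper: assumes explicit closing tags and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []  # one entry per <table>; each is a list of rows
        self._row = None  # cells of the <tr> currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append(" ")  # keep line-broken cell text readable

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Collapse runs of whitespace so each cell becomes one clean field
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no cells
                self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def tables_to_csv(html):
    """Return one CSV string per <table> element found in html."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    results = []
    for rows in parser.tables:
        buffer = io.StringIO()
        # QUOTE_ALL matches the fully quoted output shown under Usage
        writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerows(rows)
        results.append(buffer.getvalue())
    return results


if __name__ == "__main__":
    sample = (
        "<table><tr><th>Mountain</th><th>Metres</th></tr>"
        "<tr><td>K2</td><td>8,611</td></tr></table>"
    )
    print(tables_to_csv(sample)[0], end="")
    # "Mountain","Metres"
    # "K2","8,611"
```

A real Wikipedia page adds complications this sketch ignores. These include footnote markers inside cells, `rowspan`/`colspan`, and tables nested inside other tables. A production scraper would typically use a full HTML parser library rather than `html.parser` event handling.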

Why?

... TODO: fill this out ...

Read more about the initial project in the blog post.

Installation

From PyPI using Python 3

As a user-level package

python3 -m pip install --user wikitablescrape
wikitablescrape --help

In a virtual environment

python3 -m venv venv
. venv/bin/activate

# From pip
pip install wikitablescrape
wikitablescrape --help

# From source (run from the root of a checkout of the repository)
pip install .
wikitablescrape --help

Usage

# Find a single HTML table and write as CSV to stdout
python -m wikitablescrape --url="https://en.wikipedia.org/wiki/List_of_mountains_by_elevation" --header="8000 metres" | head -5
# "Mountain","Metres","Feet","Range","Location and Notes"
# "Mount Everest","8,848","29,029","Himalayas","Nepal/China"
# "K2","8,611","28,251","Karakoram","Pakistan/China"
# "Kangchenjunga","8,586","28,169","Himalayas","Nepal/India – Highest in India"
# "Lhotse","8,516","27,940","Himalayas","Nepal/China – Climbers ascend Lhotse Face in climbing Everest"
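The `--header` option picks one table out of a page. The sketch below shows one plausible way to do this. It is an assumption, not something taken from the package's source. The parser remembers the text of the most recent heading (`<h1>` to `<h6>`). It then returns the first table whose heading contains the requested text, ignoring case. `SectionTableParser` and `find_table` are hypothetical names. The sketch works on an HTML string so it runs offline. Substring matching also tolerates extra link text such as "[edit]" that some page layouts place inside headings.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def _clean(fragments):
    """Join text fragments and collapse runs of whitespace."""
    return " ".join("".join(fragments).split())


class SectionTableParser(HTMLParser):
    """Pair each <table> with the text of the heading that precedes it.

    Hypothetical helper: assumes explicit closing tags and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.found = []  # (heading_text, rows) for every table seen
        self._heading = ""  # text of the most recent heading
        self._heading_parts = None  # fragments of a heading being read
        self._rows = None  # rows of the table being read
        self._row = None  # cells of the <tr> being read
        self._cell = None  # fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._heading_parts = []
        elif tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._heading_parts is not None:
            self._heading = _clean(self._heading_parts)
            self._heading_parts = None
        elif tag in ("th", "td") and self._cell is not None:
            self._row.append(_clean(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no cells
                self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.found.append((self._heading, self._rows))
            self._rows = None

    def handle_data(self, data):
        if self._heading_parts is not None:
            self._heading_parts.append(data)
        elif self._cell is not None:
            self._cell.append(data)


def find_table(html, header):
    """Return the rows of the first table whose preceding heading contains header."""
    parser = SectionTableParser()
    parser.feed(html)
    parser.close()
    wanted = header.casefold()
    for heading, rows in parser.found:
        if wanted in heading.casefold():
            return rows
    raise LookupError(f"no table under a heading containing {header!r}")
```

Called as `find_table(page_html, "8000 metres")`, this returns the rows of the table in that section as lists of strings. Raising `LookupError` when nothing matches makes a typo in the header text fail loudly instead of silently writing an empty CSV.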

# Download an entire page of CSV files into a folder
python -m wikitablescrape --url="https://en.wikipedia.org/wiki/List_of_mountains_by_elevation" --output-folder="/tmp/scrape"
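With `--output-folder`, the tool writes every table on the page to its own CSV file. The sketch below covers only that final step. It starts from tables already parsed into lists of rows, creates the folder if needed, and writes one fully quoted CSV file per table. The function name `write_tables` and the `table_1.csv`, `table_2.csv` naming scheme are hypothetical. The real tool may name its files differently.

```python
import csv
from pathlib import Path


def write_tables(tables, folder, prefix="table"):
    """Write each table (a list of rows) to its own numbered CSV file.

    The prefix and the table_1.csv, table_2.csv, ... naming scheme are
    hypothetical illustrations, not the package's documented behavior.
    """
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    paths = []
    for index, rows in enumerate(tables, start=1):
        path = out_dir / f"{prefix}_{index}.csv"
        # newline="" lets the csv module control line endings itself
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle, quoting=csv.QUOTE_ALL).writerows(rows)
        paths.append(path)
    return paths
```

For example, `write_tables([[["Mountain", "Metres"], ["K2", "8,611"]]], "/tmp/scrape")` creates `/tmp/scrape/table_1.csv`. Returning the paths makes it easy to log or test exactly which files were produced.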

Testing

# Run unit tests and code coverage checks
coverage run --source wikitablescrape -m unittest discover && coverage report --fail-under=80

# (Optional) Build an HTML coverage report and open it (open is macOS-only; on most Linux desktops use xdg-open)
coverage html && open htmlcov/index.html

Sample Articles for Scraping

Contributing

If you would like to contribute to this module, please open an issue or pull request.

More Information

If you'd like to read more about this module, please check out my blog post.

Download files

Source Distribution

wikitablescrape-1.0.2.tar.gz (6.9 kB)

Built Distribution

wikitablescrape-1.0.2-py3-none-any.whl (9.1 kB)
