This package extracts/parses information from source HTML.

# HTML Parser

Extracts and parses information from source HTML.

# build a PyPI package

  • python3 setup.py sdist bdist_wheel

  • twine upload dist/*

# install the CLI from a local dist file (if you have one)

  • python3 -m pip install /home/yaxiong/html_parsing/dist/htmlparsingbs4based-1.1.0.tar.gz

# install package and CLI

  • pip install htmlparsingbs4based

  • OR python3 -m pip install htmlparsingbs4based
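
To confirm that the install succeeded, the installed version can be read back with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of this package; only the distribution name `htmlparsingbs4based` comes from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of htmlparsingbs4based.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("htmlparsingbs4based")
    if found is None:
        print("htmlparsingbs4based is not installed")
    else:
        print(f"htmlparsingbs4based {found} is installed")
```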

# run from script

  • from htmlparsingbs4based.html_parsing.html_parser_custombs4_script import parse_single_page

  • parse_single_page(input_url='https://bryansfuel.on.ca/about/', path_to_crawled_files='/home/yaxiong/data_crawled_websites/crawled_websites_first_batch', min_length=1, prefix="")
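
The return value and internals of `parse_single_page` are not documented here. As a rough illustration of the general idea (pulling text fragments out of HTML and dropping short ones), the sketch below uses only the standard library's `html.parser`. The names `TextExtractor` and `extract_text` are hypothetical, and treating `min_length` as a minimum fragment length in characters is an assumption, not the package's confirmed behaviour.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text fragments, skipping <script> and <style> bodies.

    Hypothetical illustration; not the implementation used by the package.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self, min_length=1):
        super().__init__()
        # Assumption: min_length is the minimum number of characters to keep.
        self.min_length = min_length
        self.fragments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text outside skipped tags that is long enough.
        if text and self._skip_depth == 0 and len(text) >= self.min_length:
            self.fragments.append(text)


def extract_text(html, min_length=1):
    """Return the list of text fragments found in an HTML string."""
    parser = TextExtractor(min_length=min_length)
    parser.feed(html)
    parser.close()
    return parser.fragments


if __name__ == "__main__":
    sample = "<body><h1>About</h1><p>Hi</p><script>var x = 1;</script></body>"
    print(extract_text(sample, min_length=3))
```

Raising `min_length` in this sketch filters out short fragments such as stray labels or single characters.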

# run CLI (examples)
