scrape cli

It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.

It's based on the great and simple scraping tool written by Jeroen Janssens.

Installation

You can install scrape-cli with pipx or pip:

Using pipx (recommended for CLI tools)

pipx install scrape-cli

Using pip

pip install scrape-cli

Or install from source:

git clone https://github.com/aborruso/scrape-cli
cd scrape-cli
pip install -e .
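After installing, it can be handy to confirm that the package and its command are visible. The following is a minimal sketch, not part of scrape-cli: the helper name `scrape_cli_status` is hypothetical, and it assumes the distribution is named `scrape-cli` and the command is named `scrape`, as in the commands shown in this README. It relies on `importlib.metadata`, which needs Python 3.8 or later.

```python
import shutil
from importlib import metadata  # available from Python 3.8


def scrape_cli_status(dist_name="scrape-cli", command="scrape"):
    """Report the installed version (if any) and the command's path (if any).

    Hypothetical helper for illustration only.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Not installed in the interpreter running this script.
        version = None
    # shutil.which returns None when the command is not on PATH.
    return {"version": version, "executable": shutil.which(command)}


if __name__ == "__main__":
    print(scrape_cli_status())
```

With a pipx install the package lives in its own isolated environment, so `version` may be `None` in your current interpreter even though `executable` points to a working command.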

Requirements

  • Python >=3.6
  • requests
  • lxml
  • cssselect

How does it work?

A CSS selector query like this

curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
| scrape -be 'table.wikitable > tbody > tr > td > b > a'

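or an XPath query like this one (note the double quotes around the query, so that the single quotes around wikitable reach scrape intact):

curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
| scrape -be "//table[contains(@class, 'wikitable')]/tbody/tr/td/b/a"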

gives you back:

<html>
 <head>
 </head>
 <body>
  <a href="/wiki/Afghanistan" title="Afghanistan">
   Afghanistan
  </a>
  <a href="/wiki/Albania" title="Albania">
   Albania
  </a>
  <a href="/wiki/Algeria" title="Algeria">
   Algeria
  </a>
  <a href="/wiki/Andorra" title="Andorra">
   Andorra
  </a>
  <a href="/wiki/Angola" title="Angola">
   Angola
  </a>
  <a href="/wiki/Antigua_and_Barbuda" title="Antigua and Barbuda">
   Antigua and Barbuda
  </a>
  <a href="/wiki/Argentina" title="Argentina">
   Argentina
  </a>
  <a href="/wiki/Armenia" title="Armenia">
   Armenia
  </a>
...
...
 </body>
</html>

Some notes on the commands:

  • -e to set the query
  • -b to add <html>, <head> and <body> tags to the HTML output.
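To make the idea behind the two options concrete, here is a small Python sketch of "select elements with a path query, then wrap the matches in a document skeleton". It is an illustration under stated assumptions, not scrape-cli's implementation: it uses only the standard library's `xml.etree.ElementTree`, which needs well-formed markup and supports just a subset of XPath (for example, no `contains()`), whereas scrape-cli lists lxml and cssselect as requirements. The sample markup and the function names `select` and `wrap_in_body` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny, well-formed stand-in for the Wikipedia table (hypothetical sample data).
SAMPLE = """
<html><body>
<table class="wikitable"><tbody>
<tr><td><b><a href="/wiki/Afghanistan" title="Afghanistan">Afghanistan</a></b></td></tr>
<tr><td><b><a href="/wiki/Albania" title="Albania">Albania</a></b></td></tr>
<tr><td><a href="/wiki/Other" title="Other">Not bold, so not matched</a></td></tr>
</tbody></table>
</body></html>
""".strip()


def select(document, path):
    """Return the elements matching an ElementTree path expression (like -e)."""
    root = ET.fromstring(document)
    return root.findall(path)


def wrap_in_body(elements):
    """Place the matches inside <html>, <head> and <body> (the idea behind -b)."""
    html = ET.Element("html")
    ET.SubElement(html, "head")
    body = ET.SubElement(html, "body")
    body.extend(elements)
    return ET.tostring(html, encoding="unicode")


# Exact attribute match, because ElementTree has no contains().
matches = select(SAMPLE, ".//table[@class='wikitable']/tbody/tr/td/b/a")
titles = [a.get("title") for a in matches]
print(titles)  # ['Afghanistan', 'Albania']
print(wrap_in_body(matches))
```

The third row is skipped because its link is not inside a `<b>` element, which mirrors the `td/b/a` step of the query shown earlier.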

Linux 64 bit precompiled binary

If you are looking for precompiled executables for Linux, please refer to the Releases page on GitHub where you can find the latest precompiled binary file.

I built the scrape-linux-x86_64 precompiled binary with PyInstaller, using this command: pyinstaller --onefile scrape.py.

The result is a single standalone executable that can be run in a 64-bit Linux environment.
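If you want to drive the command (or the precompiled binary) from a Python script, a sketch along these lines may help. It only uses the `-b` and `-e` options described in this README and feeds the HTML on standard input, as the curl examples do. The function names `build_command` and `run_scrape` and the `./scrape-linux-x86_64` path are assumptions for illustration.

```python
import subprocess


def build_command(query, wrap_body=True, executable="scrape"):
    """Build the argument list; -be and -e are the forms shown in this README."""
    flags = "-be" if wrap_body else "-e"
    return [executable, flags, query]


def run_scrape(html, query, wrap_body=True, executable="scrape"):
    """Pipe HTML into the command on stdin and return what it prints.

    Raises subprocess.CalledProcessError if the command exits with an error,
    and FileNotFoundError if the executable cannot be found.
    """
    result = subprocess.run(
        build_command(query, wrap_body, executable),
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical path to the downloaded binary; adjust to your setup.
    page = "<html><body><p><a href='/x'>x</a></p></body></html>"
    print(run_scrape(page, "//a", executable="./scrape-linux-x86_64"))
```

Passing the query as a list element means no shell is involved, so quotes inside an XPath expression need no extra escaping.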

License

MIT

Download files

Download the file for your platform.

Source Distribution

scrape_cli-1.1.2.tar.gz (4.6 kB)

Uploaded Source

Built Distribution

scrape_cli-1.1.2-py3-none-any.whl (5.1 kB)

Uploaded Python 3

File details

Details for the file scrape_cli-1.1.2.tar.gz.

File metadata

  • Download URL: scrape_cli-1.1.2.tar.gz
  • Upload date:
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.2

File hashes

Hashes for scrape_cli-1.1.2.tar.gz
Algorithm Hash digest
SHA256 536a870305192e65657d9d4014c5c7b2b31ab414dc62747995e349609222dc39
MD5 2ea9071ae6e152ffba13ce6535170e69
BLAKE2b-256 74b846aefaa76544178ed9599197d76bd90eeeb6ec5e0c724a2a4ea0181d7308

File details

Details for the file scrape_cli-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: scrape_cli-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 5.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.2

File hashes

Hashes for scrape_cli-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 820147ea46a129041cc6de652702f0258c88fe44de9deacb9120c917696db06c
MD5 b86320ca84bd9112fae1ed03481162ac
BLAKE2b-256 90005ac1a4237f3910020928df1576d638d8197101edf2c1d3a9e90f1d31b7ba
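The SHA256 digests listed above can be compared against a downloaded file. The following is a minimal sketch using Python's standard `hashlib`; the function names and the assumption that the files sit in the current directory under their published names are illustrative only.

```python
import hashlib

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "scrape_cli-1.1.2.tar.gz":
        "536a870305192e65657d9d4014c5c7b2b31ab414dc62747995e349609222dc39",
    "scrape_cli-1.1.2-py3-none-any.whl":
        "820147ea46a129041cc6de652702f0258c88fe44de9deacb9120c917696db06c",
}


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, filename):
    """True if the file at `path` has the digest listed for `filename`."""
    return sha256_of(path) == EXPECTED_SHA256[filename]


if __name__ == "__main__":
    # Assumes the source distribution was downloaded to the current directory.
    name = "scrape_cli-1.1.2.tar.gz"
    print(name, "OK" if matches_published_hash(name, name) else "MISMATCH")
```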
