
A package that provides UI tools for building Scrapy queries.

Project description

Requires Python 3.6+

Scrapy GUI

A simple, Qt WebEngine-powered web browser with built-in functionality for testing Scrapy spider code.

It also includes an add-on that provides a GUI for the Scrapy shell.


Installation

You can install the package from PyPI using:

pip install scrapy_gui

Then you can import it in a Python shell with import scrapy_gui.

Standalone UI

The standalone UI can be opened by calling scrapy_gui.open_browser() from a Python shell. It consists of a web browser and a set of tools for analysing the loaded page.

Browser Tab

Enter any URL into the search bar and press Return or click the Go button. When the loading animation finishes, the page is ready to be parsed in the Tools tab.

[Screenshot: Browser tab]

Tools Tab

The Tools tab contains several sections for parsing the page's content. Its purpose is to make it easy to test queries and code before using them in a Scrapy spider.

NOTE: This uses the initial HTML response. Any changes made to the page afterwards by additional requests, JavaScript, etc. are not taken into account.

The initial HTML is fetched with a separate request made using the requests package. When you run a query, a selector object is created using Selector from the parsel package.

[Screenshot: Tools tab]

Query Box

The Query Box lets you use parsel-compatible CSS and XPath queries to extract data from the page.

It returns results as though selection.css/xpath('YOUR QUERY').getall() was called.

If there are no results or there is an error in the query a dialogue will pop up informing you of the issue.

Regex Box

This box lets you add a regular expression pattern that is applied in addition to the query from the Query Box.

It returns results as though selection.css/xpath('YOUR QUERY').re(r'YOUR REGEX') were called. This means that if your pattern contains groups, only the content inside the parentheses is returned.

Function Box

This box lets you define additional Python code that runs on the results of your query and regex. The code can be as long and complex as you want, and can include additional functions, classes, imports, etc.

The only requirement is that you define a function called user_fun(results, selector) that returns a list.
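An illustrative Function Box entry that satisfies the user_fun(results, selector) requirement. The cleanup logic (collapsing whitespace, dropping empty strings and duplicates) is only an example, and the sketch assumes results is the list of strings produced by the query and regex steps and selector is the page's selector.

```python
def user_fun(results, selector):
    # Example Function Box code (hypothetical): tidy up extracted strings.
    # `selector` is unused here but available for further queries.
    cleaned = []
    seen = set()
    for item in results:
        text = " ".join(item.split())  # collapse runs of whitespace
        if text and text not in seen:  # drop empty strings and duplicates
            seen.add(text)
            cleaned.append(text)
    return cleaned  # must be a list


print(user_fun(["  Apple ", "Pear", "", "Apple"], None))  # ['Apple', 'Pear']
```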

Results Box

This table lists all the results after they have been passed through the regex and the function, if defined.

Source Tab

This tab contains the HTML source used in the Tools tab. You can use the text box to search for specific content. Searches are case-insensitive.
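Case-insensitive searching of the source can be illustrated with Python's re module. This is only an illustration: the hypothetical find_in_source helper returns match offsets, and whether the Source tab treats the search term literally (as assumed here) or as a pattern is not stated above.

```python
import re


def find_in_source(source: str, term: str) -> list:
    # Literal, case-insensitive search: return the start offset of
    # every non-overlapping match of `term` in `source`.
    if not term:
        return []
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [m.start() for m in pattern.finditer(source)]


print(find_in_source("<DIV>div</div>", "div"))  # [1, 5, 10]
```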

[Screenshot: Source tab]

Notes Tab

This is just a plain text box. Content in here is not saved when you exit the app.

Integration with Scrapy Shell

This tool can be integrated with the Scrapy shell. This lets you use it on responses that have passed through your middlewares, as well as on more complex requests and more specific selectors.

Activation

To use it in your shell, import load_selector using:

from scrapy_gui import load_selector

Then call load_selector(YOUR_SELECTOR) to open a window with your selector loaded into it.

For example, load_selector(response) loads the shell's response into the UI.

When you run this, a window titled Scrapy GUI opens containing the Tools, Source and Notes tabs from the standalone UI described above.



Download files


Source Distribution

scrapy-GUI-1.2.tar.gz (72.5 kB)

Uploaded Source

Built Distribution

scrapy_GUI-1.2-py3-none-any.whl (72.6 kB)

Uploaded Python 3

File details

Details for the file scrapy-GUI-1.2.tar.gz.

File metadata

  • Download URL: scrapy-GUI-1.2.tar.gz
  • Upload date:
  • Size: 72.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.6.8

File hashes

Hashes for scrapy-GUI-1.2.tar.gz
Algorithm Hash digest
SHA256 9a692da3aa53fd38e32b8935fe5e65d6f74e6c28e8fce503ceafb1821741e600
MD5 53789ac23820cea54083b3982f9401ee
BLAKE2b-256 0d162b66d3c57c5c54fa93501a54100c238fa852502ae21c255c839264bce8ad


File details

Details for the file scrapy_GUI-1.2-py3-none-any.whl.

File metadata

  • Download URL: scrapy_GUI-1.2-py3-none-any.whl
  • Upload date:
  • Size: 72.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.6.8

File hashes

Hashes for scrapy_GUI-1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e546a3f477e208ec7cd2d0f1f27d77d8ac24ade7a7c671328e67c470f9df6824
MD5 8e0c50b13b3b25585f2048330dea2dbf
BLAKE2b-256 62d6042d6c1c6443bffb82b25c1b94be9a38fcd960aaaa7390fd5071c928f3f0

