An easy-to-use web scraping library driven by JS scripts

ScrapPyJS

The ScrapPyJS class provides web scraping on top of Selenium: you scrape data by running a JS script against a page directly from Python.

Installing

pip install ScrapPyJS

How to Use

Including and Initiating

from ScrapPyJS import ScrapPyJS

# initiate ScrapPyJS
scrappy = ScrapPyJS()

# set js script
JS_SCRIPT = "return 'ScrapPy scrapping!'"
scrappy.set_script(JS_SCRIPT)

# rest of the code goes here...

# close ScrapPyJS
scrappy.end()
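
If an exception is raised between initialisation and `scrappy.end()`, the browser session may be left running. The following is a minimal sketch of one way to guard against that, assuming only the `set_script` and `end` methods shown above. `scraping_session` is a hypothetical helper, not part of ScrapPyJS.

```python
from contextlib import contextmanager


@contextmanager
def scraping_session(scraper, script):
    """Yield a scraper with its JS script set, and always close it afterwards.

    `scraper` is any object exposing set_script() and end(), such as a
    ScrapPyJS instance. This helper is hypothetical, not part of the library.
    """
    try:
        scraper.set_script(script)
        yield scraper
    finally:
        # Runs even if set_script() or the body of the with-block raises.
        scraper.end()
```

With this in place, `with scraping_session(ScrapPyJS(), JS_SCRIPT) as scrappy:` would wrap the scraping code, and `end()` is called when the block exits.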

Simple way

  1. Use the scrap method to scrape a webpage:

    result = scrappy.scrap(url, wait=True, wait_for='id', wait_target='elementId')
    
  2. Retrieve the result of the scraping operation:

    print(result)
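
The script passed to `set_script` uses a top-level `return`, which suggests it is run as a function body, the way Selenium executes scripts. A hedged sketch of building a slightly larger script from Python follows. `build_text_script` and the `h2.title` selector are hypothetical and only illustrate the idea.

```python
import json


def build_text_script(css_selector):
    """Return a JS snippet that collects the trimmed text of matching elements.

    Hypothetical helper: json.dumps() escapes the selector so that quotes
    inside it cannot break out of the JS string literal.
    """
    selector_literal = json.dumps(css_selector)
    return (
        f"const nodes = document.querySelectorAll({selector_literal});\n"
        "return Array.from(nodes, (node) => node.textContent.trim());"
    )


# Example selector; replace it with one that matches the target page.
JS_SCRIPT = build_text_script("h2.title")
```

The resulting string can then be passed to `scrappy.set_script(JS_SCRIPT)` before calling `scrap`.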
    

Loop through list of URLs

  1. Set up a list of target URLs:

    URLS = [
        'https://url1.com/',
        'https://url2.com/homepage/',
        'https://url2.com/about',
    ]
    
  2. Use the loop_through method to scrape the target webpages:

    # The result value will be a list if save mode is on, else a JSON string
    # Arguments mirror scrap(); check CLASS_STRUCTURE.md for the exact loop_through signature
    result = scrappy.loop_through(URLS, wait=True, wait_for='id', wait_target='elementId')
    
  3. Retrieve the result of the scraping operation:

    print(result)
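
Because the result may be either a list or a JSON string depending on save mode, downstream code may want a single shape. This is a small sketch of one way to normalise it. `as_python` is a hypothetical helper, and the fallback for non-JSON strings is an assumption about what a script might return.

```python
import json


def as_python(result):
    """Return `result` as a Python object, decoding it if it is a JSON string.

    Hypothetical helper: non-string results are passed through unchanged,
    and strings that are not valid JSON are returned as-is.
    """
    if not isinstance(result, str):
        return result
    try:
        return json.loads(result)
    except json.JSONDecodeError:
        return result
```

Calling `as_python(result)` on the value returned above lets the rest of the code work with lists and dicts regardless of the save mode.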
    

Save results to a file

Activate save mode

  1. Via toggle:

    scrappy.toggle_save_mode()
    

    Here, save mode, which is False by default, is toggled to True. The other save settings keep their default values.

  2. Via set_save_info method:

    scrappy.set_save_info(save=True)
    

    Here, we set save mode to True directly and leave the other settings at their defaults.

Configure save mode

  1. Via set_save_info method:

    FILE_NAME = "output"
    FILE_FORMAT = "json"
    SAVE_LOCATION = "path/to/file/"

    scrappy.set_save_info(save=True, file_name=FILE_NAME, file_format=FILE_FORMAT, location=SAVE_LOCATION)
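
Whether ScrapPyJS creates a missing save directory on its own is not stated here, so it may be safer to create it before enabling save mode. The sketch below assumes `set_save_info` accepts the keyword arguments used in this section. `configure_save` is a hypothetical helper, not part of the library.

```python
from pathlib import Path


def configure_save(scraper, location, file_name="output", file_format="json"):
    """Create the save directory if needed, then enable save mode.

    Hypothetical helper. It assumes set_save_info() accepts the keyword
    arguments save, file_name, file_format, and location.
    """
    directory = Path(location)
    directory.mkdir(parents=True, exist_ok=True)
    scraper.set_save_info(
        save=True,
        file_name=file_name,
        file_format=file_format,
        # Trailing slash kept to match the "path/to/file/" style used above.
        location=f"{directory.as_posix()}/",
    )
    return directory
```

For example, `configure_save(scrappy, "path/to/file/")` would create the directory and then turn save mode on with the default file name and format.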
    

Please note that you will need to have the necessary Selenium and WebDriver dependencies installed to use this code.
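
A quick way to check those dependencies before running a scrape is sketched below. Which browser driver ScrapPyJS expects is not stated here, so the driver names are assumptions, and `check_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil


def check_dependencies(driver_names=("chromedriver", "geckodriver")):
    """Report whether Selenium is importable and which drivers are on PATH.

    The default driver names are assumptions; pass the one your browser needs.
    """
    return {
        "selenium": importlib.util.find_spec("selenium") is not None,
        "drivers": [name for name in driver_names if shutil.which(name)],
    }


print(check_dependencies())
```

Recent Selenium releases can fetch a matching driver automatically through Selenium Manager, so an empty driver list is not necessarily an error.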

Documentation

Detailed information on the ScrapPyJS class is available in .\CLASS_STRUCTURE.md

License

This code is licensed under the GNU AGPLv3 open-source copyleft license.

Author

Name: Hind Sagar Biswas

Website: coderaptors.epizy.com
