An easy-to-use web scraping library driven by JS scripts

Project description

ScrapPyJS

The ScrapPyJS class provides web scraping on top of Selenium: you scrape data by running a JS script directly from Python.

Installing

pip install ScrapPyJS

How to Use

Including and Initiating

from ScrapPyJS import ScrapPyJS

# initiate ScrapPyJS
scrappy = ScrapPyJS()

# set js script
JS_SCRIPT = "return 'ScrapPy scrapping!'"
scrappy.set_script(JS_SCRIPT)

# rest of the code goes here...

# close ScrapPyJS
scrappy.end()
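
If an exception is raised between creating the scraper and calling end(), the browser session may be left open. One way to guard against that is a small context manager, sketched below. Note that closing_scraper is a hypothetical helper, not part of ScrapPyJS; it assumes only that the object has the end() method shown above.

```python
from contextlib import contextmanager


@contextmanager
def closing_scraper(scraper):
    """Yield the scraper, then always call its end() method.

    Hypothetical helper: it relies only on the end() method shown above.
    """
    try:
        yield scraper
    finally:
        # Runs on normal exit and when the with-body raises.
        scraper.end()


# Intended usage (requires ScrapPyJS and a working WebDriver):
# with closing_scraper(ScrapPyJS()) as scrappy:
#     scrappy.set_script("return document.title")
#     ...
```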

Simple way

  1. Use the scrap method to scrape a webpage:

    result = scrappy.scrap(url, wait=True, wait_for='id', wait_target='elementId')
    
  2. Retrieve the result of the scraping operation:

    print(result)
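
Since the example script above uses a return statement, the value the script returns is presumably what scrap hands back. A minimal sketch of building such a script from a CSS selector follows; build_text_script and the selector "h2.title" are illustrative names, not part of ScrapPyJS.

```python
import json


def build_text_script(css_selector):
    """Return a JS snippet that collects the trimmed text of matching elements."""
    # json.dumps yields a double-quoted, escaped string that is also a valid JS literal.
    selector_literal = json.dumps(css_selector)
    return (
        f"return Array.from(document.querySelectorAll({selector_literal}))"
        ".map(el => el.textContent.trim());"
    )


JS_SCRIPT = build_text_script("h2.title")
print(JS_SCRIPT)
```

The resulting string could then be passed to scrappy.set_script(JS_SCRIPT) before calling scrap.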
    

Loop through list of URLs

  1. Set up a list of target URLs

    URLS = [
        'https://url1.com/',
        'https://url2.com/homepage/',
        'https://url2.com/about',
    ]
    
  2. Use the loop_through method to scrape each of the target webpages:

    # The result value will be a list if save mode is on, else a JSON string
    # (passing the same wait arguments as scrap is an assumption)
    result = scrappy.loop_through(URLS, wait=True, wait_for='id', wait_target='elementId')
    
  3. Retrieve the result of the scraping operation:

    print(result)
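
For finer control over failures, the same loop can be written by hand around any single-page scrape function. The sketch below is not how loop_through is implemented; scrape_all is a hypothetical helper that takes a callable and a list of URLs, and it keeps going when one page raises.

```python
def scrape_all(scrape_one, urls):
    """Call scrape_one(url) for each URL; collect results and errors separately."""
    results, errors = {}, {}
    for url in urls:
        try:
            results[url] = scrape_one(url)
        except Exception as exc:
            # Record the failure and continue with the remaining URLs.
            errors[url] = str(exc)
    return results, errors


# Intended usage with the scrappy object from above:
# results, errors = scrape_all(
#     lambda u: scrappy.scrap(u, wait=True, wait_for='id', wait_target='elementId'),
#     URLS,
# )
```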
    

Save results to a file

Activate save mode

  1. Via toggle:

    scrappy.toggle_save_mode()
    

    Here, save mode, which is False by default, is toggled to True, so the save file settings keep their default values.

  2. Via set_save_info method:

    scrappy.set_save_info(save=True)
    

    Here, save mode is set to True directly, leaving the other settings at their defaults.

Configure save mode

  1. Via set_save_info method:

    FILE_NAME = "output"
    FILE_FORMAT = "json"
    SAVE_LOCATION = "path/to/file/"

    scrappy.set_save_info(save=True, file_name=FILE_NAME, file_format=FILE_FORMAT, location=SAVE_LOCATION)
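
To show how the three settings might combine into an output file, here is a standalone sketch. It is not the library's own code: save_results is a hypothetical helper, and the assumption that the path is location, then file name, then format as the extension is a guess.

```python
import json
from pathlib import Path


def save_results(results, file_name="output", file_format="json", location="."):
    """Write results to <location>/<file_name>.<file_format> and return the path."""
    if file_format != "json":
        raise ValueError("this sketch only handles the 'json' format")
    target_dir = Path(location)
    # Create the save location if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{file_name}.{file_format}"
    with target.open("w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2, ensure_ascii=False)
    return target
```

Such a helper can be handy when save mode is off and the returned result needs to be written out manually.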
    

Please note that you will need to have the necessary Selenium and WebDriver dependencies installed to use this code.
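
A quick way to confirm the Python side of that requirement is to check whether the selenium package can be found. The sketch below uses only the standard library; missing_dependencies is a hypothetical helper, and it does not check the browser driver itself, since the source does not say which driver the library expects.

```python
import importlib.util


def missing_dependencies(modules=("selenium",)):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# An empty list means every listed module is importable.
print(missing_dependencies())
```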

Documentation

Details of the ScrapPyJS class are available in .\CLASS_STRUCTURE.md

License

This code is licensed under the GNU AGPLv3 open-source copyleft license.

Author

Name: Hind Sagar Biswas

Website: coderaptors.epizy.com

Download files

Source Distribution

ScrapPyJS-1.1.0.tar.gz (5.9 kB)

Uploaded Source

Built Distribution

ScrapPyJS-1.1.0-py3-none-any.whl (5.8 kB)

Uploaded Python 3
