A CLI tool for Grepsr Developers

Installation

$ pip install grepsr-cli

Usage

Using help

gcli --help

Using help for a specific command

gcli create --help

Creating a crawler

gcli crawler create --init

This opens interactive mode, where you choose which type of crawler to create (PHP, JavaScript, or TypeScript).

Running a crawler

gcli crawler test -s amazon_com

Running a crawler with a parameter

gcli crawler test -s amazon_com -p '{"urls":["https://amazon.com/VVUH4HJ","https://amazon.com/FV4434"]}'

If the JSON is complex, use a file instead:

# contents of /tmp/amazon_params.json
{"urls": ["https://amazon.com/VV%20UH4HJ"], "strip": ["'", "\"", "\\"]}

gcli crawler test -s amazon_com --params-file '/tmp/amazon_params.json'
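
Hand-writing a parameters file with quotes and backslashes is easy to get wrong. The following is a minimal Python sketch, not part of gcli, that lets the json module do the escaping and then builds the command shown above. The helper names write_params_file and build_test_command and the directory argument are hypothetical.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_params_file(params: dict, directory: str) -> Path:
    """Serialize crawler parameters to a JSON file and return its path."""
    path = Path(directory) / "amazon_params.json"
    # json.dumps handles the escaping of quotes and backslashes.
    path.write_text(json.dumps(params), encoding="utf-8")
    return path


def build_test_command(service: str, params_path: Path) -> str:
    """Build the gcli invocation from the text, shell-quoting each argument."""
    args = ["gcli", "crawler", "test", "-s", service,
            "--params-file", str(params_path)]
    return shlex.join(args)


if __name__ == "__main__":
    params = {"urls": ["https://amazon.com/VV%20UH4HJ"],
              "strip": ["'", "\"", "\\"]}
    with tempfile.TemporaryDirectory() as tmp:
        path = write_params_file(params, tmp)
        # Print the command instead of running it, so the sketch has no side effects.
        print(build_test_command("amazon_com", path))
```

The sketch only prints the command; whether to run it through subprocess or paste it into a shell is left to the reader.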

Hacks Used.

If the JSON parameter contains a space, it might break parameter parsing. If the JSON parameter contains a dash (-) with a space anywhere after it, it will break parameter parsing. Cause: $@ is not double-quoted in run_service.php:5:49. This is worked around by replacing each space with its Unicode escape sequence \u0020. This works because $@ does not split on \u0020.
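
The idea behind this workaround can be sketched in Python as follows. This is an illustration under assumptions, not gcli's actual code: the function name encode_params is hypothetical. It relies on the fact that \u0020 is a valid JSON escape for a space inside a string, so the decoded value is unchanged.

```python
import json


def encode_params(params: dict) -> str:
    """Return compact JSON in which every space is written as \\u0020."""
    # Compact separators leave no spaces between tokens, so any space
    # that remains must be inside a string literal (a key or a value).
    compact = json.dumps(params, separators=(",", ":"))
    # Inside a JSON string, \u0020 decodes back to a plain space.
    return compact.replace(" ", "\\u0020")


if __name__ == "__main__":
    encoded = encode_params({"q": "red - blue shoes"})
    print(encoded)              # the text contains no literal space
    print(json.loads(encoded))  # decodes back to the original parameters
```

Because the result contains no literal whitespace, an unquoted $@ has nothing to split on, while any JSON parser still recovers the original strings.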

Installing NodeJS package in a crawler

gcli crawler package-install -p @vortex-ts-sdk/http-crawler -t node -s grepsr_api_oxylab_com_report
gcli crawler package-install -p typescript -t node -s grepsr_api_oxylab_com_report

Running Browser Automation

To run browser automation locally using gcli:

  1. Pull the browser image from AWS ECR:

    gcli browser pull <browser_name>
    
  2. Run the browser container:

    gcli browser run <browser_name>
    
  3. View the browser's VNC stream with noVNC. To see the browser's graphical output, start the noVNC container and point it at the host's VNC port (5900):

    docker run \
      --add-host=host.docker.internal:host-gateway \
      -p 127.0.0.1:6080:6080 \
      gotget/novnc \
      --vnc host.docker.internal:5900
    

    Once running, open http://127.0.0.1:6080/vnc.html in your web browser to interact with the containerized browser.
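
If the page does not load, it can help to check which of the two ports is actually accepting connections. The following is a small Python sketch for that, assuming the browser container exposes VNC on 127.0.0.1:5900 and noVNC is published on 127.0.0.1:6080 as in the steps above. The helper name port_open is hypothetical and not part of gcli.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port numbers are taken from the steps above; the host is an assumption.
    for name, port in (("VNC", 5900), ("noVNC", 6080)):
        state = "reachable" if port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} port {port}: {state}")
```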

Injecting a custom command

Say, for example, you want to inject a PHP function so that it can be called from inside your service code when testing locally. Note: all of these files must be created inside ~/.grepsr/tmp. Files created anywhere else will not work.

  1. Create a file called inject.php inside ~/.grepsr/tmp/
  2. Implement your function inside ~/.grepsr/tmp/inject.php
function addRowLocal($arr) {
    ...
    ...
}
  3. Create a file called inject.sh inside ~/.grepsr/tmp/
  4. Inside inject.sh, add:
alias php='php -d auto_prepend_file=/tmp/inject.php'

Note: the file location is /tmp/inject.php rather than ~/.grepsr/tmp/inject.php. This is because the local path ~/.grepsr/tmp is mapped to /tmp in the Docker container, and inject.sh runs inside Docker rather than on the local filesystem.

  5. Add an entry in ~/.grepsr/config.yml like so:

    php:
        ...
        sdk_image: ...
        pre_entry_run_file: inject.sh      # relative and limited to the tmp/ dir
  6. Now you can use addRowLocal() in any of your files.
public function main($params) {
    ...
    $arr = $this->dataSet->getEmptyRow();
    addRowLocal($arr); // won't throw error
    ...
}
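
The path mapping described in the note above is the part most likely to cause confusion, so here is a minimal Python sketch of it. It assumes only what the text states, namely that ~/.grepsr/tmp on the host appears as /tmp inside the container. The function name to_container_path and the example home directory /home/dev are hypothetical.

```python
from pathlib import PurePosixPath


def to_container_path(host_path: str, host_tmp: str,
                      container_tmp: str = "/tmp") -> str:
    """Translate a path under the host tmp dir to its path in the container.

    Raises ValueError if host_path is not inside host_tmp, mirroring the
    rule that files outside ~/.grepsr/tmp are not visible to the container.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_tmp))
    return str(PurePosixPath(container_tmp) / relative)


if __name__ == "__main__":
    host_tmp = "/home/dev/.grepsr/tmp"  # hypothetical expansion of ~/.grepsr/tmp
    inside = to_container_path(host_tmp + "/inject.php", host_tmp)
    # This is the line that goes into inject.sh.
    print(f"alias php='php -d auto_prepend_file={inside}'")
```

The printed alias matches the one in inject.sh: the PHP interpreter inside the container is given the container-side path, never the host-side one.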

Would you like to contribute to grepsr-cli?

Be sure to uninstall gcli first with pip uninstall grepsr-cli. Then make your changes, test, and push.

git clone git@bitbucket.org:zznixt07/gcli.git grepsrcli
cd grepsrcli
pip install -e .

Features Added

  • Drop the stash after a successful push. Previously, all stashes were always kept.
  • Run a custom shell file before running your crawler. This enables possibilities such as always injecting a PHP function into all your crawlers.
  • Automatically add Dependencies: ... that your crawler class extends (dependencies that are used elsewhere but not extended by crawler classes are upcoming).

TODO:

  • Experiment with git rebase on deploy failure: git rebase origin/master --autostash && git push
  • Handle prioritization of the same plugin name across multiple repos more deterministically (maybe prioritize the cwd path?).
  • Node: only run the crawler if npm install is successful (add && between npm install and npm start).
  • Run tsc before deploying vortex-ts-registry packages.
  • Add an option to force-update dependencies to the latest version for all or specific vortex-ts-registry dependencies.
  • Handle Ctrl+C during Node package installation in Docker (currently it continues running in the background).
  • Add a new base-class TypeScript package that does not include SOP (do not normalize - to _); change npm start to tsc and test runs. Also generate the .d.ts file in tsconfig.json.
