Crawl the Google PlayStore

Project description

gplaycrawler

Discover apps via multiple methods. Mass-download app packages and metadata.

Setup

Install the protobuf compiler. Using apt:

$ apt install -y protobuf-compiler

Using pacman:

$ pacman -S protobuf

Check version:

protoc --version  # Ensure compiler version is 3+

Install gplaycrawler using pip:

$ pip install gplaycrawler

Usage

Set environment variables (optional):

export PLAYSTORE_TOKEN='ya29.fooooo'
export PLAYSTORE_GSFID='1234567891234567890'
export HTTP_PROXY='http://localhost:8080'
export HTTPS_PROXY='http://localhost:8080'
export CURL_CA_BUNDLE='/usr/local/myproxy_info/cacert.pem'
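The proxy and CA-bundle variables above are standard names honored by most Python HTTP stacks. As a quick sanity check (hypothetical; the crawler's own HTTP client is not shown here), you can confirm the exported proxy value is visible to Python's stdlib proxy resolver:

```python
# Verify that an exported HTTPS_PROXY is picked up by Python's stdlib
# proxy resolver -- a stand-in check; gplaycrawler's internals may differ.
import os

os.environ["HTTPS_PROXY"] = "http://localhost:8080"  # same value as above

import urllib.request

proxies = urllib.request.getproxies()
print(proxies.get("https"))  # -> http://localhost:8080
```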
usage: gplaycrawler [-h] [-v {warning,info,debug}]
                    {help,usage,charts,search,related,metadata,packages} ...

Crawl the Google PlayStore

positional arguments:
  {help,usage,charts,search,related,metadata,packages}
                        Desired action to perform
    help                Print this help message
    usage               Print full usage
    charts              parallel downloading of all cross category app charts
    search              parallel searching of apps via search terms
    related             parallel searching of apps via related apps
    metadata            parallel scraping of app metadata
    packages            parallel downloading app packages

optional arguments:
  -h, --help            show this help message and exit
  -v {warning,info,debug}, --verbosity {warning,info,debug}
                        Set verbosity level (default: info)


All commands in detail:


Common optional arguments for related, search, metadata, packages:
  --locale LOCALE      (default: en_US)
  --timezone TIMEZONE  (default: UTC)
  --device DEVICE      (default: px_3a)
  --delay DELAY        Delay between every request in seconds (default: 0.51)
  --threads THREADS    Number of parallel workers (default: 2)


related:
usage: gplaycrawler related [-h] [--locale LOCALE] [--timezone TIMEZONE]
                            [--device DEVICE] [--delay DELAY]
                            [--threads THREADS] [--output OUTPUT]
                            [--level LEVEL]
                            input

parallel searching of apps via related apps

positional arguments:
  input                name of the input file (default: charts.json)

optional arguments:
  --output OUTPUT      base name of the output files (default: ids_related)
  --level LEVEL        How deep to crawl (default: 3)


search:
usage: gplaycrawler search [-h] [--locale LOCALE] [--timezone TIMEZONE]
                           [--device DEVICE] [--delay DELAY]
                           [--threads THREADS] [--output OUTPUT]
                           [--length LENGTH]

parallel searching of apps via search terms

optional arguments:
  --output OUTPUT      name of the output file (default: ids_search.json)
  --length LENGTH      length of strings to search (default: 2)


metadata:
usage: gplaycrawler metadata [-h] [--locale LOCALE] [--timezone TIMEZONE]
                             [--device DEVICE] [--delay DELAY]
                             [--threads THREADS] [--output OUTPUT]
                             input

parallel scraping of app metadata

positional arguments:
  input                name of the input file (json)

optional arguments:
  --output OUTPUT      directory name of the output files (default:
                       out_metadata)


packages:
usage: gplaycrawler packages [-h] [--locale LOCALE] [--timezone TIMEZONE]
                             [--device DEVICE] [--delay DELAY]
                             [--threads THREADS] [--output OUTPUT]
                             [--expansions] [--splits]
                             input

parallel downloading app packages

positional arguments:
  input                name of the input file (json)

optional arguments:
  --output OUTPUT      directory name of the output files (default:
                       out_packages)
  --expansions         also download expansion files (default: False)
  --splits             also download split files (default: False)
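Taken together, the subcommands form a pipeline: charts seeds a list of app ids, related expands it, and metadata/packages consume the result. A hypothetical end-to-end run, wrapped in a function so credentials can be exported first; the intermediate filenames (charts.json, ids_related.json) are assumptions inferred from the defaults shown above:

```shell
# Hypothetical end-to-end crawl; requires PLAYSTORE_TOKEN / PLAYSTORE_GSFID.
# Intermediate filenames are assumptions based on the documented defaults.
run_pipeline() {
    set -eu
    gplaycrawler charts                            # seed ids from the app charts
    gplaycrawler related charts.json --level 2     # expand ids via related apps
    gplaycrawler metadata ids_related.json --output out_metadata
    gplaycrawler packages ids_related.json --output out_packages --splits
}
```

Call `run_pipeline` after exporting the token and GSF id (and any proxy variables).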

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution: gplaycrawler-0.2.1.tar.gz (13.5 kB)

Built Distribution: gplaycrawler-0.2.1-py3-none-any.whl (18.0 kB)
