
Crawl the Google PlayStore

Project description


gplaycrawler

Discover apps by different methods. Mass-download app packages and metadata.

Setup

Install the protobuf compiler. Using apt:

$ apt install -y protobuf-compiler

Using pacman:

$ pacman -S protobuf

Check version:

protoc --version  # Ensure compiler version is 3+

Install gplaycrawler using pip:

$ pip install gplaycrawler
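
To confirm the CLI is installed and on your PATH, print the built-in help (this should show the usage summary reproduced below):

$ gplaycrawler --help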

Usage

Set environment variables (optional):

export PLAYSTORE_TOKEN='ya29.fooooo'
export PLAYSTORE_GSFID='1234567891234567890'
export HTTP_PROXY='http://localhost:8080'
export HTTPS_PROXY='http://localhost:8080'
export CURL_CA_BUNDLE='/usr/local/myproxy_info/cacert.pem'

usage: gplaycrawler [-h] [-v {warning,info,debug}]
                    {help,usage,charts,search,related,metadata,packages} ...

Crawl the Google PlayStore

positional arguments:
  {help,usage,charts,search,related,metadata,packages}
                        Desired action to perform
    help                Print this help message
    usage               Print full usage
    charts              parallel downloading of all cross category app charts
    search              parallel searching of apps via search terms
    related             parallel searching of apps via related apps
    metadata            parallel scraping of app metadata
    packages            parallel downloading app packages

optional arguments:
  -h, --help            show this help message and exit
  -v {warning,info,debug}, --verbosity {warning,info,debug}
                        Set verbosity level (default: info)
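
As an illustrative end-to-end run using the defaults documented below (search writes ids_search.json; metadata and packages write to the out_metadata and out_packages directories):

$ gplaycrawler search                    # enumerate app ids via search terms -> ids_search.json
$ gplaycrawler metadata ids_search.json  # scrape metadata into out_metadata/
$ gplaycrawler packages ids_search.json  # download app packages into out_packages/

Alternatively, seed the crawl with charts and expand the id set via related.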


All commands in detail:


Common optional arguments for related, search, metadata, packages:
  --locale LOCALE      (default: en_US)
  --timezone TIMEZONE  (default: UTC)
  --device DEVICE      (default: px_3a)
  --delay DELAY        Delay between every request in seconds (default: 0.51)
  --threads THREADS    Number of parallel workers (default: 2)


related:
usage: gplaycrawler related [-h] [--locale LOCALE] [--timezone TIMEZONE]
                            [--device DEVICE] [--delay DELAY]
                            [--threads THREADS] [--output OUTPUT]
                            [--level LEVEL]
                            input

parallel searching of apps via related apps

positional arguments:
  input                name of the input file (default: charts.json)

optional arguments:
  --output OUTPUT      base name of the output files (default: ids_related)
  --level LEVEL        How deep to crawl (default: 3)
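
For example, to expand the id set two levels deep with four workers (flag values are illustrative; charts.json is the documented default input):

$ gplaycrawler related charts.json --level 2 --threads 4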


search:
usage: gplaycrawler search [-h] [--locale LOCALE] [--timezone TIMEZONE]
                           [--device DEVICE] [--delay DELAY]
                           [--threads THREADS] [--output OUTPUT]
                           [--length LENGTH]

parallel searching of apps via search terms

optional arguments:
  --output OUTPUT      name of the output file (default: ids_search.json)
  --length LENGTH      length of strings to search (default: 2)
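
For example, to search three-character strings with a longer delay between requests (values are illustrative):

$ gplaycrawler search --length 3 --delay 1.0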


metadata:
usage: gplaycrawler metadata [-h] [--locale LOCALE] [--timezone TIMEZONE]
                             [--device DEVICE] [--delay DELAY]
                             [--threads THREADS] [--output OUTPUT]
                             input

parallel scraping of app metadata

positional arguments:
  input                name of the input file (json)

optional arguments:
  --output OUTPUT      directory name of the output files (default:
                       out_metadata)
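
For example, feeding in the ids collected by search (the file name follows the search default above):

$ gplaycrawler metadata ids_search.json --output out_metadata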


packages:
usage: gplaycrawler packages [-h] [--locale LOCALE] [--timezone TIMEZONE]
                             [--device DEVICE] [--delay DELAY]
                             [--threads THREADS] [--output OUTPUT]
                             [--expansions] [--splits]
                             input

parallel downloading app packages

positional arguments:
  input                name of the input file (json)

optional arguments:
  --output OUTPUT      directory name of the output files (default:
                       out_packages)
  --expansions         also download expansion files (default: False)
  --splits             also download split files (default: False)
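
For example, downloading packages along with their expansion and split files (the input file name follows the search default above):

$ gplaycrawler packages ids_search.json --expansions --splits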



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gplaycrawler-0.2.1.tar.gz (13.5 kB)


Built Distribution

gplaycrawler-0.2.1-py3-none-any.whl (18.0 kB)


File details

Details for the file gplaycrawler-0.2.1.tar.gz.

File metadata

  • Download URL: gplaycrawler-0.2.1.tar.gz
  • Upload date:
  • Size: 13.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for gplaycrawler-0.2.1.tar.gz
Algorithm Hash digest
SHA256 0f7e8bbcd6551d265f3ac8415cc06235121a2fcfe2a702b8a36ff356d58f662f
MD5 fdf0cd7010b47fdef808fe2f68f7eacd
BLAKE2b-256 250bb66b4ebd5dc0ef8ca9da9a806d7c2f0e78e3ea1f29b934fd5fd22754c0ac
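
To verify a downloaded sdist against the SHA256 digest above (GNU coreutils sha256sum assumed; note the two spaces between digest and file name):

$ echo "0f7e8bbcd6551d265f3ac8415cc06235121a2fcfe2a702b8a36ff356d58f662f  gplaycrawler-0.2.1.tar.gz" | sha256sum --check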


File details

Details for the file gplaycrawler-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: gplaycrawler-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for gplaycrawler-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 17fce15d703294f46d99ad0d47f8944180f7d1cd6c16d0f8644625f51dc8d6e4
MD5 32913f880e8ddf033c4f13920d96dbff
BLAKE2b-256 b3c5b61b7c808bd97e5cb1590d887c26ffea4567650296b9fd918ea0f0a725c0

