Crawling all GEO metadata.

geo-spider

geo-spider crawls all GEO (Gene Expression Omnibus) metadata. Features:

  1. crawl platforms
  2. crawl samples
  3. crawl series
  4. incremental crawling
  5. missed crawling

Table of Contents

  1. installation
  2. output file format
  3. logs
  4. platforms
  5. samples
  6. series

installation

pip install geo-spider

output file format

geo-spider saves its output in JSON Lines format (one JSON value per line). Refer to the JSON Lines site for details.
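
As a minimal sketch of consuming the output, the following Python reads a JSON Lines file one record at a time. The helper name read_jsonlines is hypothetical, the file name platforms.jl is borrowed from the examples later in this document, and no particular record fields are assumed.

```python
import json
from pathlib import Path


def read_jsonlines(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


if __name__ == "__main__":
    # Assumes platforms.jl was produced by `geo-spider platforms -o platforms.jl`.
    if Path("platforms.jl").exists():
        records = list(read_jsonlines("platforms.jl"))
        print(f"loaded {len(records)} records")
```

Reading lazily with a generator keeps memory use flat, which may matter for large crawls.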

logs

By default, geo-spider writes logs at the WARNING level to geo-spider.log in the current directory. You can customize this with the -d and -l options:

  1. -d enables debug mode
  2. -l specifies a custom log file

geo-spider -d -l new-geo-spider.log <sub-command>

platforms

platforms de novo crawling

geo-spider platforms -o platforms.jl

platforms incremental crawling

If you have a previously crawled platforms jsonlines file:

geo-spider platforms -cf platforms.jl -o new-platforms.jl

If you have multiple platforms jsonlines files:

geo-spider platforms -cd platforms -o new-platforms.jl
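
To check what a directory of earlier crawls contains before passing it to -cd, a sketch like the following may help. It assumes the directory holds JSON Lines files with a .jl extension, which this document does not state. The function name iter_directory_records is hypothetical.

```python
import json
from pathlib import Path


def iter_directory_records(directory, pattern="*.jl"):
    """Yield (file name, record) pairs for every JSON Lines file in a directory.

    The "*.jl" pattern is an assumption; adjust it to match your files.
    """
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    yield path.name, json.loads(line)


if __name__ == "__main__":
    # "platforms" is the directory name used in the command above.
    if Path("platforms").is_dir():
        total = sum(1 for _ in iter_directory_records("platforms"))
        print(f"{total} records across the directory")
```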

platforms missed crawling

Specify -cf or -cd as for incremental crawling, and add the -m option.

geo-spider platforms -cf platforms.jl -m missed -o new-platforms.jl

samples

samples de novo crawling

geo-spider samples -o samples.jl

samples incremental crawling

geo-spider samples -pcf platforms.jl -cf samples.jl -o new-samples.jl

samples missed crawling

geo-spider samples -pcf platforms.jl -cf samples.jl -m missed -o new-samples.jl

series

series de novo crawling

geo-spider series -o series.jl

series incremental crawling

geo-spider series -pcf platforms.jl -scf samples.jl -cf series.jl -o new-series.jl

series missed crawling

geo-spider series -pcf platforms.jl -scf samples.jl -cf series.jl -m missed -o new-series.jl
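
The commands above follow one pattern across the three sub-commands. As a sketch for scripting them, the following Python assembles the argument list using only the flags shown in this document. The function name build_command and its parameter names are hypothetical, and the flag order simply mirrors the examples.

```python
def build_command(kind, output, crawled_file=None, platforms_file=None,
                  samples_file=None, missed=False):
    """Return the geo-spider argument list for one sub-command."""
    if kind not in ("platforms", "samples", "series"):
        raise ValueError(f"unknown sub-command: {kind}")
    argv = ["geo-spider", kind]
    if platforms_file:
        argv += ["-pcf", platforms_file]  # crawled platforms file
    if samples_file:
        argv += ["-scf", samples_file]    # crawled samples file
    if crawled_file:
        argv += ["-cf", crawled_file]     # crawled file for this sub-command
    if missed:
        argv += ["-m", "missed"]          # missed crawling, as written above
    argv += ["-o", output]
    return argv


if __name__ == "__main__":
    cmd = build_command("series", "new-series.jl", crawled_file="series.jl",
                        platforms_file="platforms.jl",
                        samples_file="samples.jl", missed=True)
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once geo-spider is installed.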
