
Crawling all GEO metadata.


geo-spider

Crawls all GEO (Gene Expression Omnibus) metadata. Features:

  1. crawl platforms
  2. crawl samples
  3. crawl series
  4. incremental crawling
  5. missed crawling

Table of Contents

  1. installation
  2. output file format
  3. logs
  4. platforms
  5. samples
  6. series

installation

pip install geo-spider

output file format

geo-spider saves its output in JSON Lines format: one JSON object per line. See the JSON Lines documentation for details.
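
Because every line of the output is an independent JSON value, a crawled file can be read back with the standard library alone. The following is a minimal sketch, not part of geo-spider: the helper name read_jsonlines is hypothetical, and nothing is assumed about which fields each record contains.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonlines(path: str) -> Iterator[dict]:
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, such as a trailing newline
                yield json.loads(line)
```

For example, sum(1 for _ in read_jsonlines("platforms.jl")) counts the records in a crawled platforms file without loading the whole file into memory.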

logs

By default, geo-spider writes logs at the WARNING level to geo-spider.log in the current directory. You can customize this with the -d and -l options.

  1. -d enables debug mode
  2. -l specifies a custom log file

geo-spider -d -l new-geo-spider.log <sub-command>
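
When geo-spider is driven from a script, the logging options go before the sub-command, as in the command above. The sketch below shows one way to assemble such an argument list; build_command and run are hypothetical helper names, and only the -d and -l flags come from this document.

```python
import subprocess
from typing import List, Optional


def build_command(sub_command: List[str], debug: bool = False,
                  log_file: Optional[str] = None) -> List[str]:
    """Assemble a geo-spider argument list with the logging options first."""
    argv = ["geo-spider"]
    if debug:
        argv.append("-d")  # enable debug mode
    if log_file is not None:
        argv += ["-l", log_file]  # write logs to a custom file
    return argv + sub_command


def run(argv: List[str]) -> int:
    """Run the command and return its exit code (requires geo-spider on PATH)."""
    return subprocess.run(argv, check=False).returncode
```

Calling build_command(["platforms", "-o", "platforms.jl"], debug=True, log_file="new-geo-spider.log") yields the same arguments as the command line shown above, with the platforms sub-command filled in.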

platforms

platforms de novo crawling

geo-spider platforms -o platforms.jl

platforms incremental crawling

If you have a crawled platforms jsonlines file:

geo-spider platforms -cf platforms.jl -o new-platforms.jl

If you have multiple platforms jsonlines files:

geo-spider platforms -cd platforms -o new-platforms.jl

platforms missed crawling

Pass -cf or -cd as for incremental crawling, and add the -m option.

geo-spider platforms -cf platforms.jl -m missed -o new-platforms.jl
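
The three platforms modes differ only in which options are passed, so a wrapper script can choose among them based on what has already been crawled. This is a hedged sketch: platforms_command is a hypothetical helper, and the value passed to -m is forwarded unchanged because this document does not describe what it means.

```python
from pathlib import Path
from typing import List, Optional


def platforms_command(output: str, cache: Optional[str] = None,
                      missed: Optional[str] = None) -> List[str]:
    """Build a 'geo-spider platforms' command for de novo, incremental, or missed crawling."""
    argv = ["geo-spider", "platforms"]
    if cache is not None:
        # A directory of earlier results goes to -cd, a single file to -cf.
        flag = "-cd" if Path(cache).is_dir() else "-cf"
        argv += [flag, cache]
    if missed is not None:
        if cache is None:
            # The documentation shows -m only alongside -cf or -cd.
            raise ValueError("-m requires previously crawled results (-cf or -cd)")
        argv += ["-m", missed]
    argv += ["-o", output]
    return argv
```

With no cache the result is the de novo command; with cache="platforms.jl" it is the incremental command; adding missed="missed" reproduces the missed-crawling command above.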

samples

samples de novo crawling

geo-spider samples -o samples.jl

samples incremental crawling

geo-spider samples -pcf platforms.jl -cf samples.jl -o new-samples.jl

samples missed crawling

geo-spider samples -pcf platforms.jl -cf samples.jl -m missed -o new-samples.jl

series

series de novo crawling

geo-spider series -o series.jl

series incremental crawling

geo-spider series -pcf platforms.jl -scf samples.jl -cf series.jl -o new-series.jl

series missed crawling

geo-spider series -pcf platforms.jl -scf samples.jl -cf series.jl -m missed -o new-series.jl
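
The incremental commands above depend on one another: samples takes the crawled platforms file (-pcf), and series takes both the platforms and samples files (-pcf, -scf). The sketch below lists the three incremental commands in that order. The helper name incremental_refresh and the "new-" output prefix are illustrative choices, and the running order is an assumption. The commands mirror this document exactly, which does not say whether a refreshed file should be passed to the later steps instead.

```python
from typing import List


def incremental_refresh(platforms: str, samples: str, series: str,
                        prefix: str = "new-") -> List[List[str]]:
    """Return the incremental platforms, samples, and series commands in order.

    The arguments are bare file names; each output name is the prefix plus the input name.
    """
    return [
        ["geo-spider", "platforms", "-cf", platforms, "-o", prefix + platforms],
        ["geo-spider", "samples", "-pcf", platforms, "-cf", samples,
         "-o", prefix + samples],
        ["geo-spider", "series", "-pcf", platforms, "-scf", samples,
         "-cf", series, "-o", prefix + series],
    ]
```

Each inner list can be handed to subprocess.run one after another, stopping at the first non-zero exit code.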
