Crawling all GEO metadata.
geo-spider
Crawl all GEO metadata. Features:
- crawl platforms
- crawl samples
- crawl series
- incremental crawling
- missed crawling
installation
pip install geo-spider
output file format
geo-spider saves its output in JSON Lines format: one JSON record per line. Refer to the JSON Lines site for details.
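Because each line of a JSON Lines file is a standalone JSON document, the output can be read with the Python standard library alone. The following is a minimal sketch: the helper name `read_jsonlines` is an illustrative assumption and not part of geo-spider, and the fields in each record depend on what geo-spider actually writes.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def read_jsonlines(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSON Lines file.

    Hypothetical helper for illustration; not provided by geo-spider.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

For example, `for record in read_jsonlines("platforms.jl"): print(record)` would print each crawled record in turn, without loading the whole file into memory.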
logs
By default, geo-spider writes logs to geo-spider.log in the current directory at WARNING level. You can customize this with the -d and -l options:
- -d: enable debug mode
- -l: specify a custom log file
geo-spider -d -l new-geo-spider.log <sub-command>
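To make the interaction between the two options concrete, here is a rough sketch of how a command-line tool might map -d and -l onto Python's logging settings. This is not geo-spider's actual source code: the program name `example-spider` and the functions `build_parser` and `resolve_log_settings` are hypothetical, and only the defaults (WARNING level, geo-spider.log) are taken from the description above.

```python
import argparse
import logging
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the two documented short options.
    parser = argparse.ArgumentParser(prog="example-spider")
    parser.add_argument("-d", dest="debug", action="store_true",
                        help="enable debug mode")
    parser.add_argument("-l", dest="log_file", default="geo-spider.log",
                        help="custom log file")
    return parser


def resolve_log_settings(argv: List[str]) -> Tuple[int, str]:
    """Return (log level, log file path) for the given arguments."""
    args = build_parser().parse_args(argv)
    level = logging.DEBUG if args.debug else logging.WARNING
    return level, args.log_file


if __name__ == "__main__":
    level, log_file = resolve_log_settings(["-d", "-l", "new-geo-spider.log"])
    logging.basicConfig(filename=log_file, level=level)
    logging.getLogger(__name__).debug("debug logging enabled")
```

With no arguments the sketch resolves to WARNING level and geo-spider.log; passing -d switches the level to DEBUG, and -l replaces the file path.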
platforms
platforms de novo crawling
geo-spider platforms -o platforms.jl
platforms incremental crawling
If you have a crawled platforms jsonlines file:
geo-spider platforms -cf platforms.jl -o new-platforms.jl
If you have multiple platforms jsonlines files:
geo-spider platforms -cd platforms -o new-platforms.jl
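Conceptually, incremental crawling needs to know which records already exist so that they can be skipped. The sketch below shows one way to gather known identifiers from a single file (as with -cf) or from every file in a directory (as with -cd). It is not geo-spider's implementation: the function names `load_crawled_ids` and `filter_new`, the `*.jl` glob pattern, and the record key `accession` are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_crawled_ids(source: str, key: str = "accession") -> Set[str]:
    """Collect identifier values from a .jl file or a directory of .jl files.

    `key` is a hypothetical field name; adjust it to the real output.
    """
    path = Path(source)
    files: Iterable[Path] = sorted(path.glob("*.jl")) if path.is_dir() else [path]
    seen: Set[str] = set()
    for file in files:
        with file.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                if key in record:
                    seen.add(record[key])
    return seen


def filter_new(candidates: Iterable[str], seen: Set[str]) -> List[str]:
    """Return the candidates that have not been crawled yet, in input order."""
    return [c for c in candidates if c not in seen]
```

Keeping the identifiers in a set makes each membership check constant time on average, which matters when the existing files hold many records.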
platforms missed crawling
Specify -cf or -cd as for incremental crawling, and add the -m option.
geo-spider platforms -cf platforms.jl -m missed -o new-platforms.jl
samples
samples de novo crawling
geo-spider samples -o samples.jl
samples incremental crawling
geo-spider samples -pcf platforms.jl -cf samples.jl -o new-samples.jl
samples missed crawling
geo-spider samples -pcf platforms.jl -cf samples.jl -m missed -o new-samples.jl
series
series de novo crawling
geo-spider series -o series.jl
series incremental crawling
geo-spider series -pcf platforms.jl -scf samples.jl -cf series.jl -o new-series.jl
series missed crawling
geo-spider series -pcf platforms.jl -scf samples.jl -cf series.jl -m missed -o new-series.jl