dblp-crawler
Asynchronous high-concurrency dblp crawler, use with caution!
Install
pip install dblp-crawler
Usage
Help
python -m dblp_crawler -h
usage: __main__.py [-h] [-y YEAR] -k KEYWORD [-p PID] [-j JOURNAL] {networkx,neo4j} ...
positional arguments:
{networkx,neo4j} sub-command help
networkx networkx help
neo4j neo4j help
optional arguments:
-h, --help show this help message and exit
-y YEAR, --year YEAR Only crawl the paper after the specified year.
-k KEYWORD, --keyword KEYWORD
Specify keyword rules.
-p PID, --pid PID Specified author pids to start crawling.
-j JOURNAL, --journal JOURNAL
Specify author journal keys to start crawling.
python -m dblp_crawler networkx -h
usage: __main__.py networkx [-h] --dest DEST
optional arguments:
-h, --help show this help message and exit
--dest DEST Path to write results.
python -m dblp_crawler neo4j -h
usage: __main__.py neo4j [-h] [--auth AUTH] --uri URI
optional arguments:
-h, --help show this help message and exit
--auth AUTH Auth to neo4j database.
--uri URI URI to neo4j database.
Write to a JSON file
e.g. write to summary.json:
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu networkx --dest summary.json
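For downstream analysis you will usually want to load the result back. The JSON schema is not documented in this README, so the sketch below only inspects the file's top-level structure rather than assuming any key names:

# Minimal sketch for inspecting the crawler's JSON output.
# Assumption: the schema of summary.json is not documented here, so we
# only look at the top-level structure instead of hard-coding key names.
import json

with open("summary.json", encoding="utf-8") as f:
    summary = json.load(f)

if isinstance(summary, dict):
    print("top-level keys:", list(summary.keys()))
else:
    print("top-level type:", type(summary).__name__)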
Write to a Neo4j database
e.g. write to neo4j://10.128.202.18:7687:
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu neo4j --uri neo4j://10.128.202.18:7687
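Once the crawl finishes you can sanity-check the graph with the official neo4j Python driver. A minimal sketch, assuming placeholder credentials and making no assumption about which node labels dblp-crawler creates:

# Count all nodes written by the crawler, whatever their labels are.
# Assumption: "neo4j"/"password" are placeholder credentials; replace
# them with the --auth values you actually used.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://10.128.202.18:7687",
                              auth=("neo4j", "password"))
with driver.session() as session:
    record = session.run("MATCH (n) RETURN count(n) AS n").single()
    print("nodes in graph:", record["n"])
driver.close()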
Only crawl papers after a specified year
e.g. crawl papers from 2016 onward (2016 included):
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu -y 2016 networkx --dest summary.json
Keywords with two or more words
e.g. super resolution (publications whose titles contain both "super" and "resolution" will be selected):
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu -k "'super','resolution'" networkx --dest summary.json
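In other words, a rule like "'super','resolution'" requires every listed word to appear in the title. The sketch below illustrates that selection behavior only; it is not dblp-crawler's actual matching code:

# Illustration of the multi-word keyword rule described above: a
# publication is selected when its title contains every word of the rule.
# NOT the crawler's real implementation, just the documented behavior.
def matches(title: str, rule: tuple) -> bool:
    lowered = title.lower()
    return all(word in lowered for word in rule)

print(matches("Super-Resolution for Video Streaming", ("super", "resolution")))  # True
print(matches("Edge Caching Survey", ("super", "resolution")))                   # False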
Init authors from a journal
e.g. init authors from ACM MM (db/conf/mm is the key for ACM MM in dblp: "https://dblp.org/db/conf/mm/index.xml"):
python -m dblp_crawler -k video -k edge -j db/conf/mm networkx --dest summary.json
Init authors from journals in predefined variables
e.g. the variable CCF_A in dblp_crawler.data contains the dblp keys of CCF A conferences:
python -m dblp_crawler -k video -k edge -j "importlib.import_module('dblp_crawler.data').CCF_A" networkx --dest summary.json
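If you want to see which venue keys such a variable expands to before starting a long crawl, you can import it yourself. A small sketch; the exact container type of CCF_A is an assumption, but the README above states it holds the dblp keys of CCF A conferences:

# Peek at the venue keys in dblp_crawler.data.CCF_A before crawling.
# Assumption: CCF_A is an iterable of dblp keys such as "db/conf/mm";
# only its existence in dblp_crawler.data is stated above.
import importlib

data = importlib.import_module("dblp_crawler.data")
keys = list(data.CCF_A)
print(len(keys), "venue keys; first few:", keys[:5])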