dblp-crawler
An asynchronous, high-concurrency dblp crawler. Use with caution!
Install
pip install dblp-crawler
Usage
Help
python -m dblp_crawler -h
usage: __main__.py [-h] [-y YEAR] -k KEYWORD [-p PID] [-j JOURNAL] {networkx,neo4j} ...
positional arguments:
  {networkx,neo4j}      sub-command help
    networkx            networkx help
    neo4j               neo4j help
optional arguments:
  -h, --help            show this help message and exit
  -y YEAR, --year YEAR  Only crawl the paper after the specified year.
  -k KEYWORD, --keyword KEYWORD
                        Specify keyword rules.
  -p PID, --pid PID     Specified author pids to start crawling.
  -j JOURNAL, --journal JOURNAL
                        Specify author journal keys to start crawling.
python -m dblp_crawler networkx -h
usage: __main__.py networkx [-h] --dest DEST
optional arguments:
  -h, --help   show this help message and exit
  --dest DEST  Path to write results.
python -m dblp_crawler neo4j -h
usage: __main__.py neo4j [-h] [--auth AUTH] --uri URI
optional arguments:
  -h, --help   show this help message and exit
  --auth AUTH  Auth to neo4j database.
  --uri URI    URI to neo4j database.
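When the crawler is launched from another Python script, the option order shown in the usage lines above can be assembled programmatically. The following is a minimal sketch, not part of dblp-crawler: `build_crawler_command` is a hypothetical helper, and it assumes that `-p` and `-j` may be repeated in the same way the examples repeat `-k`.

```python
import sys


def build_crawler_command(keywords, dest, pids=(), journals=(), year=None):
    """Assemble an argv list for `python -m dblp_crawler ... networkx --dest DEST`.

    Hypothetical helper for illustration only. Global options come first,
    then the sub-command and its own options, mirroring the usage line.
    """
    if not keywords:
        # The usage line shows -k without brackets, i.e. it is required.
        raise ValueError("at least one keyword is required")
    cmd = [sys.executable, "-m", "dblp_crawler"]
    if year is not None:
        cmd += ["-y", str(year)]
    for keyword in keywords:
        cmd += ["-k", keyword]
    for pid in pids:  # assumption: -p may be given more than once
        cmd += ["-p", pid]
    for journal in journals:  # assumption: -j may be given more than once
        cmd += ["-j", journal]
    cmd += ["networkx", "--dest", dest]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with keyword rules that contain quotes.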
Write to a JSON file
e.g. write to summary.json:
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu networkx --dest summary.json
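The layout of the output file is not described here, so a first step after a crawl might be to look at its top-level structure. The sketch below assumes only that the file is valid JSON; `summarize` and `summarize_file` are hypothetical helper names, not part of dblp-crawler.

```python
import json


def summarize(data):
    """Describe the top-level shape of parsed JSON data."""
    if isinstance(data, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    return type(data).__name__


def summarize_file(path):
    """Load a JSON file (for example summary.json) and describe its shape."""
    with open(path, encoding="utf-8") as f:
        return summarize(json.load(f))
```

Calling `summarize_file("summary.json")` returns a small dictionary of key names and value types, which is usually enough to decide how to load the data for further analysis.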
Write to a Neo4j database
e.g. write to neo4j://10.128.202.18:7687:
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu neo4j --uri neo4j://10.128.202.18:7687
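Because a crawl can run for a long time, it may be worth checking the `--uri` value before starting. The sketch below uses only the standard library and the hypothetical helper `check_neo4j_uri`. The scheme list is the set accepted by the official Neo4j drivers; whether dblp-crawler accepts all of them is an assumption.

```python
from urllib.parse import urlparse

# URI schemes understood by the official Neo4j drivers.
NEO4J_SCHEMES = ("neo4j", "neo4j+s", "neo4j+ssc", "bolt", "bolt+s", "bolt+ssc")


def check_neo4j_uri(uri):
    """Return (scheme, host, port) or raise ValueError for a malformed URI."""
    parts = urlparse(uri)
    if parts.scheme not in NEO4J_SCHEMES:
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URI has no host")
    # parts.port is None when the URI does not specify a port.
    return parts.scheme, parts.hostname, parts.port
```

For the address used above, `check_neo4j_uri("neo4j://10.128.202.18:7687")` returns the scheme, host, and port as separate values. This validates only the syntax, not whether the database is reachable.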
Crawl only papers from a specified year onward
e.g. crawl papers published in 2016 or later (2016 included):
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu -y 2016 networkx --dest summary.json
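The year bound is inclusive, which is easy to get wrong when post-processing results. A minimal sketch of the same rule, assuming hypothetical publication records represented as dictionaries with a `year` field (the crawler's internal representation is not documented here):

```python
def from_year_onward(publications, year):
    """Keep publications whose year is `year` or later (inclusive bound)."""
    return [pub for pub in publications if pub["year"] >= year]


# Hypothetical sample records for illustration.
sample = [
    {"title": "A", "year": 2015},
    {"title": "B", "year": 2016},
    {"title": "C", "year": 2020},
]
kept = [pub["title"] for pub in from_year_onward(sample, 2016)]
# kept is ["B", "C"]: the 2016 paper is included, the 2015 paper is not.
```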
Keywords with two or more words
e.g. super resolution (publications whose titles contain both "super" and "resolution" are selected):
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu -k "'super','resolution'" networkx --dest summary.json
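The rule semantics can be sketched as follows. This is an illustration under stated assumptions, not the crawler's actual code: it assumes each `-k` is one rule, that a title is selected if it satisfies any rule, and that words are matched as case-insensitive substrings. Only the requirement that all words of a multi-word rule appear in the title comes from the description above. `title_matches` is a hypothetical name.

```python
def title_matches(title, rules):
    """Return True if the title satisfies at least one keyword rule.

    Each rule is a tuple of words, and every word in the rule must occur
    in the title. Matching is a case-insensitive substring test (assumption).
    """
    lowered = title.lower()
    return any(
        all(word.lower() in lowered for word in rule)
        for rule in rules
    )


# The three rules from the command above.
rules = [("video",), ("edge",), ("super", "resolution")]
```

With these rules, a title such as "Deep Super-Resolution for Images" is selected, while "Super Computers" is not, because it contains neither "video" nor "edge" and only one word of the two-word rule. Under the substring assumption, "edge" would also match inside "knowledge", so a word-boundary test may be preferable for post-filtering.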
Initialize authors from a journal
e.g. initialize authors from ACM MM (db/conf/mm is the key for ACM MM in dblp: "https://dblp.org/db/conf/mm/index.xml"):
python -m dblp_crawler -k video -k edge -j db/conf/mm networkx --dest summary.json
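The relationship between a journal key and its dblp index URL can be written down directly. The sketch below generalizes from the single ACM MM example above, so the pattern is an assumption for other venues, and `journal_index_url` is a hypothetical helper name.

```python
def journal_index_url(key, base="https://dblp.org"):
    """Build the dblp index URL for a journal or conference key."""
    # Strip stray slashes so "db/conf/mm" and "/db/conf/mm/" give the same URL.
    return f"{base}/{key.strip('/')}/index.xml"
```

For the key used above, `journal_index_url("db/conf/mm")` yields the same URL quoted in the example. Opening that URL in a browser is a quick way to confirm a key before passing it to `-j`.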