citation-crawler
Asynchronous high-concurrency citation crawler, use with caution!
Currently only Semantic Scholar is supported: the crawler fetches the references and citations of papers from Semantic Scholar.
It crawls citation data and organizes it into an undirected graph, in which each node is a paper and each edge is a citation relationship between two papers.
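As a rough illustration of the graph structure described above, the sketch below builds an undirected adjacency map from (citing paper, cited paper) pairs. It is not citation-crawler's own code: the function name `build_citation_graph`, the use of plain string paper IDs, and the choice to drop self-citations are all assumptions made for this example.

```python
from collections import defaultdict


def build_citation_graph(citations):
    """Hypothetical helper: build an undirected graph where node = paper, edge = citation.

    `citations` is an iterable of (citing_id, cited_id) pairs; the IDs are assumed
    to be plain strings (for example Semantic Scholar paper IDs).
    """
    graph = defaultdict(set)
    for citing, cited in citations:
        if citing == cited:
            continue  # assumption: ignore self-citations, which would become self-loops
        # Undirected graph: record the edge in both directions.
        graph[citing].add(cited)
        graph[cited].add(citing)
    return dict(graph)


if __name__ == "__main__":
    pairs = [("paperA", "paperB"), ("paperC", "paperA"), ("paperB", "paperA")]
    g = build_citation_graph(pairs)
    print(sorted(g["paperA"]))  # ['paperB', 'paperC']
```

Because the graph is undirected, "A cites B" and "B cites A" collapse into the same edge, which is why the third pair above adds nothing new.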
Install
pip install citation-crawler
Usage
Configure the following environment variables:
CITATION_CRAWLER_MAX_CACHE_DAYS_AUTHORS
- Number of days to keep the cache of a paper's authors page (used to get the authors of a published paper)
- default: -1 (cache forever, since the authors of a published paper are unlikely to change)
CITATION_CRAWLER_MAX_CACHE_DAYS_REFERENCES
- Number of days to keep the cache of a references page (used to get the references of a published paper)
- default: -1 (cache forever, since the references of a published paper are unlikely to change)
CITATION_CRAWLER_MAX_CACHE_DAYS_CITATIONS
- Number of days to keep the cache of a citations page (used to get the citations of a published paper)
- default: 7 (the citations of a paper may change frequently)
CITATION_CRAWLER_MAX_CACHE_DAYS_PAPER
- Number of days to keep the cache of a paper detail page (used to get the details of a paper)
- default: -1 (cache forever, since the details of a published paper are unlikely to change)
HTTP_PROXY
- Set it to http://your_user:your_password@your_proxy_url:your_proxy_port if you want to use a proxy
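The cache settings above can be illustrated with a small sketch that reads each variable, falls back to the documented default, and treats -1 as "cache forever". This is a hypothetical helper, not citation-crawler's API: the names `max_cache_days` and `is_cache_fresh`, treating any negative value as "never expires", and measuring cache age from a stored Unix timestamp are all assumptions for illustration.

```python
import os
import time

# Documented defaults; -1 means "cache forever".
DEFAULT_MAX_CACHE_DAYS = {
    "CITATION_CRAWLER_MAX_CACHE_DAYS_AUTHORS": -1,
    "CITATION_CRAWLER_MAX_CACHE_DAYS_REFERENCES": -1,
    "CITATION_CRAWLER_MAX_CACHE_DAYS_CITATIONS": 7,
    "CITATION_CRAWLER_MAX_CACHE_DAYS_PAPER": -1,
}

SECONDS_PER_DAY = 24 * 60 * 60


def max_cache_days(name):
    """Read a max-cache-days variable, falling back to its documented default."""
    value = os.environ.get(name)
    return int(value) if value is not None else DEFAULT_MAX_CACHE_DAYS[name]


def is_cache_fresh(cached_at, name, now=None):
    """Hypothetical check: may an entry cached at `cached_at` (Unix seconds) be reused?"""
    days = max_cache_days(name)
    if days < 0:
        return True  # assumption: negative values such as the default -1 never expire
    now = time.time() if now is None else now
    return now - cached_at <= days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Shorten the citations cache lifetime to 1 day for this process only.
    os.environ["CITATION_CRAWLER_MAX_CACHE_DAYS_CITATIONS"] = "1"
    now = time.time()
    two_days_ago = now - 2 * SECONDS_PER_DAY
    print(is_cache_fresh(two_days_ago, "CITATION_CRAWLER_MAX_CACHE_DAYS_CITATIONS", now))
```

For real runs, the variables would normally be set in the environment that launches the crawler rather than from Python; exactly when the package reads them is not stated here.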
Source Distribution
File details
Details for the file citation_crawler-1.3.0.tar.gz.
File metadata
- Download URL: citation_crawler-1.3.0.tar.gz
- Upload date:
- Size: 9.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | adb3adf92a668688ad0773538924b75721b0e94d3377bf05e42acb6a6276932b
MD5 | 3eace8d1b2b361c55b5477d16a350f77
BLAKE2b-256 | 54dcd3cf01c7063f64987224b82d18328abe483b20828da883ba8082a8d7fe9a
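To check a downloaded source distribution against the SHA256 digest listed above, a minimal sketch using Python's standard `hashlib` module might look like this. The function names `sha256_of_file` and `verify_download` and the 64 KiB chunk size are illustrative choices, not part of citation-crawler.

```python
import hashlib

# SHA256 digest published for citation_crawler-1.3.0.tar.gz.
EXPECTED_SHA256 = "adb3adf92a668688ad0773538924b75721b0e94d3377bf05e42acb6a6276932b"


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash the file in chunks so it never has to be loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True only if the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify_download("citation_crawler-1.3.0.tar.gz")` should return True only when the local file is byte-for-byte identical to the published archive.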