Scrapy Save Statistics: Save statistics extension for Scrapy
Project description
Saves spider statistics to MongoDB for analytics.
Install
The quick way:
pip install scrapy-save-statistics
Or install from GitHub:
pip install git+git://github.com/light4/scrapy-save-statistics.git@master
Or checkout the source and run:
python setup.py install
settings.py
MongoDB settings for saving statistics; a statistics database is required.
MONGO_HOST = "127.0.0.1"
MONGO_PORT = 27017
MONGO_DB = "myspider"
MONGO_STATISTICS = "statistics"
EXTENSIONS = {
    'scrapy_save_statistics.SaveStatistics': 100,
}
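For reference, the host/port/database settings above map onto a standard MongoDB connection URI. The helper below is purely illustrative (it is not part of scrapy-save-statistics):

```python
# Build a MongoDB connection URI from the settings above.
# Illustrative helper only; scrapy-save-statistics connects internally.
settings = {
    "MONGO_HOST": "127.0.0.1",
    "MONGO_PORT": 27017,
    "MONGO_DB": "myspider",
    "MONGO_STATISTICS": "statistics",
}

def mongo_uri(s):
    """Compose a mongodb:// URI from host, port and database settings."""
    return f"mongodb://{s['MONGO_HOST']}:{s['MONGO_PORT']}/{s['MONGO_DB']}"

print(mongo_uri(settings))  # mongodb://127.0.0.1:27017/myspider
```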
Spider
The spider must have a `statistics` attribute whose entries contain a `spider_url` key. That info is saved to MongoDB.
import scrapy


class TestSpider(scrapy.Spider):
    name = "test"

    def __init__(self, name=None, **kwargs):
        super(TestSpider, self).__init__(name=name, **kwargs)
        self.statistics = []

    def parse(self, response):
        # expected_crawl_num and total_page are placeholders here;
        # compute them from the response as appropriate for your spider.
        crawl_info = {
            'spider_url': response.url,
            'expected_crawl_num': expected_crawl_num,
            'pages': total_page,
        }
        self.statistics.append(crawl_info)
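Since the `statistics` entries are plain dicts, the shape of what gets persisted can be sketched without Scrapy. This is a stand-in, not the extension's actual code, and the URL and counts below are made-up placeholders; any fields beyond those shown in the spider example above are assumptions:

```python
# Stand-in for how a spider's `statistics` list accumulates crawl-info
# documents; values are placeholders, not real crawl data.
statistics = []

def record_crawl(spider_url, expected_crawl_num, total_page):
    """Append one crawl-info document in the shape shown in the README."""
    crawl_info = {
        "spider_url": spider_url,
        "expected_crawl_num": expected_crawl_num,
        "pages": total_page,
    }
    statistics.append(crawl_info)
    return crawl_info

record_crawl("https://example.com/list", 100, 5)
```

Each dict appended this way is what the extension can later write to the `MONGO_STATISTICS` collection.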
Download files
Source Distributions
No source distribution files are available for this release.
Built Distribution
Hashes for scrapy_save_statistics-0.2-py2.py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | aeb9743d06da179c480c5747cd086b1c19c5d0757295b72e950bc43c223f7e53
MD5 | 571d7b75a8fa663a540ed5b7c30b0dc7
BLAKE2b-256 | c0ddd05e4c25af97e1bd384e1666b12d59e9a452f032b975e182a1dea6b67be2