Scrapy utils
scrapyu
UserAgentMiddleware
# settings.py
USERAGENT_TYPE = 'firefox'
DOWNLOADER_MIDDLEWARES = {
    'scrapyu.UserAgentMiddleware': 543,
}
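How a setting such as `USERAGENT_TYPE` might be consumed can be sketched as follows. This is not scrapyu's actual code: the `USER_AGENTS` table, its example strings, the `SketchUserAgentMiddleware` class, and the fallback to `'firefox'` are all assumptions made for illustration. Only the `from_crawler` and `process_request` hooks follow the standard Scrapy downloader-middleware convention.

```python
# Hypothetical sketch; scrapyu's real implementation may differ.

# Assumed mapping from USERAGENT_TYPE values to header strings (illustrative).
USER_AGENTS = {
    "firefox": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",
    "chrome": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
    ),
}


class SketchUserAgentMiddleware:
    """Sets one fixed User-Agent header chosen by a type name."""

    def __init__(self, ua_type="firefox"):
        # Unknown types fall back to the Firefox string (an assumption).
        self.user_agent = USER_AGENTS.get(ua_type, USER_AGENTS["firefox"])

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook with the running crawler; read the setting here.
        return cls(crawler.settings.get("USERAGENT_TYPE", "firefox"))

    def process_request(self, request, spider):
        # Overwrite the header, then return None so processing continues.
        request.headers["User-Agent"] = self.user_agent
        return None
```

The priority `543` in `DOWNLOADER_MIDDLEWARES` only determines where the middleware sits relative to the others in the chain.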
MarkdownPipeline
# settings.py
MARKDOWNS_STORE = 'news'
ITEM_PIPELINES = {
    'scrapyu.MarkdownPipeline': 300,
}
# items.py
import scrapy


class MarkdownItem(scrapy.Item):
    html = scrapy.Field()
    filename = scrapy.Field()
FirefoxCookiesMiddleware
# settings.py
GECKODRIVER_PATH = 'geckodriver'
DOWNLOADER_MIDDLEWARES = {
    'scrapyu.FirefoxCookiesMiddleware': 543,
}
MongoDBPipeline
# settings.py
MONGODB_URI = 'mongodb://localhost:27017'
# or
# MONGODB_HOST = 'localhost'
# MONGODB_PORT = 27017
MONGODB_DATABASE = 'scrapyu'
MONGODB_COLLECTION = 'items'
MONGODB_BUFFER_LENGTH = 100
MONGODB_UNIQUE_KEY = 'title name'  # only takes effect when no buffer is used
# or
# MONGODB_UNIQUE_KEY = ['title', 'name']
# MONGODB_UNIQUE_KEY = ('title', 'name')
ITEM_PIPELINES = {
    'scrapyu.MongoDBPipeline': 300,
}
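The settings above accept several equivalent spellings: a full URI or a host/port pair, and a unique key given as a space-separated string, a list, or a tuple. The sketch below shows one way such values could be normalized. The helper names `resolve_mongodb_uri` and `normalize_unique_key`, the precedence of `MONGODB_URI` over host/port, and the whitespace splitting are assumptions; the source does not show how scrapyu parses them.

```python
# Hypothetical helpers; not part of scrapyu's documented API.

def resolve_mongodb_uri(settings):
    """Return MONGODB_URI if set, else build a URI from host and port."""
    uri = settings.get("MONGODB_URI")
    if uri:
        return uri
    host = settings.get("MONGODB_HOST", "localhost")
    port = settings.get("MONGODB_PORT", 27017)
    return f"mongodb://{host}:{port}"


def normalize_unique_key(value):
    """Return the unique-key setting as a list of field names."""
    if value is None:
        return []
    if isinstance(value, str):
        # 'title name' -> ['title', 'name']
        return value.split()
    # Lists and tuples are copied into a new list.
    return list(value)


if __name__ == "__main__":
    print(resolve_mongodb_uri({"MONGODB_HOST": "localhost", "MONGODB_PORT": 27017}))
    print(normalize_unique_key("title name"))
```

Under these assumptions, all three spellings of `MONGODB_UNIQUE_KEY` shown above reduce to the same list of field names.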
RedisDupeFilter
# settings.py
DUPEFILTER_CLASS = 'scrapyu.RedisDupeFilter'
REDIS_DUPE_HOST = 'localhost'
REDIS_DUPE_PORT = 6379
REDIS_DUPE_DATABASE = 0
REDIS_DUPE_PASSWORD = 'password'
REDIS_DUPE_KEY = 'requests'
REDIS_DUPE_IGNORE_URL = r'http://scrapytest.org/\d+'
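`REDIS_DUPE_IGNORE_URL` holds a regular expression. A quick way to check which URLs a given pattern covers is sketched below. The helper `is_ignored` is hypothetical, and anchoring at the start of the URL with `re.match` is an assumption; the source does not say how scrapyu applies the pattern or exactly what happens to matching requests.

```python
import re

# The pattern from the settings example above.
IGNORE_URL = re.compile(r'http://scrapytest.org/\d+')


def is_ignored(url, pattern=IGNORE_URL):
    """Return True if the URL matches the pattern from its first character."""
    return pattern.match(url) is not None


if __name__ == "__main__":
    for url in (
        "http://scrapytest.org/42",
        "http://scrapytest.org/about",
        "https://scrapytest.org/42",
    ):
        print(url, is_ignored(url))
```

Note that the dot in `scrapytest.org` is unescaped, so it matches any single character; writing `scrapytest\.org` restricts the pattern to a literal dot.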