Scrapy utils
scrapyu
UserAgentMiddleware
# settings.py
USERAGENT_TYPE = 'firefox'
DOWNLOADER_MIDDLEWARES = {
    'scrapyu.UserAgentMiddleware': 543,
}
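The settings above select a family of browser User-Agent strings through USERAGENT_TYPE. Below is a minimal sketch of how such a downloader middleware might work. The class name RandomUserAgentMiddleware, the USER_AGENTS table and its sample strings are hypothetical, not scrapyu's actual implementation. The sketch relies only on Scrapy's documented middleware hooks (from_crawler, process_request) and on request.headers supporting item assignment.

```python
import random

# Hypothetical sample strings; scrapyu's real User-Agent list is not shown here.
USER_AGENTS = {
    "firefox": [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    ],
    "chrome": [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    ],
}


class RandomUserAgentMiddleware:
    """Hypothetical sketch: pick a User-Agent of the configured type per request."""

    def __init__(self, ua_type="firefox"):
        if ua_type not in USER_AGENTS:
            raise ValueError(f"unknown USERAGENT_TYPE: {ua_type!r}")
        self.agents = USER_AGENTS[ua_type]

    @classmethod
    def from_crawler(cls, crawler):
        # Read the same setting name the README uses.
        return cls(crawler.settings.get("USERAGENT_TYPE", "firefox"))

    def process_request(self, request, spider):
        # Assign (not setdefault): Scrapy's built-in UserAgentMiddleware runs
        # earlier (priority 500) and has usually set a default header already.
        request.headers["User-Agent"] = random.choice(self.agents)
        return None  # let Scrapy continue processing the request
```

Choosing a fresh string on every request spreads traffic across several browser signatures. A fixed choice per crawl would also be a valid design.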
MarkdownPipeline
# settings.py
MARKDOWNS_STORE = 'news'
ITEM_PIPELINES = {
    'scrapyu.MarkdownPipeline': 300,
}
# items.py
import scrapy

class MarkdownItem(scrapy.Item):
    html = scrapy.Field()
    filename = scrapy.Field()
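The MarkdownItem above carries raw HTML plus a target file name, and MARKDOWNS_STORE names the output directory. The sketch below shows one plausible way a pipeline could turn such items into Markdown files. It is an illustration under stated assumptions, not scrapyu's code: MarkdownFilePipeline, html_to_markdown, the ".md" suffix and the "markdowns" fallback directory are all hypothetical. The converter is a deliberately tiny standard-library one (html.parser) that only understands headings and paragraphs; a real pipeline would more likely use a dedicated HTML-to-Markdown library.

```python
import os
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class _TinyMarkdownConverter(HTMLParser):
    """Hypothetical minimal converter: handles <h1>-<h6> and <p> only."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished Markdown blocks
        self.current = []  # text fragments of the block being built
        self.prefix = ""   # "# ", "## ", ... while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._flush_block()
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "p":
            self._flush_block()

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS or tag == "p":
            self._flush_block()

    def handle_data(self, data):
        # Inline tags such as <b> or <a> are ignored; only their text is kept.
        self.current.append(data)

    def _flush_block(self):
        text = " ".join("".join(self.current).split())  # collapse whitespace
        if text:
            self.blocks.append(self.prefix + text)
        self.current = []
        self.prefix = ""


def html_to_markdown(html):
    parser = _TinyMarkdownConverter()
    parser.feed(html)
    parser.close()
    parser._flush_block()  # emit trailing text outside a closed block
    return "\n\n".join(parser.blocks) + "\n"


class MarkdownFilePipeline:
    """Hypothetical pipeline writing item['html'] to <store>/<filename>.md."""

    def __init__(self, store):
        self.store = store

    @classmethod
    def from_crawler(cls, crawler):
        # "markdowns" is a made-up fallback, not a documented scrapyu default.
        return cls(crawler.settings.get("MARKDOWNS_STORE", "markdowns"))

    def open_spider(self, spider):
        os.makedirs(self.store, exist_ok=True)

    def process_item(self, item, spider):
        path = os.path.join(self.store, item["filename"] + ".md")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(html_to_markdown(item["html"]))
        return item  # pass the item on to later pipelines
```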
FirefoxCookiesMiddleware
# settings.py
GECKODRIVER_PATH = 'geckodriver'
DOWNLOADER_MIDDLEWARES = {
    'scrapyu.FirefoxCookiesMiddleware': 543,
}
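GECKODRIVER_PATH suggests this middleware drives a real Firefox through geckodriver to obtain browser cookies, which are then sent with Scrapy requests. Below is a minimal sketch of the request side only, assuming that flow. BrowserCookiesMiddleware and the fetch_cookies callable are hypothetical. In practice fetch_cookies might wrap a Selenium Firefox session, whose WebDriver.get_cookies() returns a list of dicts with "name" and "value" keys, the shape assumed here. The browser launch itself is left out so the sketch runs without Selenium.

```python
class BrowserCookiesMiddleware:
    """Hypothetical sketch: attach cookies harvested from a browser session."""

    def __init__(self, fetch_cookies):
        # fetch_cookies: callable returning [{"name": ..., "value": ...}, ...]
        self.fetch_cookies = fetch_cookies
        self._cookies = None

    def _cookie_dict(self):
        if self._cookies is None:  # start the (expensive) browser only once
            self._cookies = {c["name"]: c["value"] for c in self.fetch_cookies()}
        return self._cookies

    def process_request(self, request, spider):
        # Only dict-style request.cookies are merged; list-style cookies
        # are left untouched in this sketch.
        if isinstance(request.cookies, dict):
            merged = dict(self._cookie_dict())
            merged.update(request.cookies)  # cookies set on the request win
            request.cookies = merged
        return None
```

At priority 543 this runs before Scrapy's built-in CookiesMiddleware (default priority 700), so cookies written to request.cookies here are still picked up and sent.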
MongoDBPipeline
# settings.py
MONGODB_URI = 'mongodb://localhost:27017'
# or
# MONGODB_HOST = 'localhost'
# MONGODB_PORT = 27017
MONGODB_DATABASE = 'scrapyu'
MONGODB_COLLECTION = 'items'
MONGODB_BUFFER_LENGTH = 100
MONGODB_UNIQUE_KEY = 'title name'  # use only when buffering is disabled
# or
# MONGODB_UNIQUE_KEY = ['title', 'name']
# MONGODB_UNIQUE_KEY = ('title', 'name')
ITEM_PIPELINES = {
    'scrapyu.MongoDBPipeline': 300,
}
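The MongoDB settings combine three ideas: a connection given either as a URI or as host and port, optional buffering of inserts, and an optional unique key written as a space-separated string, a list or a tuple. Below is a minimal sketch of that logic under stated assumptions. The helper and class names are hypothetical. The sketch assumes MONGODB_URI takes precedence over host and port, and reads "use only when buffering is disabled" as "the unique key triggers upserts only in unbuffered mode". The collection is any pymongo-like object (for example pymongo.MongoClient(uri)[database][collection]). Only the pymongo Collection methods insert_one, insert_many and update_one(filter, update, upsert=True) are used.

```python
def mongo_uri_from_settings(settings):
    """Prefer MONGODB_URI; otherwise build a URI from MONGODB_HOST/MONGODB_PORT."""
    uri = settings.get("MONGODB_URI")
    if uri:
        return uri
    host = settings.get("MONGODB_HOST", "localhost")
    port = settings.get("MONGODB_PORT", 27017)
    return f"mongodb://{host}:{port}"


def normalize_unique_key(value):
    """Turn 'title name', ['title', 'name'] or ('title', 'name') into a tuple."""
    if not value:
        return ()
    if isinstance(value, str):
        return tuple(value.split())
    return tuple(value)


class BufferedMongoPipeline:
    """Hypothetical pipeline; `collection` is any pymongo-like Collection."""

    def __init__(self, collection, buffer_length=0, unique_key=None):
        self.collection = collection
        self.buffer_length = buffer_length or 0
        self.unique_key = normalize_unique_key(unique_key)
        self.buffer = []

    def process_item(self, item, spider):
        doc = dict(item)  # copy: pymongo adds an "_id" field to inserted dicts
        if self.buffer_length:
            # Buffered mode: batch plain inserts; the unique key is ignored here.
            self.buffer.append(doc)
            if len(self.buffer) >= self.buffer_length:
                self._flush()
        elif self.unique_key:
            # Unbuffered mode with a unique key: upsert on the key fields.
            query = {field: doc.get(field) for field in self.unique_key}
            self.collection.update_one(query, {"$set": doc}, upsert=True)
        else:
            self.collection.insert_one(doc)
        return item

    def close_spider(self, spider):
        self._flush()  # write whatever is still buffered

    def _flush(self):
        if self.buffer:
            self.collection.insert_many(self.buffer)
            self.buffer = []
```

Batching with insert_many cuts round trips to the server, while an upsert needs one update_one call per item. This trade-off would explain why the unique key and the buffer do not combine.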
RedisDupeFilter
# settings.py
DUPEFILTER_CLASS = 'scrapyu.RedisDupeFilter'
REDIS_DUPE_HOST = 'localhost'
REDIS_DUPE_PORT = 6379
REDIS_DUPE_DATABASE = 0
REDIS_DUPE_PASSWORD = 'password'
REDIS_DUPE_KEY = 'requests'
REDIS_DUPE_IGNORE_URL = r'http://scrapytest.org/\d+'
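A Redis-backed dupe filter typically stores request fingerprints in a Redis set, so the "seen" state survives restarts and can be shared between crawlers. Below is a minimal sketch of the request_seen check that Scrapy calls on a DUPEFILTER_CLASS. RedisSetDupeFilter is hypothetical, and its SHA-1 over "METHOD URL" is a simplified stand-in for Scrapy's own request fingerprinting. The sketch also assumes that URLs matching REDIS_DUPE_IGNORE_URL are never treated as duplicates. The client is anything with a redis-py style sadd(name, *values), which returns the number of members actually added; a real one could be redis.Redis(host=..., port=..., db=..., password=...).

```python
import hashlib
import re


class RedisSetDupeFilter:
    """Hypothetical sketch of a dupe filter backed by a Redis set."""

    def __init__(self, client, key="requests", ignore_url=None):
        self.client = client
        self.key = key  # name of the Redis set holding fingerprints
        self.ignore = re.compile(ignore_url) if ignore_url else None

    @staticmethod
    def fingerprint(method, url):
        # Simplified fingerprint; Scrapy's real one also covers the body.
        return hashlib.sha1(f"{method} {url}".encode("utf-8")).hexdigest()

    def request_seen(self, request):
        if self.ignore and self.ignore.match(request.url):
            return False  # assumed meaning: ignored URLs are never filtered
        fp = self.fingerprint(request.method, request.url)
        # SADD returns 1 if the fingerprint was new, 0 if it was already stored.
        return self.client.sadd(self.key, fp) == 0
```

Checking and recording in a single SADD call avoids a race between a separate "is it there?" query and the later insert when several crawlers share the same key.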
Download files
Source Distribution
scrapyu-0.1.10.tar.gz (6.4 kB)
Built Distribution
scrapyu-0.1.10-py3-none-any.whl (7.4 kB)
File details
Details for the file scrapyu-0.1.10.tar.gz.
File metadata
- Download URL: scrapyu-0.1.10.tar.gz
- Upload date:
- Size: 6.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.0 CPython/3.7.5 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8b1b162ae758af621ea0add9940fefa580ce797376c3c61fe44bc98264b4577e
MD5 | c0dc715e453ae2053ece435f4f1c76f7
BLAKE2b-256 | 6b85021634f13b5faf66169f519f97ffee400997efee0b372473f9832c73ff6b
File details
Details for the file scrapyu-0.1.10-py3-none-any.whl.
File metadata
- Download URL: scrapyu-0.1.10-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.0 CPython/3.7.5 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1d93ec5d149e242a8277b645ac6544d726adf13b677f538ea43615922986aaf1
MD5 | 6937a7fb1a7dbe1e3c75161f01b25bfe
BLAKE2b-256 | 14ecb14039c1264b03468171ddec3d987a0386d8f5ae34644b96c266f35193e0
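The published digests let you check that a downloaded file is intact. Below is a small standard-library sketch using hashlib. The function names are hypothetical, and the usage line assumes the sdist was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    # hexdigest() is lowercase, so normalise the published value too.
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Usage (assumes the file sits in the working directory):
# verify_download("scrapyu-0.1.10.tar.gz",
#                 "8b1b162ae758af621ea0add9940fefa580ce797376c3c61fe44bc98264b4577e")
```

pip can perform the same check automatically in hash-checking mode, for example with a requirements line such as `scrapyu==0.1.10 --hash=sha256:<digest>` installed with `--require-hashes`.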