Scrapy utils
scrapyu
UserAgentMiddleware
# settings.py
USERAGENT_TYPE = 'firefox'
DOWNLOADER_MIDDLEWARES = {
    'scrapyu.UserAgentMiddleware': 543,
}
MarkdownPipeline
# settings.py
MARKDOWNS_STORE = 'news'
ITEM_PIPELINES = {
    'scrapyu.MarkdownPipeline': 300,
}
# items.py
import scrapy


class MarkdownItem(scrapy.Item):
    html = scrapy.Field()
    filename = scrapy.Field()
FirefoxCookiesMiddleware
# settings.py
GECKODRIVER_PATH = 'geckodriver'
DOWNLOADER_MIDDLEWARES = {
    'scrapyu.FirefoxCookiesMiddleware': 543,
}
MongoDBPipeline
# settings.py
MONGODB_URI = 'mongodb://localhost:27017'
# or
# MONGODB_HOST = 'localhost'
# MONGODB_PORT = 27017
MONGODB_DATABASE = 'scrapyu'
MONGODB_COLLECTION = 'items'
MONGODB_BUFFER_LENGTH = 100
MONGODB_UNIQUE_KEY = 'title name' # use only if no buffer
# or
# MONGODB_UNIQUE_KEY = ['title', 'name']
# MONGODB_UNIQUE_KEY = ('title', 'name')
ITEM_PIPELINES = {
    'scrapyu.MongoDBPipeline': 300,
}
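The three MONGODB_UNIQUE_KEY forms shown above (a space-separated string, a list, a tuple) all name the same fields. The sketch below illustrates one way such a setting could be normalized into a list of field names. `normalize_unique_key` is a hypothetical helper written for illustration; it is not taken from scrapyu, whose actual handling may differ.

```python
def normalize_unique_key(value):
    """Return a unique-key setting as a list of field names.

    Hypothetical helper, not part of scrapyu.
    """
    if isinstance(value, str):
        # 'title name' -> ['title', 'name']
        return value.split()
    # Lists and tuples are copied into a new list.
    return list(value)


for setting in ('title name', ['title', 'name'], ('title', 'name')):
    print(normalize_unique_key(setting))
```

Under this assumption, each of the three spellings yields the same pair of field names, so the choice between them is a matter of taste.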
RedisDupeFilter
# settings.py
DUPEFILTER_CLASS = 'scrapyu.RedisDupeFilter'
REDIS_DUPE_HOST = 'localhost'
REDIS_DUPE_PORT = 6379
REDIS_DUPE_DATABASE = 0
REDIS_DUPE_PASSWORD = 'password'
REDIS_DUPE_KEY = 'requests'
REDIS_DUPE_IGNORE_URL = r'http://scrapytest.org/\d+'
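To see which URLs a pattern like the REDIS_DUPE_IGNORE_URL value above selects, it can be tried with Python's `re` module. This is only a sketch: `is_ignored` is a hypothetical helper, and matching from the start of the URL with `re.match` is an assumption, since the README does not say how scrapyu applies the pattern.

```python
import re

# Same pattern as in the settings example above.
IGNORE_URL = re.compile(r'http://scrapytest.org/\d+')


def is_ignored(url):
    """Hypothetical check: does the URL match the pattern from its start?"""
    # Assumption: re.match (anchored at the start), not re.search.
    return IGNORE_URL.match(url) is not None


for url in (
    'http://scrapytest.org/123',
    'http://scrapytest.org/about',
    'https://scrapytest.org/123',
):
    print(url, is_ignored(url))
```

Note that the dot in `scrapytest.org` is unescaped, so it matches any character; writing `scrapytest\.org` restricts the pattern to the literal host name.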
genspider
scrapyu genspider -l
Output:
Available templates:
  single
  single_splash
Generate a single-file spider:
scrapyu genspider python www.python.org -t single
Generate a single-file spider with Splash integration:
scrapyu genspider python www.python.org -t single_splash