Scrapy Middleware to set a random User-Agent for every Request.
Project description
Does your Scrapy spider get identified and blocked by servers because it uses the default user-agent, or a single generic one? Use this random_useragent module to set a random user-agent on every request. You are limited only by the number of user-agents you list in a text file.
Installing
Installing it is pretty simple.
pip install scrapy-random-useragent
Usage
In your settings.py file, update the DOWNLOADER_MIDDLEWARES setting like this:
DOWNLOADER_MIDDLEWARES = {
    # On Scrapy >= 1.0 the built-in middleware lives at
    # scrapy.downloadermiddlewares.useragent.UserAgentMiddleware;
    # very old Scrapy versions used the scrapy.contrib.downloadermiddleware path.
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'random_useragent.RandomUserAgentMiddleware': 400,
}
This disables the default UserAgentMiddleware and enables the RandomUserAgentMiddleware.
Then set a new USER_AGENT_LIST variable to the path of a text file containing your user-agents (one user-agent per line).
USER_AGENT_LIST = "/path/to/useragents.txt"
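For illustration, a useragents.txt file might look like this (any real browser user-agent strings will do; these are just examples):

```
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15
Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0
```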
Now all the requests from your crawler will have a random user-agent picked from the text file.
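Conceptually, the middleware does something like the following. This is a simplified standalone sketch of the idea, not the package's actual source (it omits Scrapy's from_crawler wiring and uses a plain dict in place of a Request object):

```python
import random


class RandomUserAgentMiddleware:
    """Sketch: pick a random user-agent from a preloaded list for each request."""

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("user-agent list is empty")
        self.user_agents = user_agents

    @classmethod
    def from_file(cls, path):
        # Load one user-agent per line, skipping blanks
        with open(path) as f:
            return cls([line.strip() for line in f if line.strip()])

    def process_request(self, request):
        # In real Scrapy this would set request.headers['User-Agent'];
        # here `request` is just a dict standing in for the headers.
        request["User-Agent"] = random.choice(self.user_agents)
```

Because the choice happens in process_request, every outgoing request can carry a different user-agent.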
Source Distribution

Hashes for scrapy-random-useragent-0.1.tar.gz

Algorithm | Hash digest
---|---
SHA256 | b34520d4e960c377d1ba9e3a95cef6577803fb057161e59cb2a2ffe5d754d790
MD5 | ccb12a85c599fc1b18281ecc20fcacc2
BLAKE2b-256 | e305572ec810fbdca07bb16f2f5b23f0e36b46f2a2362a8e2377e8e27315e974