Rotate TOR IPs with Scrapy (forked from scrapy-tor-proxy-rotation 0.0.1)
Scrapy Tor Proxy Rotation
This module allows Scrapy to rotate Tor IPs.
Install
Simple install, via pip:
pip install scrapy-tor-proxy-rotation
Configure Tor
First, install Tor:
sudo apt-get install tor
Stop the service before changing its configuration:
sudo service tor stop
Open the configuration file, located at /etc/tor/torrc, as root, for example with nano:
sudo nano /etc/tor/torrc
Add the lines below and save:
ControlPort 9051
CookieAuthentication 0
Restart Tor:
sudo service tor start
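The ControlPort opened above is what lets a client ask Tor for a fresh circuit, and therefore usually a new exit IP. The following is a minimal sketch of that exchange using only the standard library; it is an illustration of the Tor control protocol, not necessarily how this package does it internally. The function names are hypothetical, and the empty password assumes the torrc lines shown above (no control-port authentication configured).

```python
import socket


def newnym_commands(password=""):
    """Control-port commands that authenticate and request a new identity.

    Assumption: `password` contains no double quotes or backslashes.
    """
    return [
        f'AUTHENTICATE "{password}"\r\n'.encode(),
        b"SIGNAL NEWNYM\r\n",
    ]


def is_ok(reply):
    """Tor answers a successful control command with a line starting with 250."""
    return reply.startswith(b"250")


def request_new_identity(host="127.0.0.1", port=9051, password="", timeout=10.0):
    """Ask a local Tor daemon to switch to new circuits (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("rb")
        for command in newnym_commands(password):
            sock.sendall(command)
            reply = reader.readline()
            if not is_ok(reply):
                raise RuntimeError(f"Tor refused {command!r}: {reply!r}")
        sock.sendall(b"QUIT\r\n")


if __name__ == "__main__":
    request_new_identity()
    print("New identity requested")
```

Running the script against a Tor daemon configured as above should print the confirmation line; a new exit IP is likely but not guaranteed on every request.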
You can check your machine's IP and compare it with the IP seen through Tor as follows:
- To see your IP:
curl http://icanhazip.com/
- To see the IP used through Tor:
torify curl http://icanhazip.com/
Scrapy needs an intermediary HTTP proxy to reach Tor; in this case, Privoxy.
Tor's default proxy server: 127.0.0.1:9050
Install and configure Privoxy:
- Install:
sudo apt install privoxy
- Stop the service:
sudo service privoxy stop
- Open the config file:
sudo nano /etc/privoxy/config
- Add the following line:
forward-socks5t / 127.0.0.1:9050 .
- Start the service:
sudo service privoxy start
Test:
torify curl http://icanhazip.com/
curl -x 127.0.0.1:8118 http://icanhazip.com/
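The same check can be scripted. The sketch below fetches the IP once directly and once through Privoxy, then compares the two; the helper names are hypothetical, and it assumes Privoxy is listening on 127.0.0.1:8118 as in the curl command above.

```python
import urllib.request

PRIVOXY = "http://127.0.0.1:8118"   # same address as the curl -x test
CHECK_URL = "http://icanhazip.com/"


def proxies_for(proxy=None):
    """Proxy mapping for urllib; an empty dict means a direct connection."""
    return {"http": proxy} if proxy else {}


def fetch_ip(proxy=None, timeout=30.0):
    """Return the public IP reported by CHECK_URL, optionally via a proxy."""
    handler = urllib.request.ProxyHandler(proxies_for(proxy))
    opener = urllib.request.build_opener(handler)
    with opener.open(CHECK_URL, timeout=timeout) as response:
        return response.read().decode().strip()


if __name__ == "__main__":
    direct_ip = fetch_ip()
    tor_ip = fetch_ip(PRIVOXY)
    print("Direct IP:", direct_ip)
    print("IP via Privoxy/Tor:", tor_ip)
    if direct_ip == tor_ip:
        print("Both IPs match; traffic is probably not going through Tor.")
```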
Use
With this configuration in place, you can integrate Tor with Scrapy.
- Configure the middleware in your settings file (settings.py):
DOWNLOADER_MIDDLEWARES = {
    ...,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'tor_ip_rotator.middlewares.TorProxyMiddleware': 100,
}
- Add the following to your spider's custom_settings, or to settings.py if you want them to apply to every spider in the project:
TOR_IPROTATOR_ENABLED = True
TOR_IPROTATOR_CHANGE_AFTER = # number of requests made from the same IP address
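As a sketch, the settings above can be collected into one dict and assigned to a spider's custom_settings attribute. The helper name tor_rotation_settings and the value 25 are illustrative assumptions, not part of the package.

```python
def tor_rotation_settings(change_after):
    """Build the Tor rotation settings as a plain dict (hypothetical helper).

    change_after: number of requests made from one IP before rotating.
    """
    if change_after < 1:
        raise ValueError("change_after must be at least 1")
    return {
        "DOWNLOADER_MIDDLEWARES": {
            # Lower number: runs before the HTTP proxy middleware on requests.
            "tor_ip_rotator.middlewares.TorProxyMiddleware": 100,
            "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
        },
        "TOR_IPROTATOR_ENABLED": True,
        "TOR_IPROTATOR_CHANGE_AFTER": change_after,
    }


# Illustrative value only: rotate after 25 requests on the same IP.
SPIDER_SETTINGS = tor_rotation_settings(change_after=25)

# In a spider class this would be used as:
#     custom_settings = SPIDER_SETTINGS
```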
By default, an IP can be reused after 10 other IPs have been used. This value can be changed with the setting TOR_IPROTATOR_ALLOW_REUSE_IP_AFTER, as below:
TOR_IPROTATOR_ALLOW_REUSE_IP_AFTER = #
A large value can make it slower to find a new IP to use. If the value is 0, used IPs are not recorded.
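The reuse rule can be pictured as a bounded history of recently used IPs. The sketch below illustrates the behaviour described here and is not the package's actual implementation; the class name RecentIPs is hypothetical, and the addresses come from the documentation range 192.0.2.0/24.

```python
from collections import deque


class RecentIPs:
    """Remember the last N used IPs so they are not picked again too soon."""

    def __init__(self, allow_reuse_after=10):
        # A value of 0 means no history is kept at all.
        self._recent = deque(maxlen=allow_reuse_after) if allow_reuse_after > 0 else None

    def is_usable(self, ip):
        """An IP is usable if it is not among the recently used ones."""
        return self._recent is None or ip not in self._recent

    def record(self, ip):
        """Mark an IP as used; the oldest entry drops out once the history is full."""
        if self._recent is not None:
            self._recent.append(ip)


history = RecentIPs(allow_reuse_after=2)
history.record("192.0.2.1")
print(history.is_usable("192.0.2.1"))  # False: used too recently
history.record("192.0.2.2")
history.record("192.0.2.3")
print(history.is_usable("192.0.2.1"))  # True: two other IPs were used since
```

The larger the history, the more candidate IPs get rejected, which is why a large value can slow down finding a usable one.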
Source Distribution
Hashes for scrapy-tor-proxy-rotator-0.0.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 572ecb1df7ea05685ae012c82325a7e237274cd17b244219652b4f39d3de49c7
MD5 | 5cbacccc8412dab7f4ebb9f97e07e217
BLAKE2b-256 | 6e6afa50fcab6bf4dbebf905cfc6b07b872acdd7ff938882e14ca3a94e539227