Rotate Tor IPs with Scrapy (forked from scrapy-tor-proxy-rotation 0.0.1)
Scrapy Tor Proxy Rotation
This module allows Scrapy to rotate Tor IPs.
Install
Install via pip:
pip install scrapy-tor-proxy-rotation
Config Tor
To configure Tor, first install it:
sudo apt-get install tor
Stop the service before editing its configuration:
sudo service tor stop
Open the configuration file, /etc/tor/torrc, as root (for example, with nano):
sudo nano /etc/tor/torrc
Add the lines below and save:
ControlPort 9051
CookieAuthentication 0
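The ControlPort line lets a program ask Tor for a new identity, which is how IP rotation is usually triggered. Below is a minimal sketch using only Python's standard library and Tor's control protocol (AUTHENTICATE, then SIGNAL NEWNYM). It assumes the torrc above, with no password or cookie authentication configured; request_new_identity is a hypothetical helper name, not part of this package.

```python
import socket


def _expect_ok(reader, command: str) -> None:
    """Raise if Tor's reply to `command` is not a single-line 250 (success) reply."""
    reply = reader.readline()
    if not reply.startswith(b"250 "):
        raise RuntimeError(f"{command} failed: {reply!r}")


def request_new_identity(host: str = "127.0.0.1", port: int = 9051, timeout: float = 10.0) -> None:
    """Hypothetical helper: ask Tor, via its control port, to use clean circuits."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            # Assumes CookieAuthentication 0 and no HashedControlPassword in torrc,
            # so an empty password is accepted.
            sock.sendall(b'AUTHENTICATE ""\r\n')
            _expect_ok(reader, "AUTHENTICATE")
            # NEWNYM: connections opened after this use fresh circuits.
            sock.sendall(b"SIGNAL NEWNYM\r\n")
            _expect_ok(reader, "SIGNAL NEWNYM")
```

NEWNYM only affects new connections, and Tor may rate-limit how often it honours the signal, so the exit IP is not guaranteed to change on every call.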
Start Tor again:
sudo service tor start
You can compare your machine's public IP with the IP seen through Tor:
- To see your IP:
curl http://icanhazip.com/
- To see the Tor IP:
torify curl http://icanhazip.com/
Scrapy cannot use Tor's SOCKS proxy directly, so an HTTP proxy is needed as an intermediary; in this case, Privoxy.
Tor's default SOCKS proxy: 127.0.0.1:9050
Install and configure Privoxy:
- Install:
sudo apt install privoxy
- Stop the service:
sudo service privoxy stop
- Open the config file:
sudo nano /etc/privoxy/config
- Add the following line (the final . means no further HTTP parent proxy is used):
forward-socks5t / 127.0.0.1:9050 .
- Start the service:
sudo service privoxy start
Test: both commands should print a Tor exit IP rather than the IP shown by plain curl:
torify curl http://icanhazip.com/
curl -x 127.0.0.1:8118 http://icanhazip.com/
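The same comparison can be scripted. This is a minimal sketch using only urllib from the standard library; fetch_ip is a hypothetical helper, and the proxy address assumes Privoxy's listen address 127.0.0.1:8118 from the curl command above.

```python
import urllib.request
from typing import Optional


def fetch_ip(proxy: Optional[str] = None, url: str = "http://icanhazip.com/", timeout: float = 15.0) -> str:
    """Return the public IP reported by `url`, optionally through an HTTP proxy."""
    # An empty dict disables proxies, including ones set via environment variables,
    # so the direct call really is direct.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("ascii").strip()
```

For example, compare fetch_ip() with fetch_ip("http://127.0.0.1:8118"): if the two values differ, requests sent through Privoxy are leaving via Tor.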
Use
With Tor and Privoxy configured, you can integrate Tor with Scrapy.
- Configure the middleware in your settings file (settings.py):
DOWNLOADER_MIDDLEWARES = {
    ...,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'tor_ip_rotator.middlewares.TorProxyMiddleware': 100,
}
- Add these settings to your spider's custom_settings, or to settings.py to apply them to every spider in the project:
TOR_IPROTATOR_ENABLED = True
TOR_IPROTATOR_CHANGE_AFTER = # number of requests made from the same IP address
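The package's own TorProxyMiddleware is not reproduced here. As a rough idea of what a middleware driven by these settings can do, here is a hypothetical sketch: it routes every request through Privoxy (assumed at 127.0.0.1:8118) and asks Tor for a new identity every TOR_IPROTATOR_CHANGE_AFTER requests over the control port (assumed at 127.0.0.1:9051 with no authentication). The names TorRotationSketch and newnym, and the fallback of 10 requests, are illustrative, not the package's API.

```python
import socket
from typing import Callable


def newnym(host: str = "127.0.0.1", port: int = 9051) -> None:
    """Hypothetical helper: request new Tor circuits (assumes no control-port auth)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with sock.makefile("rb") as reader:
            for command in (b'AUTHENTICATE ""\r\n', b"SIGNAL NEWNYM\r\n"):
                sock.sendall(command)
                reply = reader.readline()
                if not reply.startswith(b"250 "):
                    raise RuntimeError(f"Tor control port replied {reply!r}")


class TorRotationSketch:
    """Hypothetical downloader middleware: proxy every request through Privoxy and
    renew the Tor identity after every `change_after` requests."""

    def __init__(self, enabled: bool, change_after: int,
                 renew: Callable[[], None] = newnym,
                 proxy_url: str = "http://127.0.0.1:8118"):
        self.enabled = enabled
        self.change_after = change_after
        self.renew = renew
        self.proxy_url = proxy_url
        self.count = 0  # requests sent since the last identity change

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(
            enabled=settings.getbool("TOR_IPROTATOR_ENABLED", False),
            # Illustrative fallback of 10; not the package's documented default.
            change_after=settings.getint("TOR_IPROTATOR_CHANGE_AFTER", 10),
        )

    def process_request(self, request, spider):
        if not self.enabled:
            return None
        if self.change_after > 0 and self.count >= self.change_after:
            self.renew()
            self.count = 0
        self.count += 1
        # HttpProxyMiddleware (110) runs after this middleware (100) and applies meta["proxy"].
        request.meta["proxy"] = self.proxy_url
        return None
```

This is why the priorities in DOWNLOADER_MIDDLEWARES matter: Scrapy calls process_request in increasing priority order, so a middleware at 100 can set request.meta["proxy"] before HttpProxyMiddleware at 110 reads it.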
By default, an IP can be reused only after 10 other IPs have been used. To change this, set TOR_IPROTATOR_ALLOW_REUSE_IP_AFTER:
TOR_IPROTATOR_ALLOW_REUSE_IP_AFTER = # number of other IPs to use before an IP may be reused
A large value can make it slower to find an IP that has not been used recently. If the value is 0, no record of used IPs is kept.
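The reuse rule can be pictured as a bounded history of recent exit IPs. The sketch below is hypothetical: IpHistory, pick_fresh_ip, renew and current_ip are names invented for illustration, and the package's internals may work differently. renew stands for "ask Tor for a new identity" and current_ip for "look up the current exit IP".

```python
from collections import deque
from typing import Callable, Deque, Optional


class IpHistory:
    """Remembers the last `allow_reuse_after` exit IPs (hypothetical helper)."""

    def __init__(self, allow_reuse_after: int = 10):
        # A bounded deque drops its oldest entry automatically; maxlen=0 stores nothing.
        self._recent: Deque[str] = deque(maxlen=allow_reuse_after)

    def is_allowed(self, ip: str) -> bool:
        return ip not in self._recent

    def record(self, ip: str) -> None:
        self._recent.append(ip)


def pick_fresh_ip(history: IpHistory, renew: Callable[[], None],
                  current_ip: Callable[[], str], max_attempts: int = 20) -> Optional[str]:
    """Renew the identity until the exit IP is not in the recent history."""
    for _ in range(max_attempts):
        renew()
        ip = current_ip()
        if history.is_allowed(ip):
            history.record(ip)
            return ip
    return None  # every attempt landed on a recently used IP
```

The longer the history, the more exit IPs are excluded and the more renew attempts a fresh IP may take, which is why a large value can slow things down; with a size of 0 nothing is recorded, so any IP is accepted immediately.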
Source Distribution
File details
Details for the file scrapy-tor-proxy-rotator-0.0.2.tar.gz.
File metadata
- Download URL: scrapy-tor-proxy-rotator-0.0.2.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 572ecb1df7ea05685ae012c82325a7e237274cd17b244219652b4f39d3de49c7 |
| MD5 | 5cbacccc8412dab7f4ebb9f97e07e217 |
| BLAKE2b-256 | 6e6afa50fcab6bf4dbebf905cfc6b07b872acdd7ff938882e14ca3a94e539227 |