
Rotate TOR IPs with Scrapy (forked from scrapy-tor-proxy-rotation 0.0.1)

Scrapy Tor Proxy Rotation

This module allows Scrapy to rotate Tor IPs.

Install

Simple install, via pip:

pip install scrapy-tor-proxy-rotation

Config Tor

To configure Tor, first install it:

sudo apt-get install tor

Stop the service before changing its configuration:

sudo service tor stop

Open the configuration file, located at /etc/tor/torrc, as root. For example, using nano:

sudo nano /etc/tor/torrc

Add the lines below and save the file:

ControlPort 9051
CookieAuthentication 0

Restart Tor:

sudo service tor start
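
With ControlPort 9051 open and CookieAuthentication 0, a client can ask Tor for a new circuit (and usually a new exit IP) over the control port. The sketch below is a minimal illustration of that exchange using only the standard library. It is not the package's own implementation; the function names are hypothetical, and it assumes no HashedControlPassword is set in torrc, so an empty password is accepted.

```python
import socket


def newnym_commands(password=""):
    """Return the control-port commands that authenticate and request a new circuit."""
    return [f'AUTHENTICATE "{password}"\r\n', "SIGNAL NEWNYM\r\n"]


def request_new_identity(host="127.0.0.1", port=9051, password=""):
    """Send the commands to Tor's control port (assumed to be 127.0.0.1:9051)."""
    with socket.create_connection((host, port), timeout=10) as control:
        for command in newnym_commands(password):
            control.sendall(command.encode("ascii"))
            reply = control.recv(1024).decode("ascii", errors="replace")
            # Tor answers "250 OK" when a command is accepted.
            if not reply.startswith("250"):
                raise RuntimeError(f"Tor refused {command.strip()!r}: {reply.strip()}")
```

Calling request_new_identity() while Tor is running should be enough to trigger a rotation. Tor may rate-limit NEWNYM requests, so calling it in a tight loop is unlikely to yield a new IP each time.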

You can check your machine's IP and compare it with the IP seen through Tor as follows:

  • To see your IP:
    curl http://icanhazip.com/
    
  • To see the IP used through Tor:
    torify curl http://icanhazip.com/   
    

Scrapy needs an HTTP proxy as an intermediary between it and Tor, in this case Privoxy.

Tor's default SOCKS proxy address: 127.0.0.1:9050

Install and configure Privoxy:

  • Install:
    sudo apt install privoxy
    
  • Stop the service:
    sudo service privoxy stop
    
  • Open the config file:
    sudo nano /etc/privoxy/config
    
  • Add the following line:
    forward-socks5t / 127.0.0.1:9050 .
    
  • Start the service:
    sudo service privoxy start
    

Test that both Tor and Privoxy (which listens on 127.0.0.1:8118 by default) return the Tor IP:

torify curl http://icanhazip.com/
curl -x 127.0.0.1:8118 http://icanhazip.com/
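
The same check can be done from Python, which is closer to what Scrapy will do when it sends requests through Privoxy. This is a rough sketch using only the standard library; the helper names are hypothetical, and it assumes Privoxy is listening on its default address, 127.0.0.1:8118.

```python
import urllib.request

PRIVOXY_URL = "http://127.0.0.1:8118"  # assumed default Privoxy listen address
IP_ECHO_URL = "http://icanhazip.com/"


def make_opener(proxy_url=None):
    """Build an opener that goes through proxy_url, or connects directly if None."""
    # An empty mapping disables any proxy picked up from the environment.
    proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def fetch_ip(opener, url=IP_ECHO_URL, timeout=30):
    """Return the public IP reported by the echo service."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


def tor_is_in_use():
    """Return True when the proxied IP differs from the direct one."""
    direct_ip = fetch_ip(make_opener())
    proxied_ip = fetch_ip(make_opener(PRIVOXY_URL))
    return direct_ip != proxied_ip
```

If tor_is_in_use() returns False, Privoxy is probably not forwarding to Tor, and the forward-socks5t line is worth checking first.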

Use

After performing these configurations, it is possible to integrate Tor with Scrapy.

  • Configure the middleware in your settings file (settings.py):

    DOWNLOADER_MIDDLEWARES = {
        ...,
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
        'tor_ip_rotator.middlewares.TorProxyMiddleware': 100
    }
    
  • Add these settings to custom_settings in your spider, or to settings.py if you want them to apply to every spider in the project:

    TOR_IPROTATOR_ENABLED = True
    TOR_IPROTATOR_CHANGE_AFTER = # number of requests made from the same IP address
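
For a single spider, the middleware entries and the two settings above can be grouped into one dictionary. The fragment below is only a sketch: the name TOR_CUSTOM_SETTINGS and the value 20 are hypothetical, chosen for illustration, while the setting names and middleware paths are the ones shown in this section.

```python
# Hypothetical values for illustration; tune them for your own crawl.
TOR_CUSTOM_SETTINGS = {
    "DOWNLOADER_MIDDLEWARES": {
        "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
        "tor_ip_rotator.middlewares.TorProxyMiddleware": 100,
    },
    "TOR_IPROTATOR_ENABLED": True,
    # Ask for a new IP after this many requests from the same address.
    "TOR_IPROTATOR_CHANGE_AFTER": 20,
}
```

Assigning this dictionary to the custom_settings attribute of a spider class limits the rotation to that spider and leaves the rest of the project untouched.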
    

By default, an IP can be reused after 10 other uses. This value can be changed with the TOR_IPROTATOR_ALLOW_REUSE_IP_AFTER setting, as below:

TOR_IPROTATOR_ALLOW_REUSE_IP_AFTER = #

A large value can make it slower to find a new IP that has not been used recently. If the value is 0, used IPs are not recorded.
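
One way to picture this reuse rule is a fixed-size window of recently used IPs. The class below is a conceptual sketch only, not the package's code; the name UsedIPWindow and its methods are hypothetical. It assumes an IP is rejected while it is still among the last N addresses used.

```python
from collections import deque


class UsedIPWindow:
    """Remember the last `allow_reuse_after` IPs and reject them until they age out."""

    def __init__(self, allow_reuse_after=10):
        # A deque with maxlen=0 never stores anything, so 0 means "no record".
        self._recent = deque(maxlen=allow_reuse_after)

    def can_use(self, ip):
        """Return True if the IP is not in the recent-use window."""
        return ip not in self._recent

    def mark_used(self, ip):
        """Record the IP; the oldest entry is dropped once the window is full."""
        self._recent.append(ip)
```

Under this model, a larger window rejects more candidate IPs, so more rotations may be needed before an acceptable one turns up, which matches the slowdown described above.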
