mosquito

a request obfuscator and web scraping toolkit

mosquito gives you an API similar to requests and in fact uses requests internally. Each HTTP request exposes information, such as the user agent or IP address, that allows a server to identify you or your application. mosquito lets you set up multiple identities and schedules your requests across them. Each identity may consist of any of the attributes supported by requests' session object, e.g. headers, proxies or cookies. To list all available attributes, call mosquito.available_attributes().
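To make the idea of identities more concrete, here is a minimal sketch in plain Python. It is not mosquito's implementation: the helper `build_identities` is hypothetical, and both the position-by-position pairing of attribute values and the round-robin order are assumptions made for illustration.

```python
import itertools


def build_identities(**attributes):
    """Pair attribute values position by position into identity dicts.

    Assumption of this sketch: scalar values are shared by every identity,
    while lists are paired up, so the shortest list decides how many
    identities exist.
    """
    listed = {k: list(v) for k, v in attributes.items()
              if isinstance(v, (list, tuple))}
    shared = {k: v for k, v in attributes.items() if k not in listed}
    count = min((len(v) for v in listed.values()), default=1)
    return [
        {**shared, **{k: v[i] for k, v in listed.items()}}
        for i in range(count)
    ]


identities = build_identities(
    headers=[{'user-agent': 'linux'}, {'user-agent': 'mac'}, {'user-agent': 'windows'}],
    params=[{'foo': 42}, {'bar': 13, 'baz': 37}],
    delay=0.0,
)

# Round-robin scheduling (assumed): each request takes the next identity in turn.
schedule = itertools.cycle(identities)
for i in range(5):
    identity = next(schedule)
    print(i, identity['headers']['user-agent'], identity['params'])
```

In this sketch the third user agent is never used, because only two `params` values are available to pair it with.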

Installation

from PyPI

pip install mosquito

Usage

demo/demo.py

#!/usr/bin/env python3
# Standard library modules.

# Third party modules.

# Local modules
import mosquito
from mosquito_test import httpbin

# Global variables and constants.


# Register attribute callback using a decorator ...
@mosquito.attribute('headers')
def headers():
    for name in ('linux', 'mac', 'windows'):
        yield {'user-agent': name}


# ... or register attributes by hand.
mosquito.register_attributes(delay=.0, params=[{'foo': 42}, {'bar': 13, 'baz': 37}])


# Let's list all available attributes.
print('available:', mosquito.available_attributes())


with mosquito.swarm(repeat_on=(503,), max_attempts=3) as scheduler:
    # Note that the swarm uses 2 sessions only, determined by the minimum length of passed
    # attributes which is `params` in our case.
    print(f'swarm uses {len(scheduler.swarm)} sessions')

    for i in range(5):
        # `swarm` wraps requests' API and therefore supports get, post, put etc.
        # Parameters passed directly to a request method override those registered before.
        result = scheduler.get(httpbin('/user-agent'), params=dict(bar=0))
        print(i, result.url, result.json())

    # Let's provoke an error ...
    try:
        scheduler.get(httpbin('/status/404'))

    except mosquito.MosquitoError as mre:
        print(mre)

    # ... and another one, being more obstinate this time
    try:
        scheduler.get(httpbin('/status/503'))

    except mosquito.MosquitoError as mre:
        print(mre)
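The `repeat_on` and `max_attempts` arguments in the demo suggest a retry loop. The sketch below shows one plausible reading of that behaviour in plain Python. It is an assumption, not mosquito's actual code: `RetryError`, `get_with_retry` and `fake_fetch` are hypothetical names, and the fake fetch function simply returns the status code found at the end of the path instead of sending a request.

```python
class RetryError(Exception):
    """Hypothetical stand-in for an error such as mosquito.MosquitoError."""


def get_with_retry(fetch, url, repeat_on=(503,), max_attempts=3):
    """Call fetch(url) until the status is not in repeat_on or attempts run out.

    Expects max_attempts >= 1.
    """
    for _ in range(max_attempts):
        status = fetch(url)
        if status not in repeat_on:
            break
    else:
        # Every attempt returned a status listed in repeat_on.
        raise RetryError(f'{url}: gave up after {max_attempts} attempts '
                         f'(last status {status})')
    if status >= 400:
        # Other error statuses fail immediately, without a retry.
        raise RetryError(f'{url}: status {status}')
    return status


calls = []


def fake_fetch(url):
    """Pretend to send a request; the status is the last path segment."""
    calls.append(url)
    return int(url.rsplit('/', 1)[-1])


for path in ('/status/200', '/status/404', '/status/503'):
    calls.clear()
    try:
        print(path, '->', get_with_retry(fake_fetch, path))
    except RetryError as err:
        print(err, f'({len(calls)} call(s))')
```

Under this reading, the 404 request fails after a single call, while the 503 request is attempted three times before the error is raised.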

Testing

Some unit tests require an httpbin instance, which is httpbin.org by default. For the sake of speed and reliability, it's recommended to run your own instance using the docker image. Check hub.docker.com/r/kennethreitz/httpbin for more information.

# run httpbin server using docker
docker run -p 8080:80 kennethreitz/httpbin

# let mosquito know its location by setting an environment variable
export HTTPBIN_BASE_URL=http://localhost:8080

The tests are then run with:

python -m mosquito_test
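For illustration, a helper like the `httpbin(...)` function imported in the demo could resolve URLs from `HTTPBIN_BASE_URL` roughly as follows. This is a hypothetical sketch, not the code shipped in mosquito_test: the name `httpbin_url` and the `https://` scheme of the default are assumptions.

```python
import os

# Assumed default; the text above only names httpbin.org, not a scheme.
DEFAULT_BASE_URL = 'https://httpbin.org'


def httpbin_url(path, environ=os.environ):
    """Join the configured httpbin base URL and a request path."""
    base = environ.get('HTTPBIN_BASE_URL', DEFAULT_BASE_URL).strip()
    return base.rstrip('/') + '/' + path.lstrip('/')


# Passing a dict instead of os.environ keeps the example independent of the shell.
print(httpbin_url('/user-agent', environ={'HTTPBIN_BASE_URL': 'http://localhost:8080'}))
print(httpbin_url('/user-agent', environ={}))
```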

Feedback

For feedback of any kind, open an issue at gitlab.com. Thank you for using mosquito.

mosquito        \             /
                 \     |     /
                 /   \ | /   \
                 \    \|/    /
                  \,  o^o  ,/
                    \,/"\,/
            ,,,,----,{/X\},----,,,,
   ,,---''''      _-'{\X/}'-_      ''''---,,
 /'            ,-'/   \V/   \'-,            '\
(        ,--''/   |   (_)   |   \''--,        )
 '--,,-''    |    |   /_\   |   |     ''-,,--'
            /'    |  (_-_)  |   '\
           /     /'   \_/   '\    \
          /     /     (_)     \    \
               /       V       \
              /                 \
             /                   \             

Files for mosquito, version 0.2.0

Filename                          Size      File type   Python version
mosquito-0.2.0-py3-none-any.whl   20.4 kB   Wheel       py3
mosquito-0.2.0.tar.gz             16.7 kB   Source      None