Skip to main content

a request obfuscator and web scraping toolkit

Project description

mosquito

a request obfuscator and web scraping toolkit

mosquito        \             /
                 \     |     /
                 /   \ | /   \
                 \    \|/    /
                  \,  o^o  ,/
                    \,/"\,/
            ,,,,----,{/X\},----,,,,
   ,,---''''      _-'{\X/}'-_      ''''---,,
 /'            ,-'/   \V/   \'-,            '\
(        ,--''/   |   (_)   |   \''--,        )
 '--,,-''    |    |   /_\   |   |     ''-,,--'
            /'    |  (_-_)  |   '\
           /     /'   \_/   '\    \
          /     /     (_)     \    \
               /       V       \
              /                 \
             /                   \             

mosquito gives you an API similar to requests and in fact uses it internally. Each HTTP request exposes a number of information such as user agent or IP address that allows a server to identify you or your application. mosquito let's you set up multiple identities and schedules your requests to them. Each identity may consist of a whole bunch of attributes that are supported by requests's session object e.g.: headers, proxies or cookies. To list all attributes available execute mosquito.available_attributes().

Installation

from PyPI

pip install mosquito

Usage

demo/demo.py

#!/usr/bin/env python3
# Standard library modules.

# Third party modules.

# Local modules
import mosquito
from mosquito_test import httpbin

# Globals and constants variables.


# Register attribute callback using a decorator ...
@mosquito.attribute('headers')
def headers():
    for name in ('linux', 'mac', 'windows'):
        yield {'user-agent': name}


# ... or register attributes by hand.
mosquito.register_attributes(delay=.0, params=[{'foo': 42}, {'bar': 13, 'baz': 37}])


# Let's list all available attributes.
print('available:', mosquito.available_attributes())


with mosquito.swarm(repeat_on=(503,), max_attempts=3) as scheduler:
    # Note that the swarm uses 2 sessions only, determined by the minimum length of passed
    # attributes which is `params` in our case.
    print(f'swarm uses {len(scheduler.swarm)} sessions')

    for i in range(5):
        # `swarm wraps` requests' api and therefore supports get, post, put etc.
        # parameters passed directly to request method will overwrite such registered before
        result = scheduler.get(httpbin('/user-agent'), params=dict(bar=0))
        print(i, result.url, result.json())

    # Let's provoke an error ...
    try:
        scheduler.get(httpbin('/status/404'))

    except mosquito.MosquitoError as mre:
        print(mre)

    # ... and another one, being more obstinate this time
    try:
        scheduler.get(httpbin('/status/503'))

    except mosquito.MosquitoError as mre:
        print(mre)

Testing

Some unit tests require a httpbin instance which is httpbin.org by default. For sake of speed and reliability it's recommended to run your own instance e.g. using the docker image

# run httpbin server using docker
docker run -p 8080:80 kennethreitz/httpbin

# let mosquito know its location by setting an environment variable
export HTTPBIN_BASE_URL=http://localhost:8080 

on your test machine. The actual test is ran by:

python -m mosquito_test

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mosquito-0.1.0.tar.gz (16.3 kB view hashes)

Uploaded Source

Built Distribution

mosquito-0.1.0-py3-none-any.whl (19.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page