Django-ProxyList-For-Grab
=========================

.. image:: https://api.travis-ci.org/gotlium/django-proxylist.png?branch=master
    :alt: Build Status
    :target: https://travis-ci.org/gotlium/django-proxylist
.. image:: https://coveralls.io/repos/gotlium/django-proxylist/badge.png?branch=master
    :target: https://coveralls.io/r/gotlium/django-proxylist?branch=master
.. image:: https://pypip.in/v/django-proxylist-for-grab/badge.png
    :alt: Current version on PyPi
    :target: https://crate.io/packages/django-proxylist-for-grab/
.. image:: https://pypip.in/d/django-proxylist-for-grab/badge.png
    :alt: Downloads from PyPi
    :target: https://crate.io/packages/django-proxylist-for-grab/


This application helps you keep an up-to-date list of proxy servers. It
contains everything you need to run periodic checks that verify the properties
of each proxy. It can also periodically collect proxy servers from the
Internet and remove broken or slow proxies.



Installing the package
----------------------

`django-proxylist-for-grab` can be easily installed using pip:

.. code-block:: bash

    $ pip install django-proxylist-for-grab



Configuration
-------------

After installation, add `django-proxylist-for-grab` (app name ``proxylist``)
to the *INSTALLED_APPS* list in your Django settings file.

.. code-block:: python

    INSTALLED_APPS = (
        ...
        'proxylist',
        ...
    )

Add the `django-proxylist-for-grab` URLs to your ``urls.py``:

.. code-block:: python

    urlpatterns = patterns(
        ...
        url(r'', include('proxylist.urls')),
        ...
    )


`django-proxylist-for-grab` has a list of variables that you can configure
through Django's settings file. You can see the entire list in
Advanced Configuration.



Database creation
-----------------

You have two choices here:

Using south
~~~~~~~~~~~

We recommend using `south` for your database migrations. If you
already use it, you can migrate `django-proxylist-for-grab`:

.. code-block:: bash

    $ python manage.py migrate proxylist



Using syncdb
~~~~~~~~~~~~

If you don't want to use `south` you can make a plain *syncdb*:

.. code-block:: bash

    $ python manage.py syncdb



Basic setup
-----------

First, add a mirror. For a mirror to work, the app must be installed on a
server with an external IP address, so that data sent through a proxy server
can be verified. After adding a mirror, you can add and test your proxies.
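
The role of the mirror can be illustrated with a small sketch. This is not
the package's implementation: it assumes a hypothetical mirror that echoes
back the client address and request headers it received, and the function
name ``classify_anonymity``, the header lists, and the three labels are all
made up for illustration.

.. code-block:: python

    # Hypothetical sketch: classify a proxy from what a mirror echoed back.
    # The header names are ones proxies conventionally add; the labels and
    # the function itself are illustrative and not part of the package.

    REVEALING_HEADERS = ('x-forwarded-for', 'x-real-ip', 'client-ip')
    PROXY_HEADERS = ('via', 'proxy-connection')


    def _mentions(value, ip):
        """True if ``ip`` is one of the comma-separated entries in ``value``."""
        return ip in [part.strip() for part in value.split(',')]


    def classify_anonymity(real_ip, seen_ip, echoed_headers):
        """Return 'transparent', 'anonymous' or 'elite' for one response."""
        headers = dict((k.lower(), v) for k, v in echoed_headers.items())

        # The mirror saw our own address, or a header leaks it.
        if seen_ip == real_ip:
            return 'transparent'
        if any(_mentions(headers.get(name, ''), real_ip)
               for name in REVEALING_HEADERS):
            return 'transparent'

        # Our address is hidden, but the request is visibly proxied.
        if any(name in headers for name in REVEALING_HEADERS + PROXY_HEADERS):
            return 'anonymous'

        return 'elite'

A check of this kind only works if the mirror is reachable from the proxy,
which is why the mirror needs an external IP address.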



Asynchronous checking
---------------------

`django-proxylist-for-grab` is configured to check proxies synchronously by
default. To change this behavior, add ``PROXY_LIST_USE_CALLERY`` to your
Django settings and set it to ``True``.

After that, you need to install and configure django-celery and RabbitMQ.

For example on OS X
~~~~~~~~~~~~~~~~~~~
**Packages installation**

.. code-block:: bash

    $ sudo pip install django-celery
    $ sudo port install rabbitmq-server

Add the ``djcelery`` application to ``INSTALLED_APPS`` in your settings:

.. code-block:: python

    INSTALLED_APPS = (
        ...
        'djcelery',
        ...
    )

**Sync database**

.. code-block:: bash

    $ ./manage.py syncdb

**Run rabbitmq and celery**

.. code-block:: bash

    $ sudo rabbitmq-server -detached
    $ nohup python manage.py celery worker >& /dev/null &
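
Taken together, the settings touched in this section might look like the
fragment below. It is a sketch rather than a verified configuration: the
``djcelery.setup_loader()`` call and ``BROKER_URL`` come from django-celery's
own conventions and are not mentioned in this README, and the broker URL
assumes a local RabbitMQ with its default ``guest`` account.

.. code-block:: python

    # settings.py (fragment)

    # Assumption: django-celery's usual loader hook.
    import djcelery
    djcelery.setup_loader()

    INSTALLED_APPS = (
        # ... your other apps ...
        'proxylist',
        'djcelery',
    )

    # Switch proxy checks to asynchronous mode (name as given in this README).
    PROXY_LIST_USE_CALLERY = True

    # Assumption: RabbitMQ on localhost with the default guest account.
    BROKER_URL = 'amqp://guest:guest@localhost:5672//'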



Command line reference
----------------------

update_proxies
~~~~~~~~~~~~~~

Add new proxies from a file.

.. code-block:: bash

    $ python manage.py update_proxies [file1] <file2> <...>


check_proxies
~~~~~~~~~~~~~

Check proxies availability and anonymity.

.. code-block:: bash

    $ python manage.py check_proxies


grab_proxies
~~~~~~~~~~~~

Search for proxy lists on the Internet.


.. code-block:: bash

    $ python manage.py grab_proxies


clean_proxies
~~~~~~~~~~~~~

Remove broken proxies.


.. code-block:: bash

    $ python manage.py clean_proxies
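
The commands in this reference can also be run in sequence from Python, for
instance from a scheduled job. The sketch below is only one possible
arrangement: the helper name ``run_maintenance`` and the order of the
commands are assumptions, and ``call_command`` is Django's standard way to
invoke a management command programmatically.

.. code-block:: python

    # Hypothetical helper: run the maintenance commands one after another.
    # The command names come from this README; their order is an assumption.

    MAINTENANCE_COMMANDS = ('grab_proxies', 'check_proxies', 'clean_proxies')


    def run_maintenance(call=None):
        """Run each maintenance command and return the names that were run."""
        if call is None:
            # Imported lazily so the helper can be exercised without Django.
            from django.core.management import call_command as call

        executed = []
        for name in MAINTENANCE_COMMANDS:
            call(name)
            executed.append(name)
        return executed

Passing a different callable as ``call`` makes it possible to dry-run the
sequence, for example ``run_maintenance(call=lambda name: None)``.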



GrabLib usage example:
----------------------

.. code-block:: python

    from proxylist import grabber

    grab = grabber.Grab()

    # Get your IP (repeat this a few times to see the proxy change)
    grab.go('http://ifconfig.me/ip')
    if grab.response.code == 200:
        print(grab.response.body.strip())

    # Select the <script> elements on the ya.ru page and print number()
    grab.go('http://www.ya.ru/')
    if grab.response.code == 200:
        print(grab.doc.select('//script').number())




GrabLib Spider example:
-----------------------

.. code-block:: python

    # filename: apps/app/management/commands/spider.py
    # usage: python manage.py spider
    from django.core.management.base import BaseCommand
    from grab.spider.base import Task
    from proxylist.grabber import Spider


    class SimpleSpider(Spider):
        initial_urls = ['http://www.lib.ru/']

        def task_initial(self, grab, task):
            # Fill in the search form and queue it as a new task
            grab.set_input('Search', 'linux')
            grab.submit(make_request=False)
            yield Task('search', grab=grab)

        def task_search(self, grab, task):
            # Print the text of every matching result node
            if grab.doc.select('//b/a/font/b').exists():
                for elem in grab.doc.select('//b/a/font/b/text()'):
                    print(elem.text())


    class Command(BaseCommand):
        help = 'Simple Spider'

        def handle(self, *args, **options):
            bot = SimpleSpider()
            bot.run()
            print(bot.render_stats())



* GitHub: https://github.com/gotlium/django-proxylist


.. image:: https://d2weczhvl823v0.cloudfront.net/gotlium/django-proxylist/trend.png
    :alt: Bitdeli badge
    :target: https://bitdeli.com/free
