This is a pre-production deployment of Warehouse, however changes made here WILL affect the production instance of PyPI.
Latest Version Dependencies status unknown Test status unknown Test coverage unknown
Project Description

This module provides an easy-to-use interface to allow you to run multiple CURL url fetches in parallel in Python, without threads.

To test it, go to the command line, cd to this folder and run

./test.py

This should run 100 searches through Google’s API, printing the results. To see what sort of performance difference running parallel requests gets you, try altering the default of 10 requests running in parallel using the optional script argument, and timing how long each takes:

time ./test.py 1 time ./test.py 20

The first only allows one request to run at once, serializing the calls. I see this taking around 100 seconds. The second run has 20 in flight at a time, and takes 11 seconds! Be warned though, it’s possible to overwhelm your target if you fire too many requests at once. You may end up with your IP banned from accessing that server, or hit other API limits.

The class is designed to make it easy to run multiple curl requests in parallel, rather than waiting for each one to finish before starting the next. Under the hood it uses curl_multi_exec but since I find that interface painfully confusing, I wanted one that corresponded to the tasks that I wanted to run.

To use it, easy_install pycurl, import pyparallelcurl, then create the ParallelCurl object:

parallelcurl = ParallelCurl(10)

The first argument to the constructor is the maximum number of outstanding fetches to allow before blocking to wait for one to finish. You can change this later using setmaxrequests() The second optional argument is an array of curl options in the format used by curl_setopt_array()

Next, start a URL fetch:

parallelcurl.startRequest(‘http://example.com’, on_request_done, {‘somekey’:’somevalue’})

The first argument is the address that should be fetched The second is the callback function that will be run once the request is done The third is a ‘cookie’, that can contain arbitrary data to be passed to the callback

This startRequest call will return immediately, as long as less than the maximum number of requests are outstanding. Once the request is done, the callback function will be called, eg:

on_request_done(content, ‘http://example.com’, ch, {‘somekey’:’somevalue’})

The callback should take four arguments. The first is a string containing the content found at the URL. The second is the original URL requested, the third is the curl handle of the request that can be queried to get the results, and the fourth is the arbitrary ‘cookie’ value that you associated with this object. This cookie contains user-defined data.

Since you may have requests outstanding at the end of your script, you MUST call

parallelcurl.finishallrequests()

before you exit. If you don’t, the final requests may be left unprocessed! This is actually also called in the destructor of the class, but it’s definitely best practice to call this explictly.

By Pete Warden <pete@petewarden.com>, freely reusable, see http://petewarden.typepad.com for more

Release History

Release History

0.0.4

This version

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.0.3

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.0.2

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

Download Files

Download Files

TODO: Brief introduction on what you do with files - including link to relevant help section.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
pyparallelcurl-0.0.4-py2.6.egg (10.7 kB) Copy SHA256 Checksum SHA256 2.6 Egg Oct 14, 2010

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS HPE HPE Development Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting