Implements a priority-queue-based fetching system.
The Fetcher class can be used by passing a fetch function which accepts locations of an arbitrary type.
Locations can then be added via the fetcher’s add method.
It is meant to manage fetches by interrupting timed-out calls to fetch and by re-entering failed fetches back into the queue with an altered priority.
Fetches which fail, either due to timeout or by virtue of being filtered out by a passed success function, are routed back into the source queue with their priority altered by another passed function.
Succeeding fetches are passed on to the results queue.
The fetch function may be any function which accepts a single parameter: thus the scope of tasks it may perform is relatively unrestricted.
The Fetcher constructor also accepts a numeric threadcount argument, which will determine the number of concurrent fetch functions to run.
Each fetch call will be made in a newly spawned thread. This is done as a relatively simple way to allow interruption of the call using the C PyThreadState_SetAsyncExc function.
Because of its use of CPython’s underlying C API, this module is not portable to other Python implementations.
Due to deficiencies in Python’s threading facilities which the author has been unable to work around, problems may result if the passed fetch blocks on I/O.
An attempt has been made to work around these issues; see the reactor function for details.
Despite this attempt, the code is still markedly sketchy. Consider yourself warned.
Of related note is the method test.FetcherTester.test_incorrect_fission. This test highlights the fragility of the I/O block workaround. See the documentation for that method for more information.
pqueue_fetcher is licensed under the MIT license.
License details are provided in the file COPYING.
Although the module does pass its fairly simplistic test suite, this is no guarantee of usability.
In particular, the module hasn’t been used to do anything other than pass its test suite.
Developers in search of a robust solution for multithreaded document retrieval are well advised to look elsewhere, probably starting with the twisted framework.
pqueue_fetcher was undertaken more as an exercise in multithreading and the use of queues than for the sake of creating a usable body of code.
It is being published mostly as an example of these techniques, and for the sake of highlighting some of the difficulties in writing threaded Python code.
The module is also missing some obvious bits of functionality, such as the ability to set a retry limit.
Should work continue on pqueue_fetcher, it may prove more useful to allow the timeout logic to be handled by the fetch functions.
TODO: Figure out how to actually get changelog content.
Changelog content for this version goes here.