Simple Python multithreaded proxy manager, focused on scrapes

SimpleProxyManager

Python multithreaded proxy manager, focused on scrapes

This project lives at https://github.com/amagnasco/SimpleProxyManager. Feel free to submit an issue or improvements! Releases: https://pypi.org/project/SimpleProxyManager/

Warning: on macOS, do not use this package in a program that also calls os.fork(). Its dependency urllib.request is unsafe to combine with os.fork() on that platform.
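
If a program needs worker processes alongside this package, one way to stay clear of os.fork() on macOS is to request the "spawn" start method from the standard multiprocessing module. The sketch below is not part of SimpleProxyManager; the helper name `safe_context` is made up for illustration.

```python
import multiprocessing
import sys


def safe_context(platform=sys.platform):
    """Return a multiprocessing context that does not fork on macOS.

    Hypothetical helper, not part of SimpleProxyManager.
    """
    # "spawn" starts a fresh interpreter instead of forking the current one.
    # Passing None keeps the platform's default start method elsewhere.
    method = "spawn" if platform == "darwin" else None
    return multiprocessing.get_context(method)
```

Processes created through the returned context (for example `safe_context().Process(...)`) then start without forking on macOS.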

For an example implementation, check out dev/example.py.

Major dependencies: requests, urllib3.

Constructor inputs:

  • threads: number of processing threads to use
  • wait: minimum and maximum time to wait between requests, and HTTP timeout. All in seconds.
  • headers: HTTP headers to use (user agent, accept, and accept-language)
  • test: URI to use for health check, and minimum and maximum time to wait between each proxy healthcheck (in seconds)
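
The exact argument shapes are not spelled out here, so the following is only a sketch of how the four inputs could be laid out as plain Python values. Every key name (`min`, `max`, `timeout`, `uri`), the sample values, and the helper `pick_delay` are assumptions made for illustration, not the package's confirmed API. See dev/example.py for the real call.

```python
import random

# Hypothetical layout of the four constructor inputs described above.
# Key names and values are assumptions; dev/example.py shows the real ones.
threads = 4
wait = {"min": 1.0, "max": 3.0, "timeout": 10.0}  # all in seconds
headers = {
    "User-Agent": "Mozilla/5.0 (compatible; example-scraper)",
    "Accept": "text/html",
    "Accept-Language": "en-US,en;q=0.9",
}
test = {"uri": "https://example.com/", "min": 5.0, "max": 15.0}  # waits in seconds


def pick_delay(bounds, rng=random):
    """Return a random delay in seconds between bounds['min'] and bounds['max']."""
    if bounds["min"] > bounds["max"]:
        raise ValueError("min must not exceed max")
    return rng.uniform(bounds["min"], bounds["max"])
```

A randomized delay such as `pick_delay(wait)` is one common way to use a minimum and maximum wait between requests; whether the package picks its delays this way is not confirmed by this description.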

Public API:

  • Setup:
    • load: inputs a path to a list of proxies, then ingests and tests them.
  • Monitoring:
    • healthcheck: processes the "all" queue into "ready" or "broken" queues
    • available: returns the list of available proxies
    • broken: returns the list of broken proxies
  • Use:
    • validate: inputs a URI, and runs it through urllib's parse
    • req: inputs a URI, validates it, assigns a proxy, and runs get. Returns {success: True, data: Response}, or {success: False, error: Exception}.
    • get: inputs a proxy and URI, and retrieves it. For advanced usage like externally queued/threaded/async'd setups.
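
To make the flow above more concrete, here is a self-contained sketch that does not import the package. It assumes the dict returned by req uses the string keys "success", "data" and "error". The helpers `looks_like_http_uri`, `unwrap` and `partition` are hypothetical stand-ins for illustration, not the package's own functions.

```python
from urllib.parse import urlparse


def looks_like_http_uri(uri):
    """Rough stand-in for a validate step: require an http(s) scheme and a host."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def unwrap(result):
    """Return result['data'] on success, otherwise raise result['error'].

    Assumes the {success, data} / {success, error} shape described for req.
    """
    if result["success"]:
        return result["data"]
    raise result["error"]


def partition(proxies, is_healthy):
    """Sort proxies into (ready, broken) lists using a caller-supplied check.

    Mirrors the idea of moving the "all" queue into "ready" or "broken";
    a check that raises is treated as a failed check.
    """
    ready, broken = [], []
    for proxy in proxies:
        try:
            ok = bool(is_healthy(proxy))
        except Exception:
            ok = False
        (ready if ok else broken).append(proxy)
    return ready, broken
```

With the real package, the dict passed to `unwrap` would come from req, and the health check would issue a request to the configured test URI through each proxy.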

Version History

This project uses semantic versioning.

  • to-do:

    • improve usage docs
    • improve error handling
    • add test cases
    • improve HTTP status code handling
    • differentiate input proxy list by http/https
    • add a "reset queues to all" method
    • improve manual exit handling
    • abstract proxy assigner method from req
    • improve input and type checking
  • 0.2.0 (in development):

    • add i18n
    • consolidate load conf into one dict/json
    • improve input and type checking
    • improve exit condition when 0 proxies available
  • 0.1.2:

    • handle HTTPErrors into general exceptions
    • simplify class name and init for cleaner import
    • updated example
  • 0.1.1:

    • added GitHub to PyPI publication workflow
  • 0.1.0:

    • First release! Functional enough to share, but some logs might still be in Spanish while I sort out the i18n.

