Skip to main content

HTTP library with thread-safe connection pooling, file post, and more.

Project description

Highlights

  • Re-use the same socket connection for multiple requests (HTTPConnectionPool and HTTPSConnectionPool) (with optional client-side certificate verification).

  • File posting (encode_multipart_formdata).

  • Built-in redirection and retries (optional).

  • Supports gzip and deflate decoding.

  • Thread-safe and sanity-safe.

  • Small and easy to understand codebase perfect for extending and building upon. For a more comprehensive solution, have a look at Requests.

What’s wrong with urllib and urllib2?

There are two critical features missing from the Python standard library: Connection re-using/pooling and file posting. It’s not terribly hard to implement these yourself, but it’s much easier to use a module that already did the work for you.

The Python standard libraries urllib and urllib2 have little to do with each other. They were designed to be independent and standalone, each solving a different scope of problems, and urllib3 follows in a similar vein.

Why do I want to reuse connections?

Performance. When you normally do a urllib call, a separate socket connection is created with each request. By reusing existing sockets (supported since HTTP 1.1), the requests will take up less resources on the server’s end, and also provide a faster response time at the client’s end. With some simple benchmarks (see test/benchmark.py ), downloading 15 URLs from google.com is about twice as fast when using HTTPConnectionPool (which uses 1 connection) than using plain urllib (which uses 15 connections).

This library is perfect for:

  • Talking to an API

  • Crawling a website

  • Any situation where being able to post files, handle redirection, and retrying is useful. It’s relatively lightweight, so it can be used for anything!

Examples

Go to urllib3.readthedocs.org for more nice syntax-highlighted examples.

But, long story short:

import urllib3

http = urllib3.PoolManager()

r = http.request('GET', 'http://google.com/')

print r.status, r.data

The PoolManager will take care of reusing connections for you whenever you request the same host. For more fine-grained control of your connection pools, you should look at ConnectionPool.

Run the tests

Running the test command will install the necessary dependencies and run the tests.

$ python setup.py test
...................................

Success! Tests can also be run using nosetests for cleaner output.

Contributing

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug. There is a Contributor Friendly tag for issues that should be ideal for people who are not very familiar with the codebase yet.

  2. Fork the urllib3 repository on Github to start making your changes.

  3. Write a test which shows that the bug was fixed or that the feature works as expected.

  4. Send a pull request and bug the maintainer until it gets merged and published. :) Make sure to add yourself to CONTRIBUTORS.txt.

Changes

  • Fixed typo in VerifiedHTTPSConnection which would only present as a bug if you’re using the object manually. (Thanks pyos)

1.0.1 (2011-10-10)

  • Fixed a bug where the same connection would get returned into the pool twice, causing extraneous “HttpConnectionPool is full” log warnings.

1.0 (2011-10-08)

  • Added PoolManager with LRU expiration of connections (tested and documented).

  • Added ProxyManager (needs tests, docs, and confirmation that it works with HTTPS proxies).

  • Added optional partial-read support for responses when preload_content=False. You can now make requests and just read the headers without loading the content.

  • Made response decoding optional (default on, same as before).

  • Added optional explicit boundary string for encode_multipart_formdata.

  • Convenience request methods are now inherited from RequestMethods. Old helpers like get_url and post_url should be abandoned in favour of the new request(method, url, ...).

  • Refactored code to be even more decoupled, reusable, and extendable.

  • License header added to .py files.

  • Embiggened the documentation: Lots of Sphinx-friendly docstrings in the code and docs in docs/ and on urllib3.readthedocs.org.

  • Embettered all the things!

  • Started writing this file.

0.4.1 (2011-07-17)

  • Minor bug fixes, code cleanup.

0.4 (2011-03-01)

  • Better unicode support.

  • Added VerifiedHTTPSConnection.

  • Added NTLMConnectionPool in contrib.

  • Minor improvements.

0.3.1 (2010-07-13)

  • Added assert_host_name optional parameter. Now compatible with proxies.

0.3 (2009-12-10)

  • Added HTTPS support.

  • Minor bug fixes.

  • Refactored, broken backwards compatibility with 0.2.

  • API to be treated as stable from this version forward.

0.2 (2008-11-17)

  • Added unit tests.

  • Bug fixes.

0.1 (2008-11-16)

  • First release.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

urllib3-1.0.1.tar.gz (23.0 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page