Perform multiple HTTP requests concurrently – without worrying about async functions.

Project description

mure

This is a thin layer on top of aiohttp to perform multiple HTTP requests concurrently – without worrying about async functions.

mure stands for multiple requests, but it is also the German term for a form of mass wasting involving a fast-moving flow of debris and dirt that has been liquefied by the addition of water.

Göscheneralp. Slide colorized by Margrit Wehrli-Frey

(The photo was taken by Leo Wehrli and is licensed under CC BY-SA 4.0)

Installation

Install the latest stable version from PyPI:

pip install mure

Usage

Pass a list of dictionaries, each with at least a url key, and get back a ResponseIterator with the corresponding responses. The first request is only fired once you access the first response:

>>> import mure
>>> from mure.dtos import Resource
>>> resources: list[Resource] = [
...     {"url": "https://httpbin.org/get"},
...     {"url": "https://httpbin.org/get", "params": {"foo": "bar"}},
...     {"url": "invalid"},
... ]
>>> responses = mure.get(resources, batch_size=2)
>>> responses
<ResponseIterator: 3/3 pending>
>>> for resource, response in zip(resources, responses):
...     print(resource, "status code:", response.status)
...
{'url': 'https://httpbin.org/get'} status code: 200
{'url': 'https://httpbin.org/get', 'params': {'foo': 'bar'}} status code: 200
{'url': 'invalid'} status code: 0
>>> responses
<ResponseIterator: 0/3 pending>

The keyword argument batch_size defines the number of requests performed concurrently (don't be too greedy). The resources are requested batch-wise, i.e. only one batch of responses is kept in memory (although this also depends on how you use the ResponseIterator). As soon as you access the first response of a batch, the next batch of resources is requested in the background. So while you are working on one batch of responses (e.g. a CPU-heavy operation like parsing HTML), the next batch is already being fetched.

For example, if you have four resources, set batch_size to 2 and execute:

>>> next(responses)

the first two resources are requested concurrently and the call blocks until both responses are available (i.e. if resource 1 takes 1 second and resource 2 takes 10 seconds, it blocks for 10 seconds, although resource 1 is already available after 1 second). Before the response of resource 1 is yielded, the next batch of resources (i.e. 3 and 4) is already requested in the background.
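The blocking behaviour within a batch follows from awaiting all requests of a batch together. A minimal sketch, assuming the batch is awaited as a whole with asyncio.gather; fake_request is a hypothetical stand-in for an HTTP request, and the delays are scaled down from 1 s and 10 s:

```python
import asyncio
import time


async def fake_request(name: str, delay: float) -> str:
    # Hypothetical stand-in for an HTTP request that takes `delay` seconds.
    await asyncio.sleep(delay)
    return name


async def fetch_batch() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Resource 1 is ready after 0.03 s, resource 2 after 0.3 s,
    # but gather only returns once both are done.
    results = await asyncio.gather(
        fake_request("resource 1", 0.03),
        fake_request("resource 2", 0.3),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(fetch_batch())
print(results)  # ['resource 1', 'resource 2']
print(f"batch took {elapsed:.2f} s")  # roughly the delay of the slowest request
```

The result of the fast request is only handed out together with the slow one, which is why the first next() waits for the slowest resource of its batch.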

Executing next() a second time:

>>> next(responses)

will return almost immediately, because the response of resource 2 is already available (1 and 2 were in the same batch). If you are lucky, executing next() a third time will be fast as well, because the next batch of resources was already requested when you executed next(responses) the first time.
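The batch-wise requesting and background prefetching described in this section can be sketched in plain Python. This is not mure's implementation, only a minimal illustration under assumptions: prefetching_map and fake_fetch are hypothetical names, fake_fetch simulates a request with asyncio.sleep instead of calling aiohttp, and each batch runs on its own event loop in a single worker thread so that the caller stays synchronous:

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterator
from concurrent.futures import ThreadPoolExecutor
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


def prefetching_map(
    items: list[T],
    fetch: Callable[[T], Awaitable[R]],
    batch_size: int = 2,
) -> Iterator[R]:
    """Hypothetical sketch: yield fetch(item) in input order, batch by batch."""
    batches = [items[i : i + batch_size] for i in range(0, len(items), batch_size)]

    async def gather_batch(batch: list[T]) -> list[R]:
        # All items of one batch run concurrently; gather preserves input order.
        return await asyncio.gather(*(fetch(item) for item in batch))

    def run_batch(batch: list[T]) -> list[R]:
        # Each batch gets its own event loop inside the worker thread.
        return asyncio.run(gather_batch(batch))

    # A single worker thread: at most one batch is in flight at any time.
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(run_batch, batches[0]) if batches else None
        for index in range(len(batches)):
            results = future.result()  # blocks until the slowest item of the batch is done
            if index + 1 < len(batches):
                # Request the next batch before handing out any result of this one.
                future = executor.submit(run_batch, batches[index + 1])
            yield from results


async def fake_fetch(url: str) -> str:
    # Hypothetical fetch: simulates network latency instead of an HTTP request.
    await asyncio.sleep(0.01)
    return f"response for {url}"


for response in prefetching_map(["a", "b", "c", "d"], fake_fetch, batch_size=2):
    print(response)
```

Because a generator body only starts running on the first next(), nothing is requested before the first response is accessed. With a single worker thread, at most one further batch is in flight while you process the current one.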

HTTP Methods

There are convenience functions for GET, POST, HEAD, PUT, PATCH and DELETE requests, for example:

>>> resources = [
...     {"url": "https://httpbin.org/post"},
...     {"url": "https://httpbin.org/post", "json": {"foo": "bar"}},
...     {"url": "invalid"},
... ]
>>> responses = mure.post(resources)

You can even mix HTTP methods in the list of resources (but then you have to specify the method for each resource):

>>> resources = [
...     {"method": "GET", "url": "https://httpbin.org/get"},
...     {"method": "GET", "url": "https://httpbin.org/get", "params": {"foo": "bar"}},
...     {"method": "POST", "url": "https://httpbin.org/post"},
...     {"method": "POST", "url": "https://httpbin.org/post", "json": {"foo": "bar"}},
...     {"method": "GET", "url": "invalid"},
... ]
>>> responses = mure.request(resources)
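If you already keep separate lists per method, a small helper can build the mixed list for mure.request. with_method and ResourceDict are hypothetical names for illustration, not part of mure (which has its own Resource type in mure.dtos); the helper copies each dictionary and sets, or overrides, its method key:

```python
from typing import Any

# Hypothetical alias for illustration; mure ships its own Resource type in mure.dtos.
ResourceDict = dict[str, Any]


def with_method(resources: list[ResourceDict], method: str) -> list[ResourceDict]:
    """Return copies of `resources` with "method" set; any existing method is overridden."""
    return [{**resource, "method": method} for resource in resources]


gets = [
    {"url": "https://httpbin.org/get"},
    {"url": "https://httpbin.org/get", "params": {"foo": "bar"}},
]
posts = [{"url": "https://httpbin.org/post", "json": {"foo": "bar"}}]

mixed = with_method(gets, "GET") + with_method(posts, "POST")
# responses = mure.request(mixed)  # needs mure installed and network access
```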

Tips

Set timeouts (e.g. 10 seconds) to avoid waiting too long for slow servers.
It might be a good idea to order the URLs by domain name so that DNS resolution is done only once per domain: once a domain's IP address has been resolved, subsequent requests to the same domain can use the cached result. With protocols like HTTP/1.1, connections to the same domain can also be reused, so grouping requests by domain allows for connection reuse.
If you do not request the same domain multiple times, shuffling the URLs randomly might be an alternative. This helps to distribute potentially slow URLs across different batches.
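The two ordering strategies from the tips can be sketched with the standard library. order_by_domain and shuffled are hypothetical helpers, not part of mure; grouping uses the host name as parsed by urllib.parse.urlsplit, and the resulting order is then turned into the list of resources you pass to mure:

```python
import random
from typing import Optional
from urllib.parse import urlsplit


def order_by_domain(urls: list[str]) -> list[str]:
    # sorted() is stable, so URLs of the same host keep their relative order.
    # Unparseable URLs such as "invalid" have no host name and sort first.
    return sorted(urls, key=lambda url: urlsplit(url).hostname or "")


def shuffled(urls: list[str], seed: Optional[int] = None) -> list[str]:
    # Return a shuffled copy; the input list is left unchanged.
    result = list(urls)
    random.Random(seed).shuffle(result)
    return result


urls = [
    "https://example.org/a",
    "https://httpbin.org/get",
    "https://example.org/b",
    "https://httpbin.org/delay/1",
]
print(order_by_domain(urls))
# ['https://example.org/a', 'https://example.org/b', 'https://httpbin.org/get', 'https://httpbin.org/delay/1']

resources = [{"url": url} for url in order_by_domain(urls)]
```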

Verbosity

Control verbosity with the MURE_LOG_ERRORS environment variable:

>>> import os
>>> import mure
>>> next(mure.get([{"url": "invalid"}]))
Response(status=0, reason='<InvalidURL invalid>', ok=False, text='')
>>> os.environ["MURE_LOG_ERRORS"] = "true"
>>> next(mure.get([{"url": "invalid"}]))
invalid
Traceback (most recent call last):
  File "/home/severin/git/mure/mure/iterator.py", line 131, in _process
    async with session.request(resource["method"], resource["url"], **kwargs) as response:
  File "/home/severin/git/mure/.env/lib/python3.11/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
                 ^^^^^^^^^^^^^^^^
  File "/home/severin/git/mure/.env/lib/python3.11/site-packages/aiohttp/client.py", line 508, in _request
    req = self._request_class(
          ^^^^^^^^^^^^^^^^^^^^
  File "/home/severin/git/mure/.env/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 305, in __init__
    self.update_host(url)
  File "/home/severin/git/mure/.env/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 364, in update_host
    raise InvalidURL(url)
aiohttp.client_exceptions.InvalidURL: invalid
Response(status=0, reason='<InvalidURL invalid>', ok=False, text='')
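To enable logging for a single call only, the variable can be set temporarily. A minimal sketch using a hypothetical env_var context manager (not part of mure), assuming, as the session above suggests, that mure reads MURE_LOG_ERRORS when the request is performed. Since requests are only fired when you access a response, next() has to run inside the with block:

```python
import os
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def env_var(name: str, value: str) -> Iterator[None]:
    """Temporarily set an environment variable and restore the previous state afterwards."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)  # the variable was not set before
        else:
            os.environ[name] = previous


# Usage (needs mure installed):
# with env_var("MURE_LOG_ERRORS", "true"):
#     next(mure.get([{"url": "invalid"}]))
```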

Project details

Release history

1.4.1: Oct 7, 2024
1.4.0: Sep 3, 2024
1.3.1: Aug 30, 2024
1.3.0: Aug 30, 2024
1.2.1: Jul 19, 2024
1.2.0: Jul 18, 2024
1.1.0: Jul 18, 2024
1.0.1: Jul 17, 2024
1.0.0: Jun 27, 2024
1.0.0b7 (pre-release): May 27, 2024
1.0.0b6 (pre-release): May 17, 2024
1.0.0b5 (pre-release): May 7, 2024
1.0.0b4 (pre-release): Apr 25, 2024
1.0.0b3 (pre-release): Apr 25, 2024
1.0.0b1 (pre-release): Apr 25, 2024
1.0.0a6 (pre-release, this version): Apr 24, 2024
1.0.0a5 (pre-release): Apr 24, 2024
1.0.0a4 (pre-release): Apr 24, 2024
1.0.0a3 (pre-release): Apr 24, 2024
1.0.0a0 (pre-release): Apr 16, 2024
0.6.1: Feb 23, 2023
0.6.0: Feb 21, 2023
0.5.1: Feb 21, 2023
0.5.0: Feb 13, 2023
0.4.3: Feb 7, 2023
0.4.2: Feb 7, 2023
0.4.1: Feb 7, 2023
0.4.0: Feb 6, 2023
0.3.1: Feb 6, 2023
0.3.0: Feb 6, 2023
0.2.0: Feb 1, 2023
0.1.0: Jan 13, 2023

Download files


Source Distribution

mure-1.0.0a6.tar.gz (16.2 kB)

Uploaded Apr 24, 2024 Source

Built Distribution

mure-1.0.0a6-py3-none-any.whl (15.0 kB)

Uploaded Apr 24, 2024 Python 3

Hashes for mure-1.0.0a6.tar.gz

Algorithm	Hash digest
SHA256	`5c7df5297c34ab766ff13dbff927022d02ddbb87e2e91ed81ef22b5065d76b0d`
MD5	`5dbdd532ecb241c447d21749ff265d26`
BLAKE2b-256	`ef58c6501d666cb650bf1d686bb05663e10e988a6c76ffec0dc30a9c15ea247b`

Hashes for mure-1.0.0a6-py3-none-any.whl

Algorithm	Hash digest
SHA256	`92ecee304b99ab8d728cb7290a0270a08e34dffa46eaeb73dc710d75f46ee680`
MD5	`d9efb8eb5ad0d024a4159c7e26119368`
BLAKE2b-256	`50c6b5acfd62f23d1a09a1c9be9cfb90e91c30bd2cfb52ba031ede5bb3bdc6ea`