Magical tools to interact with web APIs from a data scientist's perspective.
Apicadabri
Apicadabri is a magical set of tools to interact with web APIs from a data scientist's perspective to "just get the damn data"™.
It focuses on simplicity and speed while being agnostic about what kind of API you're calling. If you know how to send a single call to the API you're interested in, you should be ready to scale up to 100k calls with apicadabri.
Current status
This is still an early alpha. Some basic examples already work, though (see below).
Features
- 🚀 Get the maximum amount of speed while still playing nice with the API provider.
- ⚙️ Configurable number of calls active at the same time (using a Semaphore).
- 🔀 Async execution, so everything stays within one Python process.
- 🐤 You don't have to write async or care about task scheduling anywhere.
- 🪜 Process results right as they come in.
- 🐛 Comprehensive error handling and retry mechanisms.*
- 📊 Directly get a dataframe from just a single chain of method calls.*
- 🔧 More than just HTTP: Use the above-mentioned features for arbitrary (async) tasks.
*: Not yet fully implemented.
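The Semaphore-based limit on simultaneous calls can be illustrated with plain asyncio. This is a minimal sketch of the general technique, not apicadabri's actual implementation; bounded_gather, fake_call, and max_active are hypothetical names chosen for this example.

```python
import asyncio


async def bounded_gather(coros, max_active=3):
    """Run coroutines concurrently, with at most `max_active` running at once."""
    semaphore = asyncio.Semaphore(max_active)

    async def run_one(coro):
        # Each task waits here until one of the `max_active` slots is free.
        async with semaphore:
            return await coro

    # gather() returns results in the order the coroutines were passed in.
    return await asyncio.gather(*(run_one(c) for c in coros))


async def fake_call(i, state):
    # Stand-in for an HTTP request (assumption); records how many calls overlap.
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)
    state["active"] -= 1
    return i * 2


state = {"active": 0, "peak": 0}
results = asyncio.run(
    bounded_gather((fake_call(i, state) for i in range(10)), max_active=3)
)
```

All ten tasks exist from the start, but only three are ever inside fake_call at the same time, which is what keeps the load on the API provider predictable.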
Assumptions
For now, apicadabri assumes that you want to solve a task for which the following holds:
- All inputs fit into memory
- All results fit into memory (alternatively, you can write directly to a JSONL file)
- The number of requests will not overwhelm the asyncio event loop (which is apparently hard to achieve anyway unless you have tens of millions of calls).
- You want to observe and process results as they come in.
- You want your results in the same order as the input with no gaps in between.
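The last two assumptions (results are processed as they come in, but in input order) can be combined in plain asyncio as sketched below. This only illustrates the idea and is not taken from apicadabri's source; in_input_order and fake_call are hypothetical names.

```python
import asyncio


async def in_input_order(coros):
    """Yield results in input order, each one as soon as it is available."""
    # Start everything right away so later calls run while we wait on earlier ones.
    tasks = [asyncio.create_task(c) for c in coros]
    for task in tasks:
        yield await task


async def fake_call(name, delay):
    # Stand-in for an API call with a given latency (assumption).
    await asyncio.sleep(delay)
    return name


async def main():
    seen = []
    calls = [fake_call("slow", 0.03), fake_call("fast", 0.0), fake_call("medium", 0.01)]
    async for result in in_input_order(calls):
        seen.append(result)  # process each result here, e.g. append it to a JSONL file
    return seen


seen = asyncio.run(main())
```

The trade-off of this ordering guarantee is that a slow early call delays the delivery of later results, even though those later calls have already finished in the background.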
Future relaxing of constraints
- For extreme numbers of calls (>> 1M), add another layer of batching to avoid creating all asyncio tasks at the same time, without letting one slow call in a batch slow down the whole task.
- Through the same mechanism, allow loading inputs one batch at a time.
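One possible way to get both properties is a small pool of workers that pull inputs lazily from a shared iterator. The sketch below is only an illustration of that idea under stated assumptions, not the planned apicadabri design; run_lazily, call, and max_active are hypothetical names.

```python
import asyncio


async def run_lazily(inputs, call, max_active=100):
    """Apply `call` to each input with at most `max_active` tasks alive at once."""
    # Shared iterator: inputs are consumed on demand, so they can come from a generator.
    it = iter(enumerate(inputs))
    results = {}

    async def worker():
        # Each worker takes the next input as soon as it is free, so a slow
        # call only occupies one worker instead of holding up a whole batch.
        for index, item in it:
            results[index] = await call(item)

    await asyncio.gather(*(worker() for _ in range(max_active)))
    # Restore input order from the recorded indices.
    return [results[i] for i in range(len(results))]


async def double(x):
    # Stand-in for an API call with varying latency (assumption).
    await asyncio.sleep(0.001 * (x % 3))
    return x * 2


out = asyncio.run(run_lazily(range(20), double, max_active=4))
```

Note that this sketch still collects all results in memory before returning them; streaming them out in order would need an additional buffer.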
Examples
Multiple URLs
import apicadabri

pokemon = ["bulbasaur", "squirtle", "charmander"]
data = apicadabri.bulk_get(
    urls=(f"https://pokeapi.co/api/v2/pokemon/{p}" for p in pokemon),
).json().to_list()
Multiple payloads
TODO
Multivariate (zipped)
TODO
Multivariate (multiply)
TODO
Multivariate (pipeline)
TODO
Error Handling
API calls can always fail and you don't want your script with 100k API calls to crash on call number 10k because you forgot to handle a None somewhere.
At the same time, though, you might not even care about errors and just want to set up a test scenario quick and dirty.
Apicadabri adapts to both scenarios by providing three options for error handling, controlled by the on_error parameter:
- raise: The exception is not caught at all; it is raised as normal and the bulk call fails.
- return: The exception is caught and encapsulated in an ApicadabriErrorResponse object that also contains the input that triggered the exception.
- A lambda function: The exception is caught and the provided error handling function is called with the triggering input and the error message and type. The error handling function must return a result of the same type as would be expected from a successful call. This can, for example, be used to return an "empty" result that does not lead to exceptions in further processing.
ℹ️ If you need to return a different type of object in case of an error, you can instead use map with on_error="return" and then do another map that transforms the error response into the type you want.
The on_error parameter is available on several central methods of the returned objects, most notably map and reduce.
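The three modes can be mimicked in a few lines of plain Python. The sketch below is independent of apicadabri and only illustrates the semantics described above; apply_with_on_error, ErrorResponse, and the handler's argument order are assumptions made for this example, not the library's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass
class ErrorResponse:
    """Hypothetical stand-in for an error wrapper that keeps the failing input."""
    input: Any
    message: str
    type: str


def apply_with_on_error(func: Callable, value: Any, on_error: Union[str, Callable] = "raise"):
    try:
        return func(value)
    except Exception as exc:
        if on_error == "raise":
            raise  # let the exception propagate and abort the run
        if on_error == "return":
            return ErrorResponse(value, str(exc), type(exc).__name__)
        # Otherwise treat on_error as a handler that produces a replacement result.
        return on_error(value, str(exc), type(exc).__name__)


def parse_height(record):
    return record["height"] / 10


records = [{"height": 7}, {"name": "missingno"}]

# "return": failures become ErrorResponse objects next to normal results.
wrapped = [apply_with_on_error(parse_height, r, on_error="return") for r in records]

# Handler function: failures are replaced by an "empty" result of the same type.
defaults = [
    apply_with_on_error(parse_height, r, on_error=lambda inp, msg, typ: 0.0)
    for r in records
]

# Two-step variant: wrap first, then map errors to a different type (here None).
cleaned = [None if isinstance(w, ErrorResponse) else w for w in wrapped]
```

With on_error="raise", the second record would raise a KeyError and stop the loop, which is the behavior you want when a failure should never go unnoticed.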
File details
Details for the file apicadabri-0.3.0.tar.gz.
File metadata
- Download URL: apicadabri-0.3.0.tar.gz
- Upload date:
- Size: 58.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8b09a81c0b4c0a9a307020d70767f8b30bd81bb3a00fba00503afd1ed8f1956 |
| MD5 | 81a1be0bec364d77a386cb9e4f50de7f |
| BLAKE2b-256 | 6ff7dbc9771cc06ce4752ff5a67348709f7e0dc8bd90ac56cd49631d3c1d5a48 |
File details
Details for the file apicadabri-0.3.0-py3-none-any.whl.
File metadata
- Download URL: apicadabri-0.3.0-py3-none-any.whl
- Upload date:
- Size: 9.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0d71a510be31082f9a9bec04a317ca879da4bf1bcfa8643e4557ba1e1e36389 |
| MD5 | 6171249fcdd04d3cf79e70065fb8432b |
| BLAKE2b-256 | 85896723120251e5fa997673a6339b75050b3c57c6f1decb34f27e41df0a391b |