Magical tools to interact with web APIs from a data scientist's perspective.


Apicadabri

Apicadabri is a magical set of tools to interact with web APIs from a data scientist's perspective to "just get the damn data"™.

It focuses on simplicity and speed while being agnostic about what kind of API you're calling. If you know how to send a single call to the API you're interested in, you should be able to scale up to 100k calls with apicadabri.

Current status

This is still an early alpha. Some basic examples already work, though (see below).

Features

  • 🚀 Get the maximum amount of speed while still playing nice with the API provider.
    • ⚙️ Configurable number of calls active at the same time (using a Semaphore).
    • 🔀 Async execution, so everything stays within one Python process.
  • 🐤 You don't have to write async or care about task scheduling anywhere.
  • 🪜 Process results right as they come in.
  • 🐛 Comprehensive error handling and retry mechanisms.*
  • 📊 Directly get a dataframe from just a single chain of method calls.*
  • 🔧 More than just HTTP: Use the above-mentioned features for arbitrary (async) tasks.

*: Not yet fully implemented.
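
To illustrate the idea behind the configurable number of active calls, here is a minimal standard-library sketch of semaphore-bounded async execution. It is not apicadabri's implementation: `fake_call` and `bounded_gather` are hypothetical names, and the "API call" is just a short sleep.

```python
import asyncio


async def fake_call(x: int) -> int:
    """Hypothetical stand-in for a single API call (no network involved)."""
    await asyncio.sleep(0.01)
    return x * 2


async def bounded_gather(inputs, max_active: int):
    """Run fake_call on every input with at most max_active calls in flight."""
    semaphore = asyncio.Semaphore(max_active)
    active = 0  # calls currently holding the semaphore
    peak = 0    # highest number of simultaneous calls observed

    async def worker(x):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            try:
                return await fake_call(x)
            finally:
                active -= 1

    # gather returns results in input order, regardless of completion order.
    results = await asyncio.gather(*(worker(x) for x in inputs))
    return results, peak


results, peak = asyncio.run(bounded_gather(range(10), max_active=3))
```

All ten tasks are created up front, but the semaphore lets only three of them run their call at any moment, which is what keeps the load on the API provider bounded.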

Assumptions

For now, apicadabri assumes that you want to solve a task for which the following holds:

  • All inputs fit into memory
  • All results fit into memory (alternatively, you can write them directly to a JSONL file)
  • The number of requests will not overwhelm the asyncio event loop (which is apparently hard to achieve anyway unless you have tens of millions of calls).
  • You want to observe and process results as they come in.
  • You want your results in the same order as the input with no gaps in between.
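
The last two assumptions can look contradictory, so here is a small sketch of how they can be combined: calls finish in any order, but a result is only released once all of its predecessors have been released. This is an illustration with the standard library, not apicadabri's actual code; `fake_call` and `ordered_as_completed` are hypothetical names.

```python
import asyncio


async def fake_call(index: int, delay: float) -> str:
    """Hypothetical stand-in for an API call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return f"result-{index}"


async def ordered_as_completed(delays):
    """Yield results in input order, each as early as the ordering allows."""

    async def indexed(i, d):
        return i, await fake_call(i, d)

    tasks = [asyncio.create_task(indexed(i, d)) for i, d in enumerate(delays)]
    buffer = {}      # finished results that still wait for a predecessor
    next_index = 0   # index of the next result that may be released
    for finished in asyncio.as_completed(tasks):
        i, result = await finished
        buffer[i] = result
        # Release every result whose predecessors have all been released.
        while next_index in buffer:
            yield buffer.pop(next_index)
            next_index += 1


async def main():
    seen = []
    # The first call is the slowest, so the other two wait in the buffer.
    async for result in ordered_as_completed([0.03, 0.01, 0.02]):
        seen.append(result)
    return seen


seen = asyncio.run(main())
```

The trade-off is that one slow call near the start delays the release of everything behind it, even though the later calls have already finished.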

Future relaxing of constraints

  • For extreme numbers of calls (>> 1M), add another layer of batching to avoid creating all asyncio tasks at the same time, without letting one slow call in a batch slow down the whole task.
    • Through the same mechanism, allow loading inputs one batch at a time.
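
One possible way to get both properties is a sliding window of tasks instead of fixed batches: a new task is started whenever an old one finishes, and inputs are pulled lazily from an iterator. The sketch below only illustrates that idea and is not the planned apicadabri design; `fake_call` and `sliding_window` are hypothetical names.

```python
import asyncio
from itertools import islice


async def fake_call(x: int) -> int:
    """Hypothetical stand-in for an API call with varying latency."""
    await asyncio.sleep(0.001 * (x % 3))
    return x * x


async def sliding_window(inputs, window: int):
    """Keep at most `window` tasks alive while consuming inputs lazily."""

    async def indexed(index, value):
        return index, await fake_call(value)

    source = iter(enumerate(inputs))
    results = {}
    max_pending = 0

    # Fill the initial window; the rest of the inputs is not touched yet.
    pending = {
        asyncio.create_task(indexed(i, v)) for i, v in islice(source, window)
    }
    while pending:
        max_pending = max(max_pending, len(pending))
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            index, result = task.result()
            results[index] = result
        # Start one new task for every task that just finished.
        for i, v in islice(source, len(done)):
            pending.add(asyncio.create_task(indexed(i, v)))

    ordered = [results[i] for i in range(len(results))]
    return ordered, max_pending


# A generator as input shows that inputs never need to exist all at once.
results, max_pending = asyncio.run(
    sliding_window((x for x in range(20)), window=4)
)
```

Because the window is refilled after every completed call, a single slow call only occupies one slot rather than holding up an entire batch.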

Examples

Multiple URLs

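import apicadabri

# Names to look up; each one becomes one GET request.
pokemon = ["bulbasaur", "squirtle", "charmander"]
# Fetch all URLs, parse each response as JSON, and collect the results in a list.
data = apicadabri.bulk_get(
    urls=(f"https://pokeapi.co/api/v2/pokemon/{p}" for p in pokemon),
).json().to_list()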

Multiple payloads

TODO

Multivariate (zipped)

TODO

Multivariate (multiply)

TODO

Multivariate (pipeline)

TODO

Error Handling

API calls can always fail, and you don't want your script with 100k API calls to crash on call number 10k because you forgot to handle a None somewhere. At the same time, you might not care about errors at all and just want to set up a quick-and-dirty test scenario. Apicadabri supports both scenarios by providing three options for error handling, controlled by the on_error parameter:

  • raise: The exception is not caught at all. It is raised as normal and the bulk call fails.

  • return: The exception is caught and encapsulated in an ApicadabriErrorResponse object that also contains the input that triggered the exception.

  • A lambda function: The exception is caught and the provided error handling function is called with the triggering input, the error message, and the error type. The error handling function must return a result of the same type that a successful call would return. This can, for example, be used to return an "empty" result that does not lead to exceptions in further processing.

    ℹ️ If you need to return a different type of object in case of an error, you can instead use map with on_error="return" and then do another map that transforms the error response into the type you want.

The on_error parameter is available on several central methods of the returned objects, most notably map and reduce.
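
To make the three modes concrete, here is a self-contained sketch of the same idea in plain Python. It does not use apicadabri: `ErrorResponse` and `map_with_on_error` are hypothetical stand-ins, and the handler signature (input, message, type name) is an assumption based on the description above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ErrorResponse:
    """Hypothetical error wrapper; not apicadabri's ApicadabriErrorResponse."""

    input: Any
    message: str
    type: str


def map_with_on_error(func, inputs, on_error="raise"):
    """Apply func to every input, handling failures according to on_error."""
    results = []
    for item in inputs:
        try:
            results.append(func(item))
        except Exception as exc:
            if on_error == "raise":
                raise  # let the whole bulk operation fail
            if on_error == "return":
                # Keep going, but record what failed and why.
                results.append(
                    ErrorResponse(item, str(exc), type(exc).__name__)
                )
            elif callable(on_error):
                # The handler must return a value of the "success" type.
                results.append(on_error(item, str(exc), type(exc).__name__))
            else:
                raise ValueError(f"unknown on_error: {on_error!r}") from exc
    return results


raw = ["1", "x", "3"]

# "return": the failing input is wrapped instead of crashing the run.
wrapped = map_with_on_error(int, raw, on_error="return")

# Lambda: replace failures with an "empty" value of the same type.
defaulted = map_with_on_error(int, raw, on_error=lambda item, msg, kind: 0)

# Second pass that turns error wrappers into a different type (here None).
cleaned = [None if isinstance(r, ErrorResponse) else r for r in wrapped]
```

The last line mirrors the note above: wrapping errors first and converting them in a second step lets the final result use a type that differs from what a successful call returns.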


Download files

Source Distribution

apicadabri-0.3.0.tar.gz (58.5 kB)

Uploaded Source

Built Distribution

apicadabri-0.3.0-py3-none-any.whl (9.4 kB)

Uploaded Python 3

File details

Details for the file apicadabri-0.3.0.tar.gz.

File metadata

  • Download URL: apicadabri-0.3.0.tar.gz
  • Upload date:
  • Size: 58.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.12

File hashes

Hashes for apicadabri-0.3.0.tar.gz
Algorithm     Hash digest
SHA256        b8b09a81c0b4c0a9a307020d70767f8b30bd81bb3a00fba00503afd1ed8f1956
MD5           81a1be0bec364d77a386cb9e4f50de7f
BLAKE2b-256   6ff7dbc9771cc06ce4752ff5a67348709f7e0dc8bd90ac56cd49631d3c1d5a48

File details

Details for the file apicadabri-0.3.0-py3-none-any.whl.

File hashes

Hashes for apicadabri-0.3.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        f0d71a510be31082f9a9bec04a317ca879da4bf1bcfa8643e4557ba1e1e36389
MD5           6171249fcdd04d3cf79e70065fb8432b
BLAKE2b-256   85896723120251e5fa997673a6339b75050b3c57c6f1decb34f27e41df0a391b
