
Magical tools to interact with web APIs from a data scientist's perspective.



Apicadabri

Apicadabri is a magical set of tools to interact with web APIs from a data scientist's perspective to "just get the damn data"™.

It focuses on simplicity and speed while being agnostic about what kind of API you're calling. If you know how to send a single call to the API you're interested in, you should be able to scale up to 100k calls with apicadabri.

Current status

This is still an early alpha. Some basic examples already work, though (see below).

Features

  • 🚀 Get the maximum amount of speed while still playing nice with the API provider.
    • ⚙️ Configurable number of calls active at the same time (using a Semaphore).
    • 🔀 Async execution, so everything stays within one Python process.
  • 🐤 You don't have to write async code or care about task scheduling anywhere.
  • 🪜 Process results right as they come in.
  • 🐛 Comprehensive error handling and retry mechanisms.
  • 📊 Directly get a dataframe from just a single chain of method calls.*
  • 🔧 More than just HTTP: Use the above-mentioned features for arbitrary (async) tasks.

*: Not yet fully implemented.
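The concurrency features listed above rest on a common asyncio pattern: one coroutine per call, an asyncio.Semaphore capping how many calls are in flight, and a synchronous wrapper so the caller never writes async code. The sketch below illustrates that pattern with the standard library only. It is not apicadabri's implementation; the names bulk_run and max_active are hypothetical, and a fake coroutine stands in for an HTTP request.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def bulk_run(
    task: Callable[[T], Awaitable[R]],
    inputs: Iterable[T],
    max_active: int = 10,
) -> List[R]:
    """Hypothetical helper: run `task` on every input with at most
    `max_active` calls in flight and return results in input order."""

    async def runner() -> List[R]:
        # Create the semaphore inside the running event loop.
        semaphore = asyncio.Semaphore(max_active)

        async def limited(item: T) -> R:
            # At most `max_active` coroutines can hold the semaphore at once.
            async with semaphore:
                return await task(item)

        # gather() returns results in the order the coroutines were passed in.
        return list(await asyncio.gather(*(limited(item) for item in inputs)))

    # asyncio.run() hides the event loop from the caller.
    return asyncio.run(runner())


# Usage: a fake request that records how many calls overlap.
stats = {"active": 0, "peak": 0}


async def fake_request(n: int) -> int:
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01 * (n % 3))  # simulate uneven latency
    stats["active"] -= 1
    return n * n


results = bulk_run(fake_request, range(20), max_active=5)
print(results[:5])  # [0, 1, 4, 9, 16]
```

All coroutines are created up front and the semaphore only gates how many run at once, so this sketch keeps every task in memory, in line with the assumptions listed in the next section.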

Assumptions

For now, apicadabri assumes that you want to solve a task for which the following holds:

  • All inputs fit into memory
  • All results fit into memory (or are written directly to a JSONL file)
  • The number of requests will not overwhelm the asyncio event loop (which is apparently hard to achieve anyway unless you have tens of millions of calls).
  • You want to observe and process results as they come in.
  • You want your results in the same order as the input with no gaps in between.
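Receiving results as they arrive while still emitting them in input order with no gaps can be sketched with a reordering buffer: tag every call with its input index, consume the calls with asyncio.as_completed, and release a result only once all earlier indices have been released. The helper names ordered_stream and to_jsonl are hypothetical; this illustrates the assumptions above and is not apicadabri's code.

```python
import asyncio
import io
import json
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, Sequence, TextIO, Tuple


async def ordered_stream(
    task: Callable[[Any], Awaitable[Any]],
    inputs: Sequence[Any],
) -> AsyncIterator[Tuple[int, Any]]:
    """Hypothetical helper: yield (index, result) pairs in input order,
    each one as soon as it and all earlier results are available."""

    async def tagged(index: int, item: Any) -> Tuple[int, Any]:
        return index, await task(item)

    buffered: Dict[int, Any] = {}  # finished, but waiting for an earlier index
    next_index = 0  # lowest index not yet yielded
    calls = [tagged(i, item) for i, item in enumerate(inputs)]
    for finished in asyncio.as_completed(calls):
        index, result = await finished  # arrives in completion order
        buffered[index] = result
        # Release the longest gap-free run that is now complete.
        while next_index in buffered:
            yield next_index, buffered.pop(next_index)
            next_index += 1


async def to_jsonl(
    task: Callable[[Any], Awaitable[Any]],
    inputs: Sequence[Any],
    out: TextIO,
) -> int:
    """Write one JSON line per result, in input order; return the line count."""
    written = 0
    async for _, result in ordered_stream(task, inputs):
        out.write(json.dumps(result) + "\n")
        written += 1
    return written


# Usage: later inputs finish first, so completion order is reversed.
async def fake_request(n: int) -> Dict[str, int]:
    await asyncio.sleep(0.01 * (5 - n))
    return {"n": n, "square": n * n}


buffer = io.StringIO()
count = asyncio.run(to_jsonl(fake_request, list(range(5)), buffer))
rows = [json.loads(line) for line in buffer.getvalue().splitlines()]
print([row["n"] for row in rows])  # [0, 1, 2, 3, 4]
```

The usage also shows the cost of the ordering guarantee: a slow early call holds all later results in the buffer until it finishes. To write to disk, pass a file object from open("results.jsonl", "w") (a hypothetical file name) instead of the StringIO.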

Future relaxation of constraints

  • For extremely large numbers of calls (>> 1M), add another layer of batching to avoid creating all asyncio tasks at the same time, while also preventing a single slow call in a batch from slowing down the whole task.
    • Through the same mechanism, allow loading inputs one batch at a time.
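One possible way to realize this batching idea without letting a slow call stall its batch is a sliding window: keep at most a fixed number of tasks alive, wait with asyncio.wait(..., return_when=asyncio.FIRST_COMPLETED), and pull the next inputs from a lazy iterator whenever a slot frees up. This is a sketch of one approach, not a planned apicadabri API; windowed and window are hypothetical names.

```python
import asyncio
import itertools
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, Iterable, Iterator, List, Set, Tuple


async def windowed(
    task: Callable[[Any], Awaitable[Any]],
    inputs: Iterable[Any],
    window: int = 100,
) -> AsyncIterator[Tuple[int, Any]]:
    """Hypothetical helper: keep at most `window` tasks alive and pull new
    inputs lazily as tasks finish. Yields (index, result) in completion order."""
    source: Iterator[Tuple[int, Any]] = iter(enumerate(inputs))
    in_flight: Set[asyncio.Task] = set()

    async def tagged(index: int, item: Any) -> Tuple[int, Any]:
        return index, await task(item)

    def refill() -> None:
        # Pull only as many inputs as there are free slots in the window.
        for index, item in itertools.islice(source, window - len(in_flight)):
            in_flight.add(asyncio.create_task(tagged(index, item)))

    refill()
    while in_flight:
        # Wake up as soon as any task finishes, so one slow call never blocks the rest.
        done, _ = await asyncio.wait(in_flight, return_when=asyncio.FIRST_COMPLETED)
        in_flight.difference_update(done)
        for finished in done:
            yield finished.result()
        refill()


# Usage: a generator shows that inputs are produced lazily.
pulled: List[int] = []
stats = {"active": 0, "peak": 0}


def lazy_inputs(n: int) -> Iterator[int]:
    for i in range(n):
        pulled.append(i)  # record when each input is actually produced
        yield i


async def fake_request(x: int) -> int:
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.001 * (x % 4))  # uneven latency
    stats["active"] -= 1
    return x + 1


async def main() -> Tuple[List[int], int]:
    results: Dict[int, int] = {}
    pulled_at_first_result = -1
    async for index, value in windowed(fake_request, lazy_inputs(50), window=8):
        if pulled_at_first_result < 0:
            pulled_at_first_result = len(pulled)
        results[index] = value
    # Results arrive in completion order; restore input order by index.
    return [results[i] for i in sorted(results)], pulled_at_first_result


ordered, first_pull = asyncio.run(main())
print(ordered[:3], first_pull)  # [1, 2, 3] 8
```

Because inputs are pulled with itertools.islice, only `window` inputs have been produced when the first result arrives, which also covers loading inputs one batch at a time. Results come back in completion order, so the usage re-sorts them by index.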

Examples

Multiple URLs

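import apicadabri

pokemon = ["bulbasaur", "squirtle", "charmander"]
# Send one GET request per URL, parse each response body as JSON,
# and collect the parsed results into a list (in input order).
data = apicadabri.bulk_get(
    urls=(f"https://pokeapi.co/api/v2/pokemon/{p}" for p in pokemon),
).json().to_list()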

Multiple payloads

TODO

Multivariate (zipped)

TODO

Multivariate (multiply)

TODO

Multivariate (pipeline)

TODO

Error Handling

API calls can always fail, and you don't want a script with 100k API calls to crash on call number 10k because you forgot to handle a None somewhere. At the same time, you might not care about errors at all and just want to set up a quick-and-dirty test scenario. Apicadabri supports both scenarios by providing three options for error handling, controlled by the on_error parameter:

  • raise: The exception is not caught at all; it is raised as normal and the bulk call fails.

  • return: The exception is caught and encapsulated in an ApicadabriErrorResponse object, which also contains the input that triggered the exception.

  • A function (such as a lambda): The exception is caught, and the provided error-handling function is called with the triggering input, the error message, and the error type. The function must return a result of the same type that a successful call would produce. This can, for example, be used to return an "empty" result that does not cause exceptions in further processing.

    ℹ️ If you need to return a different type of object in case of an error, you can instead use map with on_error="return" and then do another map that transforms the error response into the type you want.

The on_error parameter is available on several central methods of the returned objects, most notably map and reduce.
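The three on_error modes can be sketched in plain asyncio. ErrorResponse below is a hypothetical stand-in for ApicadabriErrorResponse, bulk_map is a hypothetical helper rather than apicadabri's map, and the handler signature handler(input, message, error_type) is an assumption based on the description above.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List, Union


@dataclass
class ErrorResponse:
    """Hypothetical stand-in for ApicadabriErrorResponse."""

    input: Any  # the input that triggered the exception
    message: str
    error_type: str


# "raise", "return", or a handler(input, message, error_type) -> normal result
OnError = Union[str, Callable[[Any, str, str], Any]]


async def call_with_policy(
    task: Callable[[Any], Awaitable[Any]], item: Any, on_error: OnError
) -> Any:
    try:
        return await task(item)
    except Exception as exc:
        if on_error == "raise":
            raise  # re-raise unchanged; the whole bulk call fails
        message, error_type = str(exc), type(exc).__name__
        if on_error == "return":
            return ErrorResponse(item, message, error_type)
        # Handler: must return a value of the same type as a successful call.
        return on_error(item, message, error_type)


def bulk_map(
    task: Callable[[Any], Awaitable[Any]], inputs: List[Any], on_error: OnError = "raise"
) -> List[Any]:
    """Hypothetical helper: apply `task` to every input concurrently."""
    if on_error not in ("raise", "return") and not callable(on_error):
        raise ValueError(f"invalid on_error: {on_error!r}")

    async def runner() -> List[Any]:
        calls = (call_with_policy(task, item, on_error) for item in inputs)
        return list(await asyncio.gather(*calls))

    return asyncio.run(runner())


# Usage: a fake call that fails for non-numeric input.
async def parse_count(text: str) -> int:
    await asyncio.sleep(0)  # stand-in for an API call
    return int(text)  # raises ValueError for "two"


inputs = ["1", "two", "3"]

# "return": the failing input comes back as an ErrorResponse.
mixed = bulk_map(parse_count, inputs, on_error="return")

# Handler: substitute an "empty" int so later steps don't break.
zeros = bulk_map(parse_count, inputs, on_error=lambda item, message, error_type: 0)
print(zeros)  # [1, 0, 3]

# "return" followed by a second map that turns errors into another type.
labels = [f"error: {r.error_type}" if isinstance(r, ErrorResponse) else r for r in mixed]
print(labels)  # [1, 'error: ValueError', 3]
```

Keeping errors as values means one failing call no longer discards the other results, while "raise" stays convenient for quick experiments where any failure should stop everything.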


Download files


Source Distribution

apicadabri-0.4.0.tar.gz (64.1 kB)

Uploaded Source

Built Distribution


apicadabri-0.4.0-py3-none-any.whl (14.5 kB)

Uploaded Python 3

File details

Details for the file apicadabri-0.4.0.tar.gz.

File metadata

  • Download URL: apicadabri-0.4.0.tar.gz
  • Size: 64.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.12

File hashes

Hashes for apicadabri-0.4.0.tar.gz
Algorithm Hash digest
SHA256 bd0120439c8aca80eb31856665988a8534119b069a81dfc78910151ad723bbda
MD5 4e7b1182bc8199ed09b9c1f9def919d9
BLAKE2b-256 ab67642d090b5d17205a70eec92b38269d7c094e7ff4244845e4db8f713cc306


File details

Details for the file apicadabri-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: apicadabri-0.4.0-py3-none-any.whl
  • Size: 14.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.12

File hashes

Hashes for apicadabri-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2c7136810a918dbf2e869bc080e599a1315394d8a47dea2f410d0c321eb42189
MD5 277dca38ca9b18d13d855ec2ded39103
BLAKE2b-256 903a600114ef1b9ed45d7fd21b0c10a56ce0d71d997e7e0c312e04eecd70f385

