BIA integrator API client

The example project shows the most common operations that can be performed with the API.

The generated documentation is a full reference for all the functionality in the API client.

⚠️Important⚠️

  • ⚠️ This client and some of the documentation are automatically generated, with some manual additions. This can lead to conflicting information. Please use only this Readme and the example project for guidance, and everything else only as a reference. This Readme aims to separate the important generated docs from the less important ones, so if something is unclear, suggestions for improving this file are welcome.
  • ⚠️ Because of the variety of use cases to accommodate, validations focus on maintaining the database structure and some consistency, but prioritise flexibility. Please treat users with write access as you would a root user.

Notes on using the API

Read vs write client

Read-only operations are generally public (unauthenticated) and read-write ones are private (authenticated). Two client classes exist, one for the public and one for the private mode of the API. The split exists mostly for editor support: the private client includes all the public methods in addition to the write methods. It is not meant as a read/write separation, so please use a single client throughout an app.

Both classes can be found here and can be used as a reference. If using these as a reference, please ignore the methods with the _with_http_info suffix.

An alternative reference for the client methods is the generated README, with methods tagged with their appropriate class.

Model hierarchy

Nested models are preferred and duplication is avoided, with exceptions where required. This results in a distinction between toplevel and nested objects.

  • Toplevel objects always have uuid and version fields, and are usually the unit of change: creating and updating objects only applies to them.
  • Nested objects never have the uuid and version fields, and are always nested in toplevel objects or other nested objects (eventually rooted in a toplevel object).

In the example project (snippet below), BIAStudy is a toplevel object, and Author is nested in BIAStudy. Authors cannot be created independently, and in order to modify an author (or any nested object), its root toplevel object must be fetched, modified, and pushed back (pull-update-push). The update will only be accepted if version is incremented.

⚠️Note: version here is the object version, used to prevent concurrent writes. Type information, including a separate model version, is stored in the model attribute of all objects; it is managed by the server and should never be used or relied upon by client apps.

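import uuid

from bia_integrator_api import models as api_models  # assumed import path of the generated models

study_uuid = str(uuid.uuid4())  # in practice, prefer a deterministic UUID (see UUIDs below)

my_study = api_models.BIAStudy(
    uuid=study_uuid,
    version=0,  # new toplevel objects start at version 0
    title="Study title",
    description="Study description",
    release_date="@TODO: Check format",
    accession_id=f"accessions_must_be_unique_{study_uuid}",
    organism="test",
    authors=[
        # Author is nested: it has no uuid or version of its own
        api_models.Author(name="Study Author 1"),
        api_models.Author(name="Study Author 2"),
    ],
)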
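Changing a nested object means changing its root. A minimal sketch of that cycle, using plain dataclasses as hypothetical stand-ins for BIAStudy and Author (fetching and pushing through the real client are left out, and add_author is an illustrative name):

```python
import copy
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for api_models.Author and api_models.BIAStudy
@dataclass
class Author:
    name: str


@dataclass
class Study:
    uuid: str
    version: int
    title: str
    authors: List[Author] = field(default_factory=list)


def add_author(fetched_study: Study, author_name: str) -> Study:
    """Return an updated copy of the root study with one more nested author.

    The nested Author cannot be written on its own, so the change is made on
    the toplevel object, and its version is incremented for the update.
    """
    updated = copy.deepcopy(fetched_study)
    updated.authors.append(Author(name=author_name))
    updated.version += 1
    return updated
```

The fetched object is left untouched, so a rejected update can be retried after fetching the latest version again.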

Currently, BIACollection, BIAStudy, BIAImage and FileReference are toplevel objects, and everything else is nested. Some toplevel objects refer to other objects: for example, the BIAImage attribute study_uuid references the uuid field of a BIAStudy object. Generally, attributes named TYPE_uuid refer to the uuid field of an object of that type.
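Because references are plain uuid values, a client can check them before writing. A minimal sketch, assuming objects are available as plain dictionaries with uuid and study_uuid keys (the function name is hypothetical), that lists images whose study_uuid does not match any known BIAStudy:

```python
from typing import Dict, Iterable, List


def dangling_study_refs(images: Iterable[Dict], studies: Iterable[Dict]) -> List[str]:
    """Return the uuids of images whose study_uuid matches no known study."""
    known_study_uuids = {study["uuid"] for study in studies}
    return [
        image["uuid"]
        for image in images
        if image["study_uuid"] not in known_study_uuids
    ]
```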

Batch operations

Bulk creation is supported for objects of type BIAImage (create_images) and FileReference (create_file_references).

These endpoints always respond with a 201 status to avoid generated clients raising an exception, and return a BulkOperationResponse object with the actual result for each item written. Individual writes are atomic, so if the BulkOperationResponseItem for a particular object has a 201 status, then it was written to the database, but the operation as a whole is not atomic. Some items might have been written and some might have failed, and the client must explicitly check whether all items were written, and either do a partial or full retry (operations here are idempotent, provided the documents being written are identical).

The item_idx_by_status attribute of BulkOperationResponse is a dictionary mapping each operation status (either 201 or 400) to the indices of the corresponding documents in the list passed to create_images (or create_file_references).
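A minimal sketch of a partial retry. It assumes a hypothetical create_fn wrapper around create_images or create_file_references that returns the item_idx_by_status mapping as a plain dict; only items that did not come back with 201 are resent, and their batch indices are mapped back to positions in the original list:

```python
from typing import Callable, Dict, List, Sequence


def create_all(
    items: Sequence,
    create_fn: Callable[[List], Dict[int, List[int]]],
    max_attempts: int = 3,
) -> List[int]:
    """Bulk-create items, retrying only the ones that did not get a 201.

    create_fn (hypothetical) returns item_idx_by_status: status code mapped to
    the indices, within the list it was given, of the items with that status.
    Returns the indices in `items` that still failed after max_attempts.
    """
    pending = list(range(len(items)))
    for _ in range(max_attempts):
        if not pending:
            break
        batch = [items[i] for i in pending]
        item_idx_by_status = create_fn(batch)
        written = set(item_idx_by_status.get(201, []))
        # Keep the original-list index of every item not written in this batch
        pending = [pending[j] for j in range(len(batch)) if j not in written]
    return pending
```

Resending is safe only because the operations are idempotent for identical documents; an item that keeps failing with 400 most likely needs fixing rather than retrying.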

Please see the example script for an example before using this.

Environments

The development environment is at https://bia-cron-1.ebi.ac.uk:8080/api/v1 and is available within the EBI network. User accounts are needed for write access. Read-only access is not authenticated.

To check the connection, install biaint using the project's readme and list the available studies.

UUIDs

By design, models that require a UUID field expect them to be provided by the client generating the object. It is recommended that the UUIDs be deterministic, based on some important properties of the object being created.

For example, if FileReferences are created for files on a filesystem, the UUIDs could be derived from a combination of the absolute path of each file and its size (or its last modification time). This makes it easy to avoid duplicate FileReferences if creating a large number of them fails halfway through, since a rerun produces the same UUIDs. Any stable identifier for the object being created can also be used (e.g. the id in a legacy database).

In practice, this often looks similar to:

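import hashlib
import json
import uuid

# Hypothetical stable attributes, e.g. an absolute path and a file size
attr_1 = "/absolute/path/to/image.tif"
attr_2 = 1048576
object_stable_attributes = {
    'stable_attribute_1': attr_1,
    'stable_attribute_2': attr_2
}
hash_input = json.dumps(object_stable_attributes, sort_keys=True)
hexdigest = hashlib.md5(hash_input.encode("utf-8")).hexdigest()
image_id_as_uuid = uuid.UUID(version=4, hex=hexdigest)  # sets version-4 bits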
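The same pattern can be wrapped in a reusable helper for the filesystem case described above. A sketch, assuming the absolute path and file size are the stable attributes (the helper name and dictionary keys are illustrative, not part of the client):

```python
import hashlib
import json
import os
import uuid


def file_reference_uuid(path: str) -> str:
    """Derive a deterministic UUID for a file from its absolute path and size.

    Re-running on the same unchanged file yields the same UUID, so a
    FileReference created in a failed earlier run is not duplicated.
    """
    stable_attributes = {
        "absolute_path": os.path.abspath(path),
        "size_in_bytes": os.path.getsize(path),
    }
    hash_input = json.dumps(stable_attributes, sort_keys=True)
    hexdigest = hashlib.md5(hash_input.encode("utf-8")).hexdigest()
    return str(uuid.UUID(version=4, hex=hexdigest))
```

Note that a file whose size changes gets a new UUID, which is usually the desired behaviour for a changed file.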

Write operations

There are two important things to consider when creating or modifying objects:

  1. Toplevel objects are versioned, and the versions need to be consecutive. When creating an object, its version must be set to 0, and then when modifications are made the version needs to be incremented. Gotcha: Providing an incorrect version when updating currently results in a "Not found" error instead of a conflict.
  2. Object creation and modification are idempotent. This simplifies retries when deterministic UUIDs are used, because resubmitting objects that were already created in a previous run has no effect.
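A toy in-memory model of these two rules can help when testing client code offline. It is a hypothetical simplification, not the server's implementation: objects are plain dicts, and create and update share one write method.

```python
from typing import Dict


class VersionConflict(Exception):
    """Raised by the mock when a write's version is not the expected one."""


class InMemoryStore:
    """Hypothetical mock of the write rules for toplevel objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def write(self, obj: dict) -> None:
        current = self._objects.get(obj["uuid"])
        if current == obj:
            return  # identical resubmission: idempotent no-op
        # New objects must be version 0; updates must be stored version + 1
        expected_version = 0 if current is None else current["version"] + 1
        if obj["version"] != expected_version:
            # The real API currently reports this case as "Not found"
            raise VersionConflict(
                f"expected version {expected_version}, got {obj['version']}"
            )
        self._objects[obj["uuid"]] = dict(obj)

    def get(self, obj_uuid: str) -> dict:
        return dict(self._objects[obj_uuid])
```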

Basic search

The API supports some basic search operations, aimed at simple use cases like "Which images in this study have representations bigger than 10TB?", "Which images have thumbnails?", etc. Keep in mind that:

  • Queries are at the Image/Study/FileReference level, and they return the entire toplevel object. For example, if looking for all thumbnails in a study, a query for ImageRepresentations of type "thumbnail" would return all BIAImage objects which have at least one thumbnail ImageRepresentation. The actual thumbnail ImageRepresentation needs to be extracted separately.
  • Although not required, queries for Images/FileReferences should always include the study uuid, to limit the search space. Queries currently time out after 2 seconds.
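Since search results are whole BIAImage objects, the matching representations must be filtered out on the client. A sketch, assuming results are plain dicts whose representations list contains dicts with a type key (hypothetical field names; check the generated models for the real ones):

```python
from typing import Dict, Iterable, List


def extract_thumbnails(images: Iterable[Dict]) -> Dict[str, List[Dict]]:
    """Map each image uuid to its thumbnail representations only."""
    thumbnails = {}
    for image in images:
        matches = [
            rep
            for rep in image.get("representations", [])
            if rep.get("type") == "thumbnail"
        ]
        if matches:  # skip images with no thumbnail at all
            thumbnails[image["uuid"]] = matches
    return thumbnails
```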

In the API client, methods with names like search_* can be used for searching. See https://bia-cron-1.ebi.ac.uk:8080/redoc#tag/public/operation/search_images_exact_match and the example below for a reference of possible filters.

Example body for a raw HTTP POST request. Client arguments are usually wrapped in *Search types. See the search example for the API client equivalent.

{
    "annotations_any": [{"dimension_order": "XYZCT"} ],
    "image_representations_any": [{"type": "thumbnail", "size_lte": 1000000000} ],
    "study_uuid": "00000000-0000-0000-0006-09b5dbf57bdf",
    "limit": 10
}
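The same body can be prepared with the Python standard library for a raw request. A sketch that builds, but does not send, the POST request; the search path is a hypothetical placeholder, so check the redoc reference for the real endpoint:

```python
import json
import urllib.request

# Base URL from the Environments section; SEARCH_PATH is hypothetical
BASE_URL = "https://bia-cron-1.ebi.ac.uk:8080/api/v1"
SEARCH_PATH = "/search/images/exact_match"


def build_image_search_request(study_uuid: str, limit: int = 10) -> urllib.request.Request:
    """Prepare (but do not send) the raw POST for the example search body."""
    body = {
        "annotations_any": [{"dimension_order": "XYZCT"}],
        "image_representations_any": [{"type": "thumbnail", "size_lte": 1000000000}],
        "study_uuid": study_uuid,
        "limit": limit,
    }
    return urllib.request.Request(
        BASE_URL + SEARCH_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Inside the EBI network, the prepared request could then be sent with urllib.request.urlopen.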

Setup

poetry add git+https://github.com/BioImage-Archive/bia-integrator.git@biaint-api-backend#subdirectory=clients/python


Download files

Source Distribution

bia_integrator_api-0.2.0.tar.gz (72.5 kB)

Built Distribution

bia_integrator_api-0.2.0-py3-none-any.whl (158.2 kB, Python 3)