Python SDK for Tesseract Models

Tesseract Python SDK

This is an SDK for developing Tesseract models in Python.

Writing a Tesseract Image

To create a Tesseract image with the Tesseract SDK, you need to implement two functions: an inference function and a get-model-info function. The names of these functions don't matter, because you pass both of them to the tesseract.serve function. Their call signatures do matter, however. The inference function should take at least one dictionary as an argument, along with a logger and any **kwargs. The dictionaries available to the inference function are assets and grids. The assets dictionary contains the input data for the model, and the grids dictionary contains the grid information for the model, such as timestamps and x, y coordinates. An example function signature is shown below:

def inference(assets: Dict[str, np.ndarray], grids: Dict[str, np.ndarray], logger: Logger, **kwargs) -> Dict[str, np.ndarray]:
    pass

You can also leave off grids if you don't need that information in the inference function:

def inference(assets: Dict[str, np.ndarray], logger: Logger, **kwargs) -> Dict[str, np.ndarray]:
    pass

For some examples of how to write an inference function, see the examples directory.
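As a minimal sketch of the second signature, the function below averages each input asset over its last axis and returns one output per asset. The array layout (bands on the last axis), the `_mean` output names, and the use of the standard-library `logging.Logger` as the logger type are assumptions made for illustration; the SDK may pass differently shaped arrays. Registering the function with tesseract.serve is left out because its exact arguments are not described here.

```python
import logging
from typing import Dict

import numpy as np


def inference(
    assets: Dict[str, np.ndarray],
    logger: logging.Logger,
    **kwargs,
) -> Dict[str, np.ndarray]:
    """Toy model: average every input asset over its last axis."""
    outputs = {}
    for name, array in assets.items():
        logger.info("processing asset %s with shape %s", name, array.shape)
        # Assumption: the last axis holds the bands of the asset.
        outputs[f"{name}_mean"] = array.mean(axis=-1, keepdims=True)
    return outputs


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in input; a real job would supply the arrays.
    fake_assets = {"modis": np.ones((2, 4, 4, 3), dtype=np.float32)}
    result = inference(fake_assets, logging.getLogger("example"))
    print({name: array.shape for name, array in result.items()})
```

Calling the function directly like this is a quick way to catch shape mistakes before the model is packaged into an image.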

Test a Tesseract Image

To check that your model container will run correctly in Tesseract, you can use the validation CLI. For a basic setup, run:

tesseract-sdk validate <image-name>:<tag>

This reads the model info in your model code and generates a random array for input. It then spins up the container and attempts to send data into the model. If the model returns data, the validator checks that the shapes and dtypes are correct. That's all you need for simple models.
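A similar check can be approximated locally before building a container. The sketch below is not the validator's code: it feeds random data to an inference function and compares each output against an expected shape and dtype. The `expected` mapping, the `check_outputs` helper, the `toy_inference` model, and the input shape are all hypothetical stand-ins for whatever your model info declares.

```python
import logging
from typing import Dict, List, Tuple

import numpy as np

# Hypothetical description of what the model should return: name -> (shape, dtype).
Expected = Dict[str, Tuple[Tuple[int, ...], str]]


def check_outputs(outputs: Dict[str, np.ndarray], expected: Expected) -> List[str]:
    """Return a list of mismatches; an empty list means every output matched."""
    problems = []
    for name, (shape, dtype) in expected.items():
        if name not in outputs:
            problems.append(f"missing output {name!r}")
            continue
        array = outputs[name]
        if array.shape != tuple(shape):
            problems.append(f"{name}: shape {array.shape}, expected {tuple(shape)}")
        if array.dtype != np.dtype(dtype):
            problems.append(f"{name}: dtype {array.dtype}, expected {dtype}")
    return problems


def toy_inference(assets, logger, **kwargs):
    """Stand-in model: sums the bands of a hypothetical 'modis' asset."""
    return {"total": assets["modis"].sum(axis=-1)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random input, in the spirit of what the validator generates.
    random_assets = {"modis": rng.random((4, 4, 3), dtype=np.float32)}
    outputs = toy_inference(random_assets, logging.getLogger("local-check"))
    print(check_outputs(outputs, {"total": ((4, 4), "float32")}))
```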

For more complicated models, or models that you would like to test with real data, you need to create a configuration file for testing. The configuration file tells the validator things like where the local data to be loaded is and which bands should be included. The resulting arrays and features are written out as PNG and GeoJSON files, respectively. An example config is shown below:

{
    "image": "my-tesseract-model:v0.0.1",
    "test_data": {
        "job_id": "my-job-id",
        "project": "my-project"
    },
    "asset_bands": [
        {
            "asset_name": "modis",
            "bands": [0,1,2,5]
        },
        {
            "asset_name": "sentinel",
            "bands": [2,4]
        }
    ],
    "args": {
        "model-arg-1": "value1",
        "model-arg-2": "value2"
    },
    "output_asset_bands": [
        {
            "asset_name": "model_output_1",
            "bands": [0, 1, 2]
        },
        {
            "asset_name": "model_output_2",
            "bands": [0]
        }
    ],
    "save_output": false
}

image: The docker image to validate.

test_data: This can be either a dictionary with a job id and project, to get data directly from a Tesseract job, or a path to a zarr file. If reading directly from a Tesseract job, the dict should have only the keys job_id and project. To read from a zarr file, pass in the path or URL as a string. This can be a local file or a remote zarr file (for example, in Google Cloud Storage), so long as the credentials are available. Optional: If not provided, random data will be created.

asset_bands: A list of asset bands like the inputs to a Tesseract job. Each entry in the list should contain the keys "asset_name" and "bands". The "asset_name" must exist in the input zarr file or Tesseract job, and "bands" should be a list of integers corresponding to bands in the asset. Optional: If not provided, all bands from all input datasets are used.

args: Any arguments that need to be passed to the model inference function. Optional: If not provided, no args are supplied to the model.

output_asset_bands: For each model output, the bands that should be used to output an image. This should be either 1 or 3 bands. For each item in the list, a PNG image is created so that the model outputs can be quickly inspected to confirm that the model appears to be working correctly. Unlike asset_bands, you can have multiple outputs here with the same name. This can be useful if you want to output several images for one asset, for example 3 images with one band each instead of one 3-band image. Optional: If not provided, no output images will be generated.

save_output: If true, the model output is written as bytes that can be read in with numpy. Each file is named after the output, with a '.dat' extension. Optional: Defaults to false.
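As a rough sketch of reading a saved output back, the helper below assumes the '.dat' file holds raw array bytes in C order, the layout that `ndarray.tofile` writes and `np.fromfile` reads. The description above only says the bytes can be read with numpy, so the layout is an assumption, and the dtype and shape must be known to the caller because raw bytes do not record them. `load_output` is a hypothetical helper, not part of the SDK.

```python
import os
import tempfile

import numpy as np


def load_output(path: str, dtype: str, shape: tuple) -> np.ndarray:
    """Read raw bytes from a '.dat' file and restore the array's dtype and shape."""
    flat = np.fromfile(path, dtype=np.dtype(dtype))
    return flat.reshape(shape)


if __name__ == "__main__":
    # Round trip with a stand-in file, since no real validator output is at hand.
    original = np.arange(12, dtype=np.float32).reshape(3, 4)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model_output_1.dat")
        original.tofile(path)
        restored = load_output(path, "float32", (3, 4))
    print(np.array_equal(original, restored))
```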

To run the validator with a configuration file, simply pass it to the utility:

tesseract-sdk validate -f valid_config.json
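A configuration file can be sanity-checked before it is handed to the validator. The sketch below uses only the standard library and encodes the rules stated in the field descriptions above: test_data is a dict with exactly job_id and project or a string, each asset_bands entry has an asset_name and a list of integer bands, and each output_asset_bands entry uses 1 or 3 bands. `check_config` is a hypothetical helper, not part of the SDK, and treating image as required is an assumption.

```python
import json
from typing import Any, Dict, List

EXAMPLE_CONFIG: Dict[str, Any] = {
    "image": "my-tesseract-model:v0.0.1",
    "test_data": {"job_id": "my-job-id", "project": "my-project"},
    "asset_bands": [
        {"asset_name": "modis", "bands": [0, 1, 2, 5]},
        {"asset_name": "sentinel", "bands": [2, 4]},
    ],
    "output_asset_bands": [{"asset_name": "model_output_1", "bands": [0, 1, 2]}],
    "save_output": False,
}


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were detected."""
    problems: List[str] = []

    # Assumption: the image key is required when a config file is used.
    if not isinstance(config.get("image"), str):
        problems.append("image should be a string such as 'name:tag'")

    test_data = config.get("test_data")
    if isinstance(test_data, dict):
        if set(test_data) != {"job_id", "project"}:
            problems.append("test_data dict should have only job_id and project")
    elif test_data is not None and not isinstance(test_data, str):
        problems.append("test_data should be a dict or a path/URL string")

    for entry in config.get("asset_bands", []):
        bands = entry.get("bands")
        if "asset_name" not in entry or not isinstance(bands, list):
            problems.append("asset_bands entries need asset_name and a bands list")
        elif not all(isinstance(band, int) for band in bands):
            problems.append(f"{entry['asset_name']}: bands should be integers")

    for entry in config.get("output_asset_bands", []):
        if len(entry.get("bands", [])) not in (1, 3):
            problems.append(f"output {entry.get('asset_name')!r} should use 1 or 3 bands")

    return problems


if __name__ == "__main__":
    print(check_config(EXAMPLE_CONFIG))
    # json.dumps never emits trailing commas, so the file it produces is valid JSON.
    print(json.dumps(EXAMPLE_CONFIG, indent=4))
```

Generating the file with `json.dumps` rather than editing it by hand also avoids syntax slips such as trailing commas, which strict JSON parsers reject.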

Contributing

To contribute to the project, first install the package with the dev option:

pip install .[dev]

IMPORTANT: Before creating a PR, make sure to update the protobuf files. The PR checks will fail if you do not. To update the protobuf files, run the following commands:

make protoc-python
make copy-protos
make check-protos
