Python SDK for Tesseract Models
Tesseract Python SDK
This is an SDK for developing Tesseract models in Python.
Writing a Tesseract Image
To use the Tesseract SDK to create a Tesseract image, you need to implement two functions: an
inference function and a get-model-info function. The names of these functions don't matter,
because you pass both of them to the tesseract.serve function. Their call signatures do
matter, however. The inference function should take at least one dictionary as an argument,
along with a logger and any **kwargs. The dictionaries available to the inference function are
assets and grids. The assets dictionary contains the input data for the model, and the grids
dictionary contains the grid information for the model, such as timestamps and x, y coordinates. An example
function signature is shown below:
def inference(assets: Dict[str, np.ndarray], grids: Dict[str, np.ndarray], logger: Logger, **kwargs) -> Dict[str, np.ndarray]:
    pass
You can also leave off grids if you don't need that information in the inference function:
def inference(assets: Dict[str, np.ndarray], logger: Logger, **kwargs) -> Dict[str, np.ndarray]:
    pass
For some examples of how to write an inference function, see the examples directory.
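As a minimal sketch of an inference function using the second signature above, the following computes a per-pixel mean across bands. The asset name "modis", the output name "band_mean", the (time, band, y, x) array layout, and the use of the standard-library logging.Logger are assumptions made for illustration; they are not taken from the SDK.

```python
import logging
from typing import Dict

import numpy as np


def inference(assets: Dict[str, np.ndarray], logger: logging.Logger, **kwargs) -> Dict[str, np.ndarray]:
    # "modis" is a hypothetical asset name; use whatever your model expects.
    data = assets["modis"]
    logger.info("received input with shape %s", data.shape)

    # Assumed layout: (time, band, y, x). Average over the band axis and
    # keep that axis so the output still has four dimensions.
    band_mean = data.mean(axis=1, keepdims=True).astype(np.float32)

    # "band_mean" is a hypothetical output name.
    return {"band_mean": band_mean}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    fake_input = np.random.rand(2, 4, 8, 8).astype(np.float32)
    result = inference({"modis": fake_input}, logging.getLogger("example"))
    print(result["band_mean"].shape)
```

Calling the function directly with a random array, as in the `__main__` block, is a quick way to catch shape mistakes before the function is handed to tesseract.serve.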
Test a Tesseract Image
To check that your model container will run correctly in Tesseract, you can use the validation CLI. For a basic setup, run:
tesseract-sdk validate <image-name>:<tag>
This reads the model info from your model code and generates a random array as input. It then spins up the container and attempts to send data to the model. If the model returns data, the validator checks that the shapes and dtypes are correct. That's all you need for simple models.
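The check described above can be approximated locally before an image is built. The sketch below is not the validator's implementation; it only illustrates the idea of feeding random arrays to an inference function and comparing the output shapes and dtypes against expectations. The helper name `smoke_test`, the structure of the `expected` dictionary, and the toy inference function are all hypothetical.

```python
import logging
from typing import Any, Callable, Dict, Tuple

import numpy as np


def smoke_test(
    inference: Callable[..., Dict[str, np.ndarray]],
    input_shapes: Dict[str, Tuple[int, ...]],
    expected: Dict[str, Tuple[Tuple[int, ...], Any]],
) -> None:
    """Run `inference` on random float32 inputs and check each output."""
    rng = np.random.default_rng(0)
    assets = {
        name: rng.random(shape, dtype=np.float32)
        for name, shape in input_shapes.items()
    }
    outputs = inference(assets, logging.getLogger("smoke"))

    for name, (shape, dtype) in expected.items():
        if name not in outputs:
            raise ValueError(f"missing output {name!r}")
        arr = outputs[name]
        if arr.shape != shape:
            raise ValueError(f"{name}: shape {arr.shape}, expected {shape}")
        if arr.dtype != np.dtype(dtype):
            raise ValueError(f"{name}: dtype {arr.dtype}, expected {np.dtype(dtype)}")


def toy_inference(assets, logger, **kwargs):
    # Hypothetical model: doubles the single input asset.
    return {"doubled": assets["input"] * 2}


if __name__ == "__main__":
    smoke_test(
        toy_inference,
        input_shapes={"input": (1, 3, 16, 16)},
        expected={"doubled": ((1, 3, 16, 16), np.float32)},
    )
    print("outputs match the expected shapes and dtypes")
```

A local check like this does not replace the container test, since it says nothing about whether the image itself starts and serves requests.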
For more complicated models, or models that you would like to test with real data, you need to create a configuration file for testing. The configuration file tells the validator things such as where to load local data from and which bands to include. The resulting arrays or features will be written out as PNG and GeoJSON, respectively. An example config is shown below:
{
    "image": "my-tesseract-model:v0.0.1",
    "test_data": {
        "job_id": "my-job-id",
        "project": "my-project"
    },
    "asset_bands": [
        {
            "asset_name": "modis",
            "bands": [0, 1, 2, 5]
        },
        {
            "asset_name": "sentinel",
            "bands": [2, 4]
        }
    ],
    "args": {
        "model-arg-1": "value1",
        "model-arg-2": "value2"
    },
    "output_asset_bands": [
        {
            "asset_name": "model_output_1",
            "bands": [0, 1, 2]
        },
        {
            "asset_name": "model_output_2",
            "bands": [0]
        }
    ],
    "save_output": false
}
image: The docker image to validate.
test_data: This can be either a dictionary with a job ID and project, to get data directly from
a Tesseract job, or a path to a zarr file. If reading directly from a Tesseract job, the dict
should have only the keys job_id and project. To read from a zarr file directly, pass in the
path or URL as a string. This can be a local file or a remote zarr file (for example, in Google
storage), so long as the credentials are available. Optional: If not provided, random data
will be created.
asset_bands: A list of asset bands like the inputs to a Tesseract job. Each item in the list should contain the keys "asset_name" and "bands". The "asset_name" must exist in the input zarr file or Tesseract job, and "bands" should be a list of integers corresponding to bands in the asset. Optional: If not provided, all bands from all input datasets are used.
args: Any arguments that need to be passed to the model inference function. Optional: If not provided, no args are supplied to the model.
output_asset_bands: For each model output, the bands that should be used to output an image.
This should be either 1 or 3 bands. For each item in the list, a PNG image will be created so
that the model outputs can be quickly inspected to check that the model appears to be
working correctly. Unlike asset_bands, you can have multiple outputs here with the same name.
This can be useful if you want to output several images for one asset, e.g. three images with one
band each instead of one 3-band image. Optional: If not provided, no output images will
be generated.
save_output: If true, the model output is written as bytes that can be read in with numpy. Files
are named after the output, with a '.dat' extension. Optional: Defaults to
false.
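When save_output is enabled, the saved files can be loaded back for inspection. The sketch below assumes the file holds only the raw array bytes, so the dtype and shape have to be supplied by the caller; the exact on-disk layout is not documented here, and the float32 dtype, the shape, and the helper name `load_output` are placeholders. The example writes a stand-in file first so that it runs without the validator.

```python
import os
import tempfile

import numpy as np


def load_output(path, dtype, shape):
    """Read a raw-bytes file into an array with the given dtype and shape."""
    flat = np.fromfile(path, dtype=dtype)
    # reshape raises ValueError if the element count does not match `shape`.
    return flat.reshape(shape)


if __name__ == "__main__":
    # Stand-in for a file produced with "save_output": true.
    original = np.arange(12, dtype=np.float32).reshape(1, 3, 2, 2)
    path = os.path.join(tempfile.mkdtemp(), "model_output_1.dat")
    original.tofile(path)

    restored = load_output(path, np.float32, (1, 3, 2, 2))
    print(np.array_equal(original, restored))
```

If the reshape fails, the dtype or shape passed in most likely does not match what the model actually produced.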
To run the validator with a configuration file, simply pass it to the utility:
tesseract-sdk validate -f valid_config.json
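A config can also be built and sanity-checked from Python before it is handed to the validator. The sketch below encodes only the rules stated in the field descriptions above: asset_bands items need "asset_name" and "bands", output_asset_bands items need 1 or 3 bands, and test_data is either a path string or a dict with only job_id and project. The function name `check_config` is hypothetical and is not part of the SDK; the validator may apply further checks.

```python
import json
from typing import Any, Dict, List


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a validator config (empty if none)."""
    problems = []

    # "image" is the only field not marked optional in the descriptions.
    if "image" not in config:
        problems.append("missing 'image'")

    test_data = config.get("test_data")
    if isinstance(test_data, dict):
        if set(test_data) != {"job_id", "project"}:
            problems.append("test_data dict must have only 'job_id' and 'project'")
    elif test_data is not None and not isinstance(test_data, str):
        problems.append("test_data must be a dict or a path/URL string")

    for entry in config.get("asset_bands", []):
        if not {"asset_name", "bands"} <= set(entry):
            problems.append(f"asset_bands entry lacks required keys: {entry}")

    for entry in config.get("output_asset_bands", []):
        if len(entry.get("bands", [])) not in (1, 3):
            problems.append(f"output_asset_bands entry needs 1 or 3 bands: {entry}")

    return problems


if __name__ == "__main__":
    config = {
        "image": "my-tesseract-model:v0.0.1",
        "asset_bands": [{"asset_name": "modis", "bands": [0, 1, 2, 5]}],
        "output_asset_bands": [{"asset_name": "model_output_1", "bands": [0, 1, 2]}],
        "save_output": False,
    }
    issues = check_config(config)
    if issues:
        print("\n".join(issues))
    else:
        # json.dumps emits valid JSON (no trailing commas, lowercase false).
        print(json.dumps(config, indent=4))
```

Saving the printed JSON as valid_config.json gives a file that can be passed with the -f option shown above.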
Contributing
To contribute to the project, first install the package using the dev option:
pip install .[dev]
IMPORTANT: Before creating a PR, make sure to update the protobuf files; the PR checks will fail if you do not. To update the protobuf files, run the following commands:
make protoc-python
make copy-protos
make check-protos