
Underwater Vision Profiler Embedded Classifier

Toolbox to train automatic classification models for UVP6 images and/or to evaluate their performances.

Minimal knowledge of Python, git, and machine learning is required.

This toolbox has been tested on macOS and Linux (e.g. Ubuntu 20.04/22.04 and Mint 21). We do not guarantee it will work on Windows.

Installation

To install the package, you can type the following command in your terminal:

python -m pip install git+https://github.com/ecotaxa/uvpec

or

python -m pip install git+ssh://git@github.com/ecotaxa/uvpec.git

or, to install the latest release from PyPI:

pip install uvpec

uvpec should now appear if you type pip list | grep uvpec.

Clone the repository

For development purposes, you can also clone the repository locally. For this, you can either run (for HTTPS)

git clone https://github.com/ecotaxa/uvpec.git

or (for SSH)

git clone git@github.com:ecotaxa/uvpec.git

Configuration and use of the package

In order to use the package, you have to create a config.yaml file. Don't panic: an example of such a file is provided in your cloned repository at uvpec/uvpec/config.yaml. In it, you need to specify three things: (1) what you want to do with the package, (2) some input/output information, and (3) parameters for the gradient boosted trees algorithm (XGBoost) that will train and create a classification model.

For the process information, you need to specify two boolean variables:

  • evaluate_only: true if you only want to evaluate an already trained model. In that case, the package will not train anything and will only evaluate the model indicated by the model path on the test_features_file data. Set it to false if you want to train a model.
  • train_only: true if you want to only train a model and skip the evaluation part. false if not. Not taken into account if evaluate_only is true.

For the input/output (io), you need to specify:

  • output_dir: an output directory, where the model and related information will be exported.
  • train_images_dir: an image directory for the training set images. The plankton and/or particle images must be sorted by taxonomic class into subfolders. The layout is standardized to be used with Ecotaxa. Each subfolder is named with the class's display name and the Ecotaxa ID, separated by two underscores ('DisplayName__EcotaxaID'), and contains images from only that taxonomic class. The typical way to export data from Ecotaxa into such a folder organization is to make a D.O.I. export, export all images, and keep only the 'white on black' images (*_100.png). The maximum number of accepted classes is 40.
  • test_images_dir: an image directory for the test set images. It will only be used if you evaluate a model (training + evaluation or evaluation only).
  • training_features_file: the name of your training features file. If it does not already exist, it will be created automatically, so give it a great name!
  • test_features_file: the name of your test features file. If it does not already exist, it will be created automatically, so give it a great name as well! Unused if train_only is true.
  • model: the path to a model (the file should be named Muvpec_KEY.model, a model created with XGBoost). Only used if evaluate_only is true.
  • objid_threshold_file: the path to a tsv file containing the objid and the UVP6 acquisition threshold of each image for which features will be extracted. Only used if use_objid_threshold_file is set to true.
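As a quick sanity check on the folder layout described above, the short Python snippet below (a hypothetical helper, not part of uvpec) verifies that class folders follow the 'DisplayName__EcotaxaID' convention and that the 40-class limit is respected. It assumes the Ecotaxa ID is numeric:

```python
import re
from pathlib import Path

# Hypothetical helper: check that a class folder name follows the
# 'DisplayName__EcotaxaID' convention (display name, two underscores, numeric ID).
CLASS_DIR_PATTERN = re.compile(r"^(?P<display_name>.+)__(?P<ecotaxa_id>\d+)$")

def parse_class_folder(name):
    """Return (display_name, ecotaxa_id) or None if the name is malformed."""
    m = CLASS_DIR_PATTERN.match(name)
    if m is None:
        return None
    return m.group("display_name"), int(m.group("ecotaxa_id"))

def check_training_dir(train_images_dir, max_classes=40):
    """List valid class folders and enforce the 40-class limit."""
    classes = []
    for sub in sorted(Path(train_images_dir).iterdir()):
        if not sub.is_dir():
            continue
        parsed = parse_class_folder(sub.name)
        if parsed is None:
            raise ValueError(f"{sub.name!r} does not match 'DisplayName__EcotaxaID'")
        classes.append(parsed)
    if len(classes) > max_classes:
        raise ValueError(f"{len(classes)} classes found, maximum is {max_classes}")
    return classes
```

Running check_training_dir on your train_images_dir before launching a long training run can save you from a failure halfway through feature extraction.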

For the instrument parameter, you need to specify:

  • The pixel threshold of your UVP6, uvp_pixel_threshold, that is the threshold value used to split image pixels into foreground (> threshold) and background (<= threshold) pixels. It usually lies between 20 and 22.
  • If you wish to use a variable threshold value (e.g. if you are working with images acquired with different UVP6 instruments), set use_objid_threshold_file to true.
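The foreground/background rule above can be sketched in a few lines of NumPy. This is an illustration of the rule, not the uvpec implementation:

```python
import numpy as np

# Pixels strictly above the threshold are foreground, the rest background.
def foreground_mask(image, pixel_threshold=21):
    """Return a boolean mask of foreground pixels (> threshold)."""
    image = np.asarray(image)
    return image > pixel_threshold

# Tiny example: with threshold 21, only values 22 and 200 are foreground.
img = np.array([[10, 21], [22, 200]], dtype=np.uint8)
mask = foreground_mask(img, pixel_threshold=21)
```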

Then, for XGBoost parameters of the training, you need to specify:

  • An initialization seed random_state. It matters if you build multiple models with different XGBoost configurations. The exact number is not important; you can keep 42.
  • A number of CPU cores n_jobs that will depend on the computational power of your machine or server.
  • The learning rate. It controls the magnitude of the adjustments made to the model's parameters at each training iteration (i.e., in our model, at each boosting round). A high learning rate may cause the optimization to miss the optimal parameter values (e.g. it leads to oscillations or divergence), while a low learning rate can make training slow due to slow convergence to the minimum of the loss function, or can get stuck in local minima.
  • The maximum depth of a tree max_depth. For technical reasons, it is forbidden to go above 7.
  • weight_sensitivity represents the weight ($w$) you want to put on biological classes during training. The minimum value is 0 (i.e. no weight) and the maximum value is 1. It is useful to add weight to smaller classes because a large fraction (often $\ge$ 80%) of images in the training set are detritus; setting $w$ to 0.25 therefore puts more weight on the small (biological) classes during training and forces the algorithm to pay more attention to them.
  • detritus_subsampling can be used if you want to undersample the detritus class during training. If you think that your detritus class (which must be named exactly 'detritus') is too populated (e.g. extreme dataset imbalance) and that removing part of it is not an issue for your application, then you can set a given percentage of subsampling for that class. For example, a subsampling_percentage of 20 means that you only keep 20% of your entire detritus class. Keep detritus_subsampling at false if you don't want to use it.
  • subsampling_percentage is the percentage of images of 'detritus' from your training set you want to keep for training.
  • num_trees_CV stands for the number of boosting rounds you want to use for the cross-validation (CV). This is equivalent to the parameter num_round in XGBoost.
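To make the two imbalance-handling options more concrete, here is a hypothetical sketch. The weight formula (n_total / n_class) ** w is an assumption chosen so that w = 0 gives uniform weights and w = 1 gives inverse-frequency weights; the actual formula used by uvpec may differ:

```python
import random
from collections import Counter

# Illustrative sketch only: per-class weights from a sensitivity exponent w.
# w = 0 -> all weights equal 1; w = 1 -> full inverse-frequency weighting.
def class_weights(labels, weight_sensitivity=0.25):
    counts = Counter(labels)
    n_total = len(labels)
    return {c: (n_total / n) ** weight_sensitivity for c, n in counts.items()}

def subsample_detritus(samples, labels, subsampling_percentage=20, seed=42):
    """Keep only the given percentage of 'detritus' samples (sketch)."""
    rng = random.Random(seed)
    detritus = [i for i, lab in enumerate(labels) if lab == "detritus"]
    keep_n = int(len(detritus) * subsampling_percentage / 100)
    kept = set(rng.sample(detritus, keep_n))
    idx = [i for i, lab in enumerate(labels) if lab != "detritus" or i in kept]
    return [samples[i] for i in idx], [labels[i] for i in idx]
```

For example, with 80 detritus images and subsampling_percentage of 20, only 16 detritus images would be kept, while all non-detritus images are retained.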

There is one last thing: use_C gives the possibility to extract the features from images using a C++ extension. We advise keeping it at true because it is much faster than the Python version.
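Putting it all together, a minimal config.yaml might look like the sketch below. The grouping into sections and some exact key names (e.g. learning_rate) are assumptions here; check them against the example file shipped at uvpec/uvpec/config.yaml before running:

```yaml
# Sketch only: verify key names and nesting against uvpec/uvpec/config.yaml.
process:
  evaluate_only: false
  train_only: false
io:
  output_dir: /path/to/output
  train_images_dir: /path/to/train_images
  test_images_dir: /path/to/test_images
  training_features_file: my_training_features
  test_features_file: my_test_features
  model: /path/to/Muvpec_KEY.model        # only read when evaluate_only is true
  objid_threshold_file: /path/to/thresholds.tsv
instrument:
  uvp_pixel_threshold: 21
  use_objid_threshold_file: false
xgboost:
  random_state: 42
  n_jobs: 8
  learning_rate: 0.1
  max_depth: 5
  weight_sensitivity: 0.25
  detritus_subsampling: false
  subsampling_percentage: 20
  num_trees_CV: 500
use_C: true
```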

Once you are done, run uvpec config.yaml in your terminal and wait for the magic to happen! You should find everything you need in the output folder you specified.

Test the package

We have prepared a test folder in our package. It allows you to check that the pipeline works without launching a full process, which would take a significant amount of time. It is always a good idea to check that everything works before using the package on a full training set, and also after package updates. To use it, navigate to the test folder with cd test, then run uvpec config.yaml. You should see something going on in your terminal. Don't forget to check your output folder afterwards!

In addition, there is another test you can run to check that the pipeline is not broken somewhere. For that, run pytest (which looks for test_uvpec.py) in your terminal. If you only see green lights, all tests went smoothly! If not, something went wrong and the error messages can help you find where the problem is.

Just a reminder: if you see errors during the test, check that you did not forget to run uvpec config.yaml first. Also, pytest is not necessarily installed on your machine; to install it, type pip install --user pytest in your terminal.

How to prepare your dataset from an Ecotaxa project

You can refer to the documentation on Ecotaxa to download all the vignettes you need to use for your training and/or test set. See the "export project" part of your project on https://ecotaxa.obs-vlfr.fr/.

Ecotaxa is built on a REST API designed to facilitate the work of users. Two packages have been developed to interact more easily with the API, in Python and in R. Be careful to download the vignettes with the black background, because every object is stored in two versions: one with a white background and one with a black background. You will also need to remove the size legend at the bottom of each vignette. To do so, crop 31 pixels from the bottom of the vignette.

Finally, just rename the vignettes following the uvpec standard (i.e. DisplayName__EcotaxaID), and you are good to go!
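These two preprocessing steps (cropping the scale bar, building the standard name) can be sketched with NumPy, treating the vignette as an image array (Pillow or OpenCV can load the PNG into one). This is an illustration, not uvpec code:

```python
import numpy as np

# Drop the 31-pixel size legend from the bottom of a vignette array.
def remove_scale_bar(vignette, legend_height=31):
    """Crop `legend_height` rows from the bottom of the image array."""
    return vignette[:-legend_height, :]

# Build the 'DisplayName__EcotaxaID' name used by uvpec.
def uvpec_name(display_name, ecotaxa_id):
    return f"{display_name}__{ecotaxa_id}"
```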

Uninstalling or updating the package

To uninstall our (awesome-why-are-you-removing-it) package, type pip uninstall uvpec in your terminal.

For updates, either uninstall it and reinstall it with the HTTPS or SSH version, or more simply upgrade it with pip (pip install --upgrade uvpec).

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • uvpec-1.0.0.tar.gz (118.3 kB)

Built Distributions

  • uvpec-1.0.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77.5 kB, PyPy 3.10, manylinux: glibc 2.17+ x86-64)
  • uvpec-1.0.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77.5 kB, PyPy 3.9, manylinux: glibc 2.17+ x86-64)
  • uvpec-1.0.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (76.7 kB, PyPy 3.8, manylinux: glibc 2.17+ x86-64)
  • uvpec-1.0.0-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (76.8 kB, PyPy 3.7, manylinux: glibc 2.17+ x86-64)
  • uvpec-1.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (349.6 kB, CPython 3.12, manylinux: glibc 2.17+ x86-64)
  • uvpec-1.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (343.1 kB, CPython 3.11, manylinux: glibc 2.17+ x86-64)
  • uvpec-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (329.8 kB, CPython 3.10, manylinux: glibc 2.17+ x86-64)
  • uvpec-1.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (329.5 kB, CPython 3.9, manylinux: glibc 2.17+ x86-64)
  • uvpec-1.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (330.7 kB, CPython 3.8, manylinux: glibc 2.17+ x86-64)
  • uvpec-1.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (317.5 kB, CPython 3.7m, manylinux: glibc 2.17+ x86-64)
  • uvpec-1.0.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (291.8 kB, CPython 3.6m, manylinux: glibc 2.17+ x86-64)

File details

Details for the file uvpec-1.0.0.tar.gz.

File metadata

  • Download URL: uvpec-1.0.0.tar.gz
  • Size: 118.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.28.2 setuptools/45.2.0 requests-toolbelt/1.0.0 tqdm/4.64.1 CPython/3.8.10

File hashes

Hashes for uvpec-1.0.0.tar.gz
Algorithm Hash digest
SHA256 6d18ddb42138657d01efe095c265f64f6c2ea19b09c144e7bbd816d140787967
MD5 e2200059542d174975d9b2583ef7284d
BLAKE2b-256 073680fe02e67a45e3c014c460424a07c8aa9a13c71de22dbd1f1e080bc1d4bd

See more details on using hashes here.

