Benchmark dataset for Airborne Tree Machine Learning

Overview

The MillionTrees benchmark is designed to provide open, reproducible, and rigorous evaluation of tree detection algorithms. This repo is the Python package for rapid data sharing and evaluation.

Current status

We have released a beta version of the public data: datasets that have previously been published and have a DOI. We will follow up this release, likely with a 1.0 tag, with the previously unpublished parts of the dataset and a scientific manuscript.

📊 Current Dataset Status: See our comprehensive Dataset Release Report for up-to-date information on dataset versions, sizes, and download links.

Dataloaders

There are three data loaders based on annotation geometry. TreeBoxes are bounding boxes for individual tree detection. TreePoints are centroids for tree counting and detection, and TreePolygons are for finer crown segmentation.
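To illustrate how the three annotation geometries relate to one another, here is a minimal sketch in plain Python. The `Box` and `Polygon` classes are hypothetical helpers written for this example and are not part of the milliontrees API; the sketch only shows that a polygon can be reduced to a bounding box, and a box to a centroid point.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def centroid(self) -> Tuple[float, float]:
        # The box centre is one simple way to derive a point annotation.
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


@dataclass
class Polygon:
    """Crown outline as a list of (x, y) vertices (hypothetical type)."""
    vertices: List[Tuple[float, float]]

    def bounding_box(self) -> Box:
        # The tightest axis-aligned box that contains every vertex.
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return Box(min(xs), min(ys), max(xs), max(ys))


# Illustrative crown outline, not taken from any MillionTrees dataset.
crown = Polygon([(10, 20), (30, 15), (40, 35), (20, 45)])
box = crown.bounding_box()
point = box.centroid()
print(box)
print(point)
```

The reduction only goes one way: a point cannot be expanded back into a box or a crown outline, which is one reason each geometry gets its own loader.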

Why MillionTrees?

There have been a tremendous number of tree crown detection benchmarks, but little progress towards a single algorithm that can be used globally across acquisition sensors, forest types, and annotation geometries. Our view is that the hundreds of tree detection algorithms for RGB data published in the last 10 years are all data starved. There are many good models, but they can only be so useful with the small datasets any one research team can collect. The result is years of effort in model development, but ultimately no solution that serves a large audience. The MillionTrees dataset seeks to collect a million annotations across point, polygon, and box geometries at a global scale.

The MillionTrees dataset represents where we are as a community. Many datasets are incompletely annotated, and there are varying degrees of annotation accuracy. This is by design: we aim to reflect the real, not idealized, status of tree detection algorithms and applications. By including these data, which are normally excluded from benchmarks, we can both dramatically increase the diversity of tree presentations and backgrounds and engage the community in solving common computer vision challenges for applied machine learning.

Installation

pip install milliontrees

Hugging Face dataset loading and sharing functionality is included in the main package.

Dev Requirements

To build from the GitHub source and install the required dependencies, follow these instructions:

  1. Clone the GitHub repository:

    git clone https://github.com/weecology/MillionTrees.git
    
  2. Change to the repository directory:

    cd MillionTrees
    
  3. (Recommended) Create and activate a virtual environment, then install dev extras:

    python -m venv .venv && source .venv/bin/activate
    pip install -e ".[dev,docs]"
    
  4. (Optional) Build distributions:

    python -m build
    

Once the installation is complete, you can use the MillionTrees package in your Python projects.
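As a quick sanity check after either install route, the sketch below uses the standard library's `importlib.metadata` to report which version of the package, if any, is visible in the current environment. The helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("milliontrees")
print(found if found else "milliontrees is not installed in this environment")
```

For an editable install from the cloned repository, the reported version should match the one declared in the source tree.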

Datasets

Datasets are documented on ReadTheDocs with sample images overlaid with annotations: https://milliontrees.idtrees.org/en/latest/datasets.html

Mini dataset quick results

Fast sanity-check runs on mini datasets (one image per source):

Model (script)                     | Task       | Root dir                               | Key metrics
-----------------------------------|------------|----------------------------------------|------------------------------------------------
sam3_points.py (SAM3 native, GPU)  | TreePoints | data-mini                              | KeypointAccuracy: 0.000; Counting MAE: 1164.000
sam3_boxes.py (SAM3 native, GPU)   | TreeBoxes  | data-mini                              | Detection Acc: 0.083; Recall: 0.084
baseline_points.py (DeepForest)    | TreePoints | /orange/ewhite/web/public/MillionTrees | KeypointAccuracy: 0.000; Counting MAE: 104.250
baseline_boxes.py (DeepForest)     | TreeBoxes  | /orange/ewhite/web/public/MillionTrees | Detection Acc: 0.559; Recall: 0.794

See more in docs/leaderboard.md. To reproduce, use the example commands in that file.
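For readers unfamiliar with the Counting MAE column, the sketch below shows the standard definition of mean absolute error over per-image tree counts. It assumes the metric is computed this way; the package's own evaluation code may differ in detail. The function name and the counts are illustrative only and are not benchmark results.

```python
from typing import Sequence


def counting_mae(true_counts: Sequence[int], predicted_counts: Sequence[int]) -> float:
    """Mean absolute error between true and predicted per-image tree counts."""
    if len(true_counts) != len(predicted_counts):
        raise ValueError("true_counts and predicted_counts must have the same length")
    if not true_counts:
        raise ValueError("at least one image is required")
    total_error = sum(abs(t - p) for t, p in zip(true_counts, predicted_counts))
    return total_error / len(true_counts)


# Made-up counts for three images, used only to exercise the function.
example_true = [120, 45, 300]
example_pred = [100, 50, 305]
mae = counting_mae(example_true, example_pred)
print(f"Counting MAE: {mae:.3f}")
```

Because the error is in units of trees per image, values are not comparable across datasets with very different tree densities.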

Citing MillionTrees

Acknowledgements

The design of the MillionTrees benchmark was inspired by the WILDS benchmark. We are grateful for their work, and to Sara Beery for suggesting the use of this template.
