Benchmark dataset for Airborne Tree Machine Learning
Overview
The MillionTrees benchmark is designed to provide open, reproducible, and rigorous evaluation of tree detection algorithms. This repo is the Python package for rapid data sharing and evaluation. With over 2 million annotations, plus tens of millions of weak annotations from algorithm-derived workflows for pretraining, MillionTrees is the largest tree detection dataset globally.
Current status
We have released a beta version of the public data: datasets that have previously been published and have a DOI. We will follow up this release, likely with a 1.0 tag, with the previously unpublished parts of the dataset along with a scientific manuscript.
📊 Current Dataset Status: See our comprehensive Dataset Release Report for up-to-date information on dataset versions, sizes, and download links.
Dataloaders
There are three data loaders based on annotation geometry. TreeBoxes are bounding boxes for individual tree detection. TreePoints are centroids for tree counting and detection, and TreePolygons are for finer crown segmentation.
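To make the relationship between the three geometries concrete, here is a minimal sketch in plain Python. It does not use the milliontrees API; the helper names `polygon_to_box` and `box_to_point` and the sample coordinates are hypothetical. It only illustrates how a crown polygon can be coarsened to a bounding box, and a box to a single point (the box center, used here as a stand-in for a centroid).

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def polygon_to_box(polygon: List[Point]) -> Box:
    """Return the axis-aligned bounding box that encloses a crown polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def box_to_point(box: Box) -> Point:
    """Return the center of a bounding box as a single (x, y) point."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)


# Hypothetical crown outline in pixel coordinates (illustration only).
crown = [(10.0, 20.0), (30.0, 15.0), (40.0, 35.0), (20.0, 45.0)]
box = polygon_to_box(crown)
point = box_to_point(box)
print(box, point)
```

Each step discards information: the polygon keeps crown shape, the box keeps only extent, and the point keeps only location, which is why the three geometries suit segmentation, detection, and counting respectively.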
Why MillionTrees?
There have been a tremendous number of tree crown detection benchmarks, but little progress towards a single algorithm that can be used globally across acquisition sensors, forest types and annotation geometries. Our view is that the hundreds of tree detection algorithms for RGB data published in the last 10 years are all data starved. There are many good models, but they can only be so useful with the small datasets any one research team can collect. The result is years of effort in model development, but ultimately no solution that serves a large audience. The MillionTrees dataset seeks to collect a million annotations across point, polygon and box geometries at a global scale.
The MillionTrees dataset represents where we are as a community. Many datasets are incompletely annotated, and there are varying degrees of annotation accuracy. This is by design: we aim to reflect the real, not idealized, status of tree detection algorithms and applications. By including these data that are normally excluded from benchmarks, we can both dramatically increase the diversity of tree presentations and backgrounds and engage the community in solving common computer vision challenges for applied machine learning.
Installation
pip install milliontrees
Dev Requirements
To build from the GitHub source and install the required dependencies, follow these instructions:
- Clone the GitHub repository:
  git clone https://github.com/weecology/MillionTrees.git
- Change to the repository directory:
  cd MillionTrees
- (Recommended) Create and activate a virtual environment, then install dev extras:
  python -m venv .venv && source .venv/bin/activate
  pip install -e .[dev,docs]
- (Optional) Build distributions:
  python -m build
Once the installation is complete, you can use the MillionTrees package in your Python projects.
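One way to confirm that the install worked is to ask Python's packaging metadata for the installed version. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and is not part of milliontrees.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("milliontrees")
print(f"milliontrees: {found or 'not installed'}")
```

If this prints "not installed", the most likely cause is that the package was installed into a different environment than the one running the script.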
Datasets
Datasets are documented on ReadTheDocs with sample images overlaid with annotations. https://milliontrees.idtrees.org/en/latest/datasets.html
Dataset Release Report: For detailed information about dataset versions, download sizes, and availability, see our Dataset Release Report.
Citing MillionTrees
Acknowledgements
The design of the MillionTrees benchmark was inspired by the WILDS benchmark. We are grateful for their work, and to Sara Beery for suggesting the use of this template.
File details
Details for the file milliontrees-0.2.1.tar.gz.
File metadata
- Download URL: milliontrees-0.2.1.tar.gz
- Upload date:
- Size: 64.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 744d1b896a51e68180f6c1a2a4d0552e363ce5f3b016f688f352aab821bd626e |
| MD5 | d7bc96947962e82af2279c08c78629c3 |
| BLAKE2b-256 | d22cc80ca854190bb96b6585daa89484c29abf4ad8e55f741adde0c668d39d85 |
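A downloaded file can be checked against the published digests with Python's standard `hashlib`. The following is a minimal sketch: the helper names `file_digests` and `matches_sha256` are hypothetical, and it assumes that the BLAKE2b-256 digest corresponds to BLAKE2b with a 32-byte output.

```python
import hashlib


def file_digests(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute SHA256 and BLAKE2b-256 hex digests of a file, reading in chunks."""
    hashers = {
        "sha256": hashlib.sha256(),
        # Assumption: "BLAKE2b-256" means BLAKE2b with a 32-byte digest.
        "blake2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return file_digests(path)["sha256"] == expected_hex.strip().lower()


# SHA256 digest listed for milliontrees-0.2.1.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "744d1b896a51e68180f6c1a2a4d0552e363ce5f3b016f688f352aab821bd626e"
)
```

For example, `matches_sha256("milliontrees-0.2.1.tar.gz", EXPECTED_SDIST_SHA256)` should return `True` for an intact download in the current directory.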
Provenance
The following attestation bundles were made for milliontrees-0.2.1.tar.gz:
Publisher: python-publish.yml on weecology/MillionTrees
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: milliontrees-0.2.1.tar.gz
  - Subject digest: 744d1b896a51e68180f6c1a2a4d0552e363ce5f3b016f688f352aab821bd626e
- Sigstore transparency entry: 590916452
- Sigstore integration time:
- Permalink: weecology/MillionTrees@6f50305c606cc21ab90b675dffb68c84451dabbe
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/weecology
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@6f50305c606cc21ab90b675dffb68c84451dabbe
- Trigger Event: release
File details
Details for the file milliontrees-0.2.1-py3-none-any.whl.
File metadata
- Download URL: milliontrees-0.2.1-py3-none-any.whl
- Upload date:
- Size: 59.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4169fb3c06862d0324ba37f769f2837f935c9c9682c2f927e7db856573a8df51 |
| MD5 | 907866c0ad0c28f6ceb6cf74edb065ca |
| BLAKE2b-256 | c8f8040fe568caffded8e22b24a5983c866c522d98502088889bde6552dfb331 |
Provenance
The following attestation bundles were made for milliontrees-0.2.1-py3-none-any.whl:
Publisher: python-publish.yml on weecology/MillionTrees
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: milliontrees-0.2.1-py3-none-any.whl
  - Subject digest: 4169fb3c06862d0324ba37f769f2837f935c9c9682c2f927e7db856573a8df51
- Sigstore transparency entry: 590916480
- Sigstore integration time:
- Permalink: weecology/MillionTrees@6f50305c606cc21ab90b675dffb68c84451dabbe
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/weecology
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@6f50305c606cc21ab90b675dffb68c84451dabbe
- Trigger Event: release