Skip to main content

A library for developing earth system foundation models

Project description

OlmoEarth Logo

GitHub License Model Checkpoints Paper PDF

The OlmoEarth models are a flexible, multi-modal, spatio-temporal family of foundation models for Earth Observations.

The OlmoEarth models exist as part of the OlmoEarth platform. The OlmoEarth Platform is an end-to-end solution for scalable planetary intelligence, providing everything needed to go from raw data through R&D, to fine-tuning and production deployment.

Installation

We recommend Python 3.12, and recommend using uv. To install dependencies with uv, run:

git clone git@github.com:allenai/olmoearth_pretrain.git
cd olmoearth_pretrain
uv sync --locked --all-groups --python 3.12
# only necessary for development
uv tool install pre-commit --with pre-commit-uv --force-reinstall

uv installs everything into a venv, so to keep using python commands you can activate uv's venv: source .venv/bin/activate. Otherwise, swap to uv run python.

Inference-Only Installation

For inference and model loading without training dependencies:

uv sync --locked

OlmoEarth is built using OLMo-core. OLMo-core's published Docker images contain all core and optional dependencies.

Model Summary

Model Architecture Diagram

The OlmoEarth models are trained on three satellite modalities (Sentinel 2, Sentinel 1 and Landsat) and six derived maps (OpenStreetMap, WorldCover, USDA Cropland Data Layer, SRTM DEM, WRI Canopy Height Map, and WorldCereal).

Model Size Weights Encoder Params Decoder Params
Nano link 1.4M 800K
Tiny link 6.2M 1.9M
Base link 89M 30M
Large link 308M 53M

Using OlmoEarth

InferenceQuickstart shows how to initialize the OlmoEarth model and apply it on a satellite image.

We also have several more in-depth tutorials for computing OlmoEarth embeddings and fine-tuning OlmoEarth on downstream tasks:

Additionally, olmoearth_projects has several examples of active OlmoEarth deployments.

Data Summary

Our pretraining dataset contains 285,288 samples from around the world of 2.56km×2.56km regions, although many samples contain only a subset of the timesteps and modalities.

The distribution of the samples is available below:

Training sample distribution

The dataset can be downloaded here.

Detailed instructions on how to make your own pretraining dataset are available in the dataset README.

Training scripts

Detailed instructions on how to pretrain your own OlmoEarth model are available in Pretraining.md.

Evaluations

Detailed instructions on how to replicate our evaluations is available here:

Running Tests

Tests can be run with different dependency configurations using uv run:

# Full test suite (all dependencies - flash attn including olmo-core)
uv run --all-groups --no-group flash-attn pytest tests/

# Model loading tests with full deps (with olmo-core)
uv run --all-groups --no-group flash-attn pytest tests_minimal_deps/

# Model loading tests with minimal deps only (no olmo-core)
uv run --group dev pytest tests_minimal_deps/

The tests_minimal_deps/ directory contains tests that verify model loading works both with and without olmo-core installed. These run twice in CI to ensure compatibility.

License

This code is licensed under the OlmoEarth Artifact License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

olmoearth_pretrain-0.1.0.tar.gz (308.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

olmoearth_pretrain-0.1.0-py3-none-any.whl (404.7 kB view details)

Uploaded Python 3

File details

Details for the file olmoearth_pretrain-0.1.0.tar.gz.

File metadata

  • Download URL: olmoearth_pretrain-0.1.0.tar.gz
  • Upload date:
  • Size: 308.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for olmoearth_pretrain-0.1.0.tar.gz
Algorithm Hash digest
SHA256 95132034c0c6783ccb3404fad17c94f39f0356a0feb2b519b456e6a17017c9e8
MD5 e78bd0097a27754ed888ffe32fe2df13
BLAKE2b-256 1244c6a8a0948d2ff992eb02a670b54d534d80ab178b883715301f9c01fb3268

See more details on using hashes here.

Provenance

The following attestation bundles were made for olmoearth_pretrain-0.1.0.tar.gz:

Publisher: publish.yml on allenai/olmoearth_pretrain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file olmoearth_pretrain-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for olmoearth_pretrain-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 84316688b85c151ee0bb0e477783cedcc9f4309d70ab5358b48b58a4a97701cf
MD5 d7ef3b53b40d2ea0c4ad1c1b28fa5ab1
BLAKE2b-256 189d3841c37302013bd1839e6617284020a9e89d9b82c159095fdd75ea31506f

See more details on using hashes here.

Provenance

The following attestation bundles were made for olmoearth_pretrain-0.1.0-py3-none-any.whl:

Publisher: publish.yml on allenai/olmoearth_pretrain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page