A custom CMCC library to list and download data from the Marine Data Store

Marine Data Store ToolBox

This Python script provides a command-line interface (CLI), mds, for listing and downloading datasets from the Marine Data Store, using either the copernicusmarine toolbox or boto3.

How to install

Create and activate the conda environment, then install the package:

mamba env create -f environment.yml
mamba activate mdsenv

pip install .

Uninstall

To uninstall it:

mamba activate mdsenv

pip uninstall mds-toolbox

Usage

The mds command provides several subcommands for listing, inspecting and downloading files:

Usage: mds [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help  Show this message and exit.

Commands:
  etag       Get the etag of a given S3 file
  file-list  Wrapper to copernicus marine toolbox file list
  get        Wrapper to copernicusmarine get
  s3-get     Download files with direct access to MDS using S3
  s3-list    List files on MDS using S3
  subset     Wrapper to copernicusmarine subset

S3 direct access

Because the copernicusmarine toolbox adds significant overhead to each S3 request, two functions have been developed to:

  • make fast S3 requests
  • provide thread-safe access to the S3 client
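One way to get thread-safe access to a single S3 client is to build it lazily, once, and hand the same object to every worker. The sketch below assumes that approach; SharedClient and make_s3_client are hypothetical names, not mds-toolbox code. The boto3 documentation describes low-level clients as thread safe but sessions (including the default one behind boto3.client) as not, so only the construction step is guarded by a lock.

```python
import threading
from typing import Any, Callable, Optional


class SharedClient:
    """Build one client on first use and hand the same object to every thread.

    Hypothetical helper (not part of mds-toolbox): only construction is
    locked, on the assumption that the client itself can be shared.
    """

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._client: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Fast path: once the client exists, no locking is needed.
        if self._client is None:
            with self._lock:
                # Re-check under the lock so only one thread runs the factory.
                if self._client is None:
                    self._client = self._factory()
        return self._client


def make_s3_client() -> Any:
    """Hypothetical factory. With boto3 installed it might read:

        import boto3
        return boto3.session.Session().client("s3", endpoint_url=...)

    The MDS endpoint URL is intentionally left out here.
    """
    raise NotImplementedError("plug in your own client construction")


s3 = SharedClient(make_s3_client)  # each worker thread would call s3.get()
```

Creating the client eagerly in the main thread and passing it to the workers is an equally valid choice; the lazy variant only helps when it is not known in advance which thread needs the client first.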

s3-get

Usage: mds s3-get [OPTIONS]

Options:
  -b, --bucket TEXT            Bucket name  [required]
  -f, --filter TEXT            Filter on the online files  [required]
  -o, --output-directory TEXT  Output directory  [required]
  -p, --product TEXT           The product name  [required]
  -i, --dataset-id TEXT        Dataset Id  [required]
  -g, --dataset-version TEXT   Dataset version or tag
  -r, --recursive              Recursively list all S3 files
  --threads INTEGER            Number of threads used to download files
  -s, --subdir TEXT            Dataset directory on MDS (e.g. {year}/{month});
                               if present, speeds up the request
  --overwrite                  Force overwrite of the file
  --keep-timestamps            After the download, set the file timestamp to
                               match the remote file
  --sync-time                  Update the file if it changes on the server
                               using last update information
  --sync-etag                  Update the file if it changes on the server
                               using etag information
  --help                       Show this message and exit.
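The --keep-timestamps and --sync-time options can be pictured with a small standard-library sketch. It assumes the remote last-modified time is available as a timezone-aware datetime (as the LastModified field returned by boto3's list_objects_v2 and head_object is); set_remote_timestamp and is_outdated are hypothetical names, and the toolbox's actual logic may differ.

```python
import os
from datetime import datetime, timezone


def set_remote_timestamp(path: str, last_modified: datetime) -> None:
    """Hypothetical --keep-timestamps step: copy the remote time onto the local file."""
    ts = last_modified.timestamp()
    # Set both access time and modification time to the remote last-modified time.
    os.utime(path, (ts, ts))


def is_outdated(path: str, last_modified: datetime) -> bool:
    """Hypothetical --sync-time check: True if the file is missing or older than the remote copy."""
    if not os.path.exists(path):
        return True
    local_mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return last_modified > local_mtime
```

Keeping the remote timestamp is what makes a later time-based comparison meaningful: without it, the local modification time would record when the file was downloaded, not when it last changed on the server.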

Example

mds s3-get -i cmems_obs-ins_med_phybgcwav_mynrt_na_irr -b mdl-native-03 -g 202311 -p INSITU_MED_PHYBGCWAV_DISCRETE_MYNRT_013_035 -o "/work/antonio/20240320" -s latest/$(date -u +"%Y%m%d") --keep-timestamps --sync-etag -f $(date -u +"%Y%m%d")

Example using threads

mds s3-get --threads 10 -i cmems_obs-ins_med_phybgcwav_mynrt_na_irr -b mdl-native-03 -g 202311 -p INSITU_MED_PHYBGCWAV_DISCRETE_MYNRT_013_035 -o "." -s latest/$(date -u +"%Y%m%d") --keep-timestamps --sync-etag -f $(date -u +"%Y%m%d")
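The --threads option suggests that s3-get hands individual downloads to a pool of worker threads. The sketch below shows that pattern with the standard library's concurrent.futures.ThreadPoolExecutor; download_all and its download_one callback are hypothetical names, and this is not the toolbox's own implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def download_all(
    keys: Iterable[str],
    download_one: Callable[[str], str],
    threads: int = 10,
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Run download_one(key) for every key on a pool of worker threads.

    Returns (results, errors): results maps key -> value returned by
    download_one (e.g. a local path); errors maps key -> raised exception,
    so one failed file does not abort the whole batch.
    """
    results: Dict[str, str] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(download_one, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                results[key] = future.result()
            except Exception as exc:
                errors[key] = exc
    return results, errors
```

With boto3, download_one could call client.download_file(bucket, key, local_path) on a shared client and then return local_path. Collecting errors per key, instead of letting the first exception propagate, lets a scheduled sync of many files report the few failures and retry only those.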

s3-list

Usage: mds s3-list [OPTIONS]

Options:
  -b, --bucket TEXT           Bucket name  [required]
  -f, --filter TEXT           Filter on the online files  [required]
  -p, --product TEXT          The product name  [required]
  -i, --dataset-id TEXT       Dataset Id
  -g, --dataset-version TEXT  Dataset version or tag
  -s, --subdir TEXT           Dataset directory on MDS (e.g. {year}/{month});
                              if present, speeds up the request
  -r, --recursive             Recursively list all S3 files
  --help                      Show this message and exit.

Example

mds s3-list -b mdl-native-01 -p INSITU_GLO_PHYBGCWAV_DISCRETE_MYNRT_013_030 -i cmems_obs-ins_glo_phybgcwav_mynrt_na_irr -g 202311 -s "monthly/BO/202401" -f "*" | tr " " "\n"

Example recursive

mds s3-list -b mdl-native-12 -p MEDSEA_ANALYSISFORECAST_PHY_006_013 -f '*' -r | tr " " "\n"
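The --subdir hint speeds up listing because S3 lists objects by key prefix: a longer prefix leaves fewer keys to page through. The sketch below assumes the key layout visible in the etag example further down this page (native/<product>/<dataset-id>_<version>/<subdir>/...), glob-style matching of --filter against the file name, and a boto3-style client supplied by the caller. build_prefix and list_files are hypothetical names; only get_paginator("list_objects_v2") and its Bucket, Prefix and Delimiter parameters come from boto3.

```python
from fnmatch import fnmatchcase
from typing import Any, Iterator, Optional


def build_prefix(
    product: str,
    dataset_id: Optional[str] = None,
    version: Optional[str] = None,
    subdir: Optional[str] = None,
) -> str:
    """Hypothetical S3 key prefix, modelled on the etag example's path."""
    prefix = f"native/{product}/"
    if dataset_id is None:
        return prefix
    if version is None:
        # No version given: match every "<dataset-id>_<version>/" folder.
        return f"{prefix}{dataset_id}_"
    prefix += f"{dataset_id}_{version}/"
    if subdir:
        # A subdir narrows the prefix, so S3 returns fewer keys to page through.
        prefix += subdir.strip("/") + "/"
    return prefix


def list_files(
    client: Any, bucket: str, prefix: str, pattern: str = "*", recursive: bool = False
) -> Iterator[str]:
    """Yield keys under prefix whose file name matches a glob pattern."""
    paginator = client.get_paginator("list_objects_v2")
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    if not recursive:
        # With a delimiter, S3 only returns objects directly under the prefix.
        kwargs["Delimiter"] = "/"
    for page in paginator.paginate(**kwargs):
        for obj in page.get("Contents", []):
            name = obj["Key"].rsplit("/", 1)[-1]
            if fnmatchcase(name, pattern):
                yield obj["Key"]
```

Passing the client in as a parameter keeps the listing logic independent of how the client is created and makes it easy to exercise with a stand-in object.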

Wrapper for copernicusmarine

The following commands wrap the copernicusmarine toolbox, so their results depend on the installed copernicusmarine version.

Subset

Usage: mds subset [OPTIONS]

Options:
  -o, --output-directory TEXT    Output directory  [required]
  -f, --output-filename TEXT     Output filename  [required]
  -i, --dataset-id TEXT          Dataset Id  [required]
  -v, --variables TEXT           Variables to download. Can be used multiple times
  -x, --minimum-longitude FLOAT  Minimum longitude for the subset.
  -X, --maximum-longitude FLOAT  Maximum longitude for the subset.
  -y, --minimum-latitude FLOAT   Minimum latitude for the subset. Requires a
                                 float within this range:  [-90<=x<=90]
  -Y, --maximum-latitude FLOAT   Maximum latitude for the subset. Requires a
                                 float within this range:  [-90<=x<=90]
  -z, --minimum-depth FLOAT      Minimum depth for the subset. Requires a
                                 float within this range:  [x>=0]
  -Z, --maximum-depth FLOAT      Maximum depth for the subset. Requires a
                                 float within this range:  [x>=0]
  -t, --start-datetime TEXT      Start datetime as:
                                 %Y|%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d
                                 %H:%M:%S|%Y-%m-%dT%H:%M:%S.%fZ
  -T, --end-datetime TEXT        End datetime as:
                                 %Y|%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d
                                 %H:%M:%S|%Y-%m-%dT%H:%M:%S.%fZ
  -r, --dry-run                  Dry run
  -g, --dataset-version TEXT     Dataset version or tag
  -n, --username TEXT            Username
  -w, --password TEXT            Password
  --help                         Show this message and exit.

Example

mds subset -f output.nc -o . -i cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m -x -18.16667 -X 1.0 -y 30.16 -Y 46.0 -z 0.493 -Z 5727.918000000001 -t 2025-01-01 -T 2025-01-01 -v thetao 
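The help text above lists the accepted datetime formats and the valid latitude and depth ranges. As a rough sketch of how those constraints could be checked before a request is sent (validate_subset and parse_datetime are hypothetical helpers; the wrapper itself leaves validation to copernicusmarine), assuming that minimum values must not exceed maximum values, which the help text does not state:

```python
from datetime import datetime

# Accepted formats, copied from the --start-datetime/--end-datetime help text.
DATETIME_FORMATS = (
    "%Y",
    "%Y-%m-%d",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S.%fZ",
)


def parse_datetime(value: str) -> datetime:
    """Try each documented format in turn; raise ValueError if none fits."""
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported datetime: {value!r}")


def validate_subset(
    minimum_latitude: float,
    maximum_latitude: float,
    minimum_depth: float,
    maximum_depth: float,
    start_datetime: str,
    end_datetime: str,
) -> None:
    """Hypothetical pre-flight check; raises ValueError on the first problem."""
    for name, lat in (("minimum latitude", minimum_latitude), ("maximum latitude", maximum_latitude)):
        if not -90 <= lat <= 90:
            raise ValueError(f"{name} must be within [-90, 90], got {lat}")
    for name, depth in (("minimum depth", minimum_depth), ("maximum depth", maximum_depth)):
        if depth < 0:
            raise ValueError(f"{name} must be >= 0, got {depth}")
    # Ordering checks below are an added assumption, not part of the help text.
    if minimum_latitude > maximum_latitude or minimum_depth > maximum_depth:
        raise ValueError("minimum values must not exceed maximum values")
    if parse_datetime(start_datetime) > parse_datetime(end_datetime):
        raise ValueError("start datetime is after end datetime")


# Values from the example command above: passes without raising.
validate_subset(30.16, 46.0, 0.493, 5727.918000000001, "2025-01-01", "2025-01-01")
```

Note that in the example, -t and -T are the same date, so the request covers a single day.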

Get

Usage: mds get [OPTIONS]

Options:
  -f, --filter TEXT            Filter on the online files
  -o, --output-directory TEXT  Output directory  [required]
  -i, --dataset-id TEXT        Dataset Id  [required]
  -g, --dataset-version TEXT   Dataset version or tag
  -s, --service TEXT           Force download through one of the available
                               services using the service name among
                               ['original-files', 'ftp'] or its short name
                               among ['files', 'ftp'].
  -d, --dry-run                Dry run
  -u, --update                 If the file does not exist, download it;
                               otherwise, update it if it changed on MDS
  -v, --dataset-version TEXT   Dry run
  -nd, --no-directories TEXT   Option to not recreate folder hierarchy in
                               output directory
  --disable-progress-bar TEXT  Flag to hide progress bar
  -n, --username TEXT          Username
  -w, --password TEXT          Password
  --help                       Show this message and exit.

Example

mds get -f '20250210*_d-CMCC--TEMP-MFSeas9-MEDATL-b20250225_an-sv10.00.nc' -o . -i cmems_mod_med_phy-tem_anfc_4.2km_P1D-m

File List

To retrieve a list of files, use:

Usage: mds file-list [OPTIONS] DATASET_ID MDS_FILTER

Options:
  -g, --dataset-version TEXT  Dataset version or tag
  --help                      Show this message and exit.

Example

mds file-list cmems_mod_med_phy-cur_anfc_4.2km_PT15M-i '*b20250225*' -g 202411
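Filters such as *b20250225* are glob patterns rather than regular expressions, and they should be quoted so the shell does not expand them against local files. Assuming they behave like Python's fnmatch applied to file names (this page does not confirm how copernicusmarine matches them), a quick way to preview which names a pattern selects is sketched below; preview_filter is a hypothetical helper and the file names are illustrative, modelled on the examples on this page.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def preview_filter(names: Iterable[str], pattern: str) -> List[str]:
    """Return the names a glob-style (not regex) pattern would select, case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Illustrative names modelled on the examples on this page.
names = [
    "20250210_d-CMCC--TEMP-MFSeas9-MEDATL-b20250225_an-sv10.00.nc",
    "20250501_qm-CMCC--RFVL-MFSeas9-MEDATL-b20250513_an-sv10.00.nc",
]
print(preview_filter(names, "*b20250225*"))
```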

Etag

Usage: mds etag [OPTIONS]

Options:
  -e, --s3_file TEXT     Path to a specific s3 file - if present, other
                         parameters are ignored.
  -p, --product TEXT     The product name
  -d, --dataset_id TEXT  The datasetID
  -v, --version TEXT     Force the selection of a specific dataset version
  -s, --subdir TEXT      Subdir structure on MDS (e.g. {year}/{month})
  -f, --mds_filter TEXT  Pattern to filter data (no regex)
  --help                 Show this message and exit.

Example

With a specific file:

mds etag -e s3://mdl-native-12/native/MEDSEA_ANALYSISFORECAST_PHY_006_013/cmems_mod_med_phy-cur_anfc_4.2km_PT15M-i_202411/2025/05/20250501_qm-CMCC--RFVL-MFSeas9-MEDATL-b20250513_an-sv10.00.nc

Or:

mds etag -p MEDSEA_ANALYSISFORECAST_PHY_006_013 -i cmems_mod_med_phy-cur_anfc_4.2km_PT15M-i -g 202411 -f '*' -s 2025/05
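The etag command and the --sync-etag option of s3-get rely on S3 ETags. For objects uploaded in a single part, the ETag is normally the hex MD5 of the content, wrapped in quotes; multipart uploads instead produce the MD5 of the concatenated per-part digests followed by "-<number of parts>", which can only be reproduced if the part size is known. The sketch below assumes an 8 MiB part size (boto3's default multipart_chunksize) for the multipart case; the size actually used for MDS uploads is not stated here, and local_etag and needs_update are hypothetical names.

```python
import hashlib
import os

DEFAULT_PART_SIZE = 8 * 1024 * 1024  # boto3's default multipart_chunksize; an assumption for MDS


def local_etag(path: str, multipart: bool, part_size: int = DEFAULT_PART_SIZE) -> str:
    """Compute the S3-style ETag of a local file (without surrounding quotes)."""
    with open(path, "rb") as f:
        if not multipart:
            # Single-part upload: the ETag is the MD5 of the whole content.
            md5 = hashlib.md5()
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                md5.update(chunk)
            return md5.hexdigest()
        # Multipart upload: MD5 of the concatenated per-part MD5 digests.
        digests = [hashlib.md5(part).digest() for part in iter(lambda: f.read(part_size), b"")]
    return f"{hashlib.md5(b''.join(digests)).hexdigest()}-{len(digests)}"


def needs_update(path: str, remote_etag: str, part_size: int = DEFAULT_PART_SIZE) -> bool:
    """Return True if the local file is missing or its ETag differs from the remote one."""
    if not os.path.exists(path):
        return True
    remote = remote_etag.strip('"')
    return local_etag(path, multipart="-" in remote, part_size=part_size) != remote
```

Compared with a time-based check, an ETag comparison detects content changes even when timestamps are unreliable, at the cost of reading the whole local file to hash it.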

Download files

Source Distribution

mds_toolbox-2.0.1.tar.gz (23.7 kB)

Built Distribution

mds_toolbox-2.0.1-py3-none-any.whl (27.3 kB)

File details

Details for the file mds_toolbox-2.0.1.tar.gz.

File metadata

  • Download URL: mds_toolbox-2.0.1.tar.gz
  • Upload date:
  • Size: 23.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.10.14 Linux/6.11.0-26-generic

File hashes

Hashes for mds_toolbox-2.0.1.tar.gz
Algorithm Hash digest
SHA256 1c5a681b1322d2cbde42f8b5c4628d6869d9d63e8a29b10a00a4a2070256d059
MD5 1c8ca449ec0fc245122aba29cef5598c
BLAKE2b-256 e0c64e0874c104695c87b9fef0fb1f7d001284241b34049518628323ca149b73

File details

Details for the file mds_toolbox-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: mds_toolbox-2.0.1-py3-none-any.whl
  • Upload date:
  • Size: 27.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.10.14 Linux/6.11.0-26-generic

File hashes

Hashes for mds_toolbox-2.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4424ba8063e256540a5c348d79db27d182470c30ea6f0f8948d4a478869b9984
MD5 c8c9855307489c0bf094bc912856dca8
BLAKE2b-256 3ff4e95f028a2ddcacdbdb411f9727ae6c69f7670df462ccc65efd4162600de3
