
A custom CMCC library to list and download data from the Marine Data Store

Project description

Marine Data Store ToolBox

This Python script provides a command-line interface (CLI) for downloading datasets using either the copernicusmarine toolbox or boto3.



How to Install it

From the root of the repository, create and activate the conda environment, then install the package into it:

mamba env create -f environment.yml
mamba activate mdsenv
pip install .

Uninstall

To uninstall the package:

mamba activate mdsenv
pip uninstall mds-toolbox

Usage

The mds command provides several subcommands for listing, inspecting and downloading files:

Usage: mds [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help  Show this message and exit.

Commands:
  etag       Get the etag of a given S3 file
  file-list  Wrapper to copernicus marine toolbox file list
  get        Wrapper to copernicusmarine get
  s3-get     Download files with direct access to MDS using S3
  s3-list    List files on MDS using S3
  subset     Wrapper to copernicusmarine subset

S3 direct access

Since the copernicusmarine toolbox adds significant overhead to S3 requests, two commands, s3-get and s3-list, access the MDS S3 storage directly in order to:

  • make fast S3 requests
  • provide thread-safe access to the S3 client
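A minimal sketch of how parallel S3 downloads can share one client, assuming boto3, whose documentation states that low-level clients are thread-safe (while Session objects are not): the client is created once and handed to every worker of a ThreadPoolExecutor. The helper download_keys is hypothetical and not part of mds-toolbox; the actual implementation may differ.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def download_keys(client, bucket: str, keys: Iterable[str], dest_dir: str,
                  threads: int = 4) -> List[str]:
    """Download S3 objects in parallel, sharing a single client across threads.

    Hypothetical helper: `client` is assumed to be a boto3 S3 client, or any
    object with a compatible download_file(Bucket, Key, Filename) method.
    """
    os.makedirs(dest_dir, exist_ok=True)

    def fetch(key: str) -> str:
        # Files are written flat into dest_dir, named after the last key component.
        target = os.path.join(dest_dir, os.path.basename(key))
        client.download_file(bucket, key, target)
        return target

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the same order as `keys`
        return list(pool.map(fetch, keys))
```

With a real client this would be called as download_keys(boto3.client("s3", ...), "mdl-native-03", keys, "."); the endpoint and credential settings needed for MDS are not given in this README and are left out.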

s3-get

Usage: mds s3-get [OPTIONS]

Options:
  -b, --bucket TEXT            Bucket name  [required]
  -f, --filter TEXT            Filter on the online files  [required]
  -o, --output-directory TEXT  Output directory  [required]
  -p, --product TEXT           The product name  [required]
  -i, --dataset-id TEXT        Dataset Id  [required]
  -g, --dataset-version TEXT   Dataset version or tag
  -r, --recursive              List recursive all s3 files
  --threads INTEGER            Downloading file using threads
  -s, --subdir TEXT            Dataset directory on mds (i.e. {year}/{month})
                               - If present boost the connection
  --overwrite                  Force overwrite of the file
  --keep-timestamps            After the download, set the correct timestamp
                               to the file
  --sync-time                  Update the file if it changes on the server
                               using last update information
  --sync-etag                  Update the file if it changes on the server
                               using etag information
  --help                       Show this message and exit.

Example

mds s3-get -i cmems_obs-ins_med_phybgcwav_mynrt_na_irr -b mdl-native-03 -g 202311 -p INSITU_MED_PHYBGCWAV_DISCRETE_MYNRT_013_035 -o "/work/antonio/20240320" -s latest/$(date -u +"%Y%m%d") --keep-timestamps --sync-etag -f $(date -u +"%Y%m%d")

Example using threads

mds s3-get --threads 10 -i cmems_obs-ins_med_phybgcwav_mynrt_na_irr -b mdl-native-03 -g 202311 -p INSITU_MED_PHYBGCWAV_DISCRETE_MYNRT_013_035 -o "." -s latest/$(date -u +"%Y%m%d") --keep-timestamps --sync-etag -f $(date -u +"%Y%m%d")
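The --overwrite, --sync-etag, --sync-time and --keep-timestamps options describe when an existing local file is fetched again. The sketch below is one plausible reading of that policy, not the tool's code: should_download and keep_timestamp are hypothetical names, remote_mtime is assumed to be a timezone-aware datetime (as boto3 returns for LastModified), and the ETag check relies on the fact that a single-part S3 ETag is the MD5 of the object, which does not hold for multipart uploads.

```python
import hashlib
import os
from datetime import datetime, timezone


def md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a local file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_download(path: str, remote_etag: str, remote_mtime: datetime,
                    overwrite: bool = False, sync_etag: bool = False,
                    sync_time: bool = False) -> bool:
    """Decide whether to fetch a remote object (hypothetical policy)."""
    if overwrite or not os.path.exists(path):
        return True
    if sync_etag:
        etag = remote_etag.strip('"')
        # Multipart ETags look like "<hash>-<parts>" and are not the file's MD5,
        # so this sketch only compares single-part ETags.
        if "-" not in etag and etag != md5_hex(path):
            return True
    if sync_time:
        local_mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        if remote_mtime > local_mtime:
            return True
    return False


def keep_timestamp(path: str, remote_mtime: datetime) -> None:
    """Set access and modification time to the server's LastModified value."""
    ts = remote_mtime.timestamp()
    os.utime(path, (ts, ts))
```

Setting the local timestamp to the server's LastModified value makes the time comparison exact on later runs: the file is fetched again only when the server copy is newer.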

s3-list

Usage: mds s3-list [OPTIONS]

Options:
  -b, --bucket TEXT           Bucket name  [required]
  -f, --filter TEXT           Filter on the online files  [required]
  -p, --product TEXT          The product name  [required]
  -i, --dataset-id TEXT       Dataset Id
  -g, --dataset-version TEXT  Dataset version or tag
  -s, --subdir TEXT           Dataset directory on mds (i.e. {year}/{month}) -
                              If present boost the connection
  -r, --recursive             List recursive all s3 files
  --help                      Show this message and exit.

Example

mds s3-list -b mdl-native-01 -p INSITU_GLO_PHYBGCWAV_DISCRETE_MYNRT_013_030 -i cmems_obs-ins_glo_phybgcwav_mynrt_na_irr -g 202311 -s "monthly/BO/202401" -f "*" | tr " " "\n"

Example recursive

mds s3-list -b mdl-native-12 -p MEDSEA_ANALYSISFORECAST_PHY_006_013 -f '*' -r | tr " " "\n"
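The --subdir option speeds up listing because S3 lists by key prefix: a longer prefix means fewer objects, and fewer result pages, to walk. The sketch below assumes the key layout native/<product>/<dataset_id>_<version>/<subdir>/, inferred from the s3:// URL in the etag example further down, and uses boto3's list_objects_v2 paginator; a "/" delimiter restricts the listing to one level, which is how a non-recursive listing can be expressed. build_prefix and list_keys are hypothetical helpers, and matching the filter against the file name only is an assumption.

```python
import fnmatch
from typing import Iterator, Optional


def build_prefix(product: str, dataset_id: Optional[str] = None,
                 version: Optional[str] = None, subdir: Optional[str] = None) -> str:
    """Key prefix under the assumed layout native/<product>/<dataset_id>_<version>/<subdir>/."""
    parts = ["native", product]
    if dataset_id:
        if not version:
            raise ValueError("this sketch needs an explicit dataset version")
        parts.append(f"{dataset_id}_{version}")
    if subdir:
        parts.append(subdir.strip("/"))
    return "/".join(parts) + "/"


def list_keys(client, bucket: str, prefix: str, pattern: str = "*",
              recursive: bool = False) -> Iterator[str]:
    """Yield keys under `prefix` whose file name matches a shell-style pattern."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    if not recursive:
        # With a delimiter, S3 returns only objects directly under the prefix;
        # deeper "folders" come back as CommonPrefixes, which are skipped here.
        kwargs["Delimiter"] = "/"
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(**kwargs):
        for obj in page.get("Contents", []):
            name = obj["Key"].rsplit("/", 1)[-1]
            # S3 keys are case-sensitive, so use the case-sensitive matcher.
            if fnmatch.fnmatchcase(name, pattern):
                yield obj["Key"]
```

For the recursive example above, the equivalent call would be list_keys(client, "mdl-native-12", build_prefix("MEDSEA_ANALYSISFORECAST_PHY_006_013"), "*", recursive=True).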

Wrapper for copernicusmarine

The following commands wrap the copernicusmarine toolbox, so their behavior depends on the installed copernicusmarine version.

Subset

Usage: mds subset [OPTIONS]

Options:
  -o, --output-directory TEXT    Output directory  [required]
  -f, --output-filename TEXT     Output filename  [required]
  -i, --dataset-id TEXT          Dataset Id  [required]
  -v, --variables TEXT           Variables to download. Can be used multiple times
  -x, --minimum-longitude FLOAT  Minimum longitude for the subset.
  -X, --maximum-longitude FLOAT  Maximum longitude for the subset.
  -y, --minimum-latitude FLOAT   Minimum latitude for the subset. Requires a
                                 float within this range:  [-90<=x<=90]
  -Y, --maximum-latitude FLOAT   Maximum latitude for the subset. Requires a
                                 float within this range:  [-90<=x<=90]
  -z, --minimum-depth FLOAT      Minimum depth for the subset. Requires a
                                 float within this range:  [x>=0]
  -Z, --maximum-depth FLOAT      Maximum depth for the subset. Requires a
                                 float within this range:  [x>=0]
  -t, --start-datetime TEXT      Start datetime as:
                                 %Y|%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d
                                 %H:%M:%S|%Y-%m-%dT%H:%M:%S.%fZ
  -T, --end-datetime TEXT        End datetime as:
                                 %Y|%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d
                                 %H:%M:%S|%Y-%m-%dT%H:%M:%S.%fZ
  -r, --dry-run                  Dry run
  -g, --dataset-version TEXT     Dataset version or tag
  -n, --username TEXT            Username
  -w, --password TEXT            Password
  --help                         Show this message and exit.

Example

mds subset -f output.nc -o . -i cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m -x -18.16667 -X 1.0 -y 30.16 -Y 46.0 -z 0.493 -Z 5727.918000000001 -t 2025-01-01 -T 2025-01-01 -v thetao 
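The -t/-T help text lists several accepted datetime layouts, the fourth being "%Y-%m-%d %H:%M:%S" (wrapped across two lines above). A minimal standard-library sketch of how such a list can be applied, assuming the first format that parses wins; click's DateTime parameter type, which this help text resembles, also tries its formats in order. parse_datetime is a hypothetical name, not part of mds-toolbox.

```python
from datetime import datetime

# Formats taken from the subset help text, tried in order.
DATETIME_FORMATS = (
    "%Y",
    "%Y-%m-%d",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S.%fZ",
)


def parse_datetime(value: str) -> datetime:
    """Return the first successful parse of `value` against DATETIME_FORMATS."""
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            # strptime rejects both mismatches and trailing unparsed text
            continue
    raise ValueError(f"unsupported datetime {value!r}; expected one of {DATETIME_FORMATS}")
```

In the example above, -t 2025-01-01 and -T 2025-01-01 both match "%Y-%m-%d" and resolve to midnight of the same day.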

Get

Usage: mds get [OPTIONS]

Options:
  -f, --filter TEXT            Filter on the online files
  -o, --output-directory TEXT  Output directory  [required]
  -i, --dataset-id TEXT        Dataset Id  [required]
  -g, --dataset-version TEXT   Dataset version or tag
  -s, --service TEXT           Force download through one of the available
                               services using the service name among
                               ['original-files', 'ftp'] or its short name
                               among ['files', 'ftp'].
  -d, --dry-run                Dry run
  -u, --update                 If the file does not exist, download it;
                               otherwise update it if it changed on mds
  -v, --dataset-version TEXT   Dry run
  -nd, --no-directories TEXT   Option to not recreate folder hierarchy in
                               output directory
  --disable-progress-bar TEXT  Flag to hide progress bar
  -n, --username TEXT          Username
  -w, --password TEXT          Password
  --help                       Show this message and exit.

Example

mds get -f '20250210*_d-CMCC--TEMP-MFSeas9-MEDATL-b20250225_an-sv10.00.nc' -o . -i cmems_mod_med_phy-tem_anfc_4.2km_P1D-m

File List

To retrieve a list of files, use:

Usage: mds file-list [OPTIONS] DATASET_ID MDS_FILTER

Options:
  -g, --dataset-version TEXT  Dataset version or tag
  --help                      Show this message and exit.

Example

mds file-list cmems_mod_med_phy-cur_anfc_4.2km_PT15M-i '*b20250225*' -g 202411

Etag

Usage: mds etag [OPTIONS]

Options:
  -e, --s3_file TEXT     Path to a specific s3 file - if present, other
                         parameters are ignored.
  -p, --product TEXT     The product name
  -d, --dataset_id TEXT  The datasetID
  -v, --version TEXT     Force the selection of a specific dataset version
  -s, --subdir TEXT      Subdir structure on mds (i.e. {year}/{month})
  -f, --mds_filter TEXT  Pattern to filter data (no regex)
  --help                 Show this message and exit.

Example

With a specific file:

mds etag -e s3://mdl-native-12/native/MEDSEA_ANALYSISFORECAST_PHY_006_013/cmems_mod_med_phy-cur_anfc_4.2km_PT15M-i_202411/2025/05/20250501_qm-CMCC--RFVL-MFSeas9-MEDATL-b20250513_an-sv10.00.nc

Or:

mds etag -p MEDSEA_ANALYSISFORECAST_PHY_006_013 -i cmems_mod_med_phy-cur_anfc_4.2km_PT15M-i -g 202411 -f '*' -s 2025/05
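To compare the value printed by mds etag with a local copy, the local file's ETag has to be computed the way S3 does: the plain MD5 for single-part uploads, or the MD5 of the concatenated part digests followed by "-<parts>" for multipart uploads. A minimal sketch, assuming boto3's head_object for the remote side; split_s3_url, remote_etag and local_etag are hypothetical helpers. The part size used when the files were uploaded to MDS is unknown, so the 8 MiB default (boto3's default multipart chunk size) is only a guess, and a file uploaded as a single multipart part would carry a "-1" suffix this sketch does not reproduce.

```python
import hashlib
from typing import Tuple


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    if not url.startswith("s3://"):
        raise ValueError(f"not an s3:// URL: {url!r}")
    bucket, _, key = url[len("s3://"):].partition("/")
    return bucket, key


def remote_etag(client, url: str) -> str:
    """ETag of an S3 object via HeadObject, without the surrounding quotes."""
    bucket, key = split_s3_url(url)
    return client.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')


def local_etag(path: str, part_size: int = 8 * 1024 * 1024) -> str:
    """ETag S3 would report for `path` if uploaded in parts of `part_size` bytes."""
    digests = []
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(part_size), b""):
            digests.append(hashlib.md5(chunk).digest())
    if not digests:
        # empty object: MD5 of no bytes
        return hashlib.md5(b"").hexdigest()
    if len(digests) == 1:
        # single-part upload: plain MD5 of the content
        return digests[0].hex()
    # multipart upload: MD5 of the concatenated binary part digests
    return f"{hashlib.md5(b''.join(digests)).hexdigest()}-{len(digests)}"
```

With the -e example above, split_s3_url returns the bucket "mdl-native-12" and the key starting with "native/MEDSEA_ANALYSISFORECAST_PHY_006_013/".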

Authors



Download files


Source Distribution

mds_toolbox-2.1.tar.gz (24.2 kB)

Uploaded Source

Built Distribution


mds_toolbox-2.1-py3-none-any.whl (27.4 kB)

Uploaded Python 3

File details

Details for the file mds_toolbox-2.1.tar.gz.

File metadata

  • Download URL: mds_toolbox-2.1.tar.gz
  • Upload date:
  • Size: 24.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.10.14 Linux/6.14.0-27-generic

File hashes

Hashes for mds_toolbox-2.1.tar.gz
Algorithm Hash digest
SHA256 65a5d4073af30a218e66f55044e39c4cfae6c43584a29c614ac8edcb066d840f
MD5 14a0ea0b2479d0baf0e791c446a689d0
BLAKE2b-256 921110647a21ba663b236baa0e52b8dc93da187f6b4fffe9b28848289db3af00


File details

Details for the file mds_toolbox-2.1-py3-none-any.whl.

File metadata

  • Download URL: mds_toolbox-2.1-py3-none-any.whl
  • Upload date:
  • Size: 27.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.10.14 Linux/6.14.0-27-generic

File hashes

Hashes for mds_toolbox-2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3a8ef377198a82e5e8c7cd09b98a5c72d98720e185ea2715d09e57e84ce3e9de
MD5 8b3ba47aa350c2be26e922123348a7a2
BLAKE2b-256 b0036e9ecc4f285b4e7d200a727d186e1d30354c1fbad06b1447b65609ccc2a2
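A downloaded distribution can be checked against the SHA256 digests published above before it is installed. A minimal standard-library sketch; sha256_hex and matches are illustrative names, and the two digests are copied from the hash tables for release 2.1.

```python
import hashlib

# SHA256 digests for release 2.1, copied from the hash tables above.
EXPECTED_SHA256 = {
    "mds_toolbox-2.1.tar.gz": "65a5d4073af30a218e66f55044e39c4cfae6c43584a29c614ac8edcb066d840f",
    "mds_toolbox-2.1-py3-none-any.whl": "3a8ef377198a82e5e8c7cd09b98a5c72d98720e185ea2715d09e57e84ce3e9de",
}


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_hex(path) == expected_hex.strip().lower()
```

For example, matches("mds_toolbox-2.1.tar.gz", EXPECTED_SHA256["mds_toolbox-2.1.tar.gz"]) should return True for an intact download. pip's hash-checking mode (a requirements line such as mds-toolbox==2.1 --hash=sha256:<digest>) performs the same check at install time, but in that mode every dependency must be pinned with a hash as well.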

