CCI Tagger

Overview

This package provides a command line tool moles_esgf_tag to generate dataset tags for both MOLES and ESGF.

Installation

Create a Python 3 virtual environment:

python -m venv venv
source venv/bin/activate

Install the latest version of the library:

git clone https://github.com/cedadev/cci-tag-scanner
cd cci-tag-scanner
pip install -e .

NOTE: As of 22nd Jan 2025 the cci-tag-scanner repository has been upgraded for use with Poetry version 2. This requires an additional requirements_fix.txt patch while a solution for Poetry dependencies on GitHub is worked out. The installation above MUST be supplemented with:

pip install -r requirements_fix.txt

This is a temporary fix and will be removed when Poetry is patched.

Command Line Script

Use this script to check what the tagger outputs when fed the JSON files. It can be used while building the JSON files to check that they produce what you expect. The script also produces a moles_tags file to attach to the dataset.

Usage

moles_esgf_tag [-h] (-d DATASET | -f FILE | -j JSON_FILE) [--file_count FILE_COUNT] [-v]

You can tag an individual dataset, or tag all the datasets listed in a file. By default, a checksum is produced for each file.

Arguments:

-h, --help            show help message and exit

-d DATASET, --dataset DATASET
                      the full path to the dataset that is to be tagged. This option is used to tag a single
                      dataset.

-f FILE, --file FILE  the name of the file containing a list of datasets to process. This option is used for
                      tagging one or more datasets.

-j JSON_FILE, --json_file JSON_FILE
                      use the JSON file to provide a list of datasets and the mappings
                      used by the tagging code. Useful for testing datasets and specific mapping files.

--file_count FILE_COUNT
                      how many .nc files to look at per dataset

-v, --verbose         increase output verbosity. Add more vs to increase verbosity.
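When the tool is driven from another script, the usage line above can be mirrored in Python. The following is a minimal sketch that only assembles the argument list and does not run anything. The helper name build_tag_command is hypothetical and not part of this package, and the repeated -v form (for example -vv) is an assumption based on the description of the verbose flag.

```python
from typing import List, Optional


def build_tag_command(
    dataset: Optional[str] = None,
    list_file: Optional[str] = None,
    json_file: Optional[str] = None,
    file_count: Optional[int] = None,
    verbosity: int = 0,
) -> List[str]:
    """Assemble a moles_esgf_tag argument list (hypothetical helper)."""
    # -d, -f and -j are mutually exclusive and exactly one is required.
    sources = [("-d", dataset), ("-f", list_file), ("-j", json_file)]
    chosen = [(flag, value) for flag, value in sources if value is not None]
    if len(chosen) != 1:
        raise ValueError("supply exactly one of dataset, list_file or json_file")

    flag, value = chosen[0]
    command = ["moles_esgf_tag", flag, value]
    if file_count is not None:
        command += ["--file_count", str(file_count)]
    if verbosity > 0:
        # Assumption: repeating v (-v, -vv, ...) raises the verbosity level.
        command.append("-" + "v" * verbosity)
    return command
```

The returned list can be passed to subprocess.run(command, check=True) once moles_esgf_tag is installed and on the PATH.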

Output

A number of files are produced as output:

  • esgf_drs.json contains a list of DRSs and their associated files. It also lists all files for which a DRS could not be generated.
  • moles_tags.csv contains a list of dataset paths and vocabulary URLs
  • error.log contains a log of errors. The log is appended to on each run, so delete the file if you want a clean start.
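The output files can be inspected from Python after a run. The sketch below is illustrative only: it assumes each row of moles_tags.csv holds a dataset path in the first column followed by one or more vocabulary URLs, which is a guess at the layout rather than a documented format. Both function names are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def read_moles_tags(csv_path: str = "moles_tags.csv") -> Dict[str, List[str]]:
    """Group vocabulary URLs by dataset path (assumed row layout)."""
    tags: Dict[str, List[str]] = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            # Assumption: first column is the dataset path, the rest are URLs.
            dataset_path, urls = row[0], row[1:]
            tags[dataset_path].extend(url.strip() for url in urls if url.strip())
    return dict(tags)


def reset_error_log(log_path: str = "error.log") -> None:
    """Delete error.log so the next run starts with a clean log."""
    Path(log_path).unlink(missing_ok=True)
```

Calling reset_error_log() before a run avoids mixing old and new errors, since the tool appends to the log rather than overwriting it.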

Examples

moles_esgf_tag -d /neodc/esacci/cloud/data/L3C/avhrr_noaa-16 -v
moles_esgf_tag -f datapath --file_count 2 -v
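The second example reads its datasets from a list file named datapath. A minimal sketch for generating such a file follows; it assumes the file holds one dataset path per line, which the text above does not state explicitly, and write_dataset_list is a hypothetical helper rather than part of the package.

```python
from pathlib import Path
from typing import Iterable


def write_dataset_list(dataset_paths: Iterable[str], list_file: str = "datapath") -> Path:
    """Write dataset paths to a list file, one per line (assumed format)."""
    target = Path(list_file)
    # Drop blanks and duplicates while preserving the original order.
    unique_paths = list(dict.fromkeys(p.strip() for p in dataset_paths if p.strip()))
    target.write_text("".join(f"{path}\n" for path in unique_paths))
    return target
```

The resulting file can then be passed to the tool with the -f option.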

Check tags

This code generates a directory of HTML pages which can be used to interrogate the opensearch Elasticsearch indices and check that the tags you expect are being found. It also highlights files without DRSs.

Usage

cci_check_tags [--conf CONF] [--output OUTPUT]

Arguments:

    --conf          Specify the configuration file.
                    DEFAULT: cci_tagger/conf/tag_check.conf
                    
    --output        Directory to place the output files.
                    DEFAULT: html 

Output

  • index.html The main page listing all ECVs in the index
  • ecv/<ecv_name>.html ECV specific page. Lists all MOLES datasets in the index and displays details about them.
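To confirm which ECV pages a run produced, the output directory can be listed from Python. This sketch relies only on the layout described above (index.html plus ecv/<ecv_name>.html) and the default output directory html. The function name list_ecv_pages is hypothetical.

```python
from pathlib import Path
from typing import List


def list_ecv_pages(output_dir: str = "html") -> List[str]:
    """Return the ECV names that have a page under <output_dir>/ecv/."""
    root = Path(output_dir)
    if not (root / "index.html").is_file():
        raise FileNotFoundError(f"no index.html found in {root}")
    # Each ECV page is named ecv/<ecv_name>.html, so the stem is the ECV name.
    return sorted(page.stem for page in (root / "ecv").glob("*.html"))
```

Comparing the returned names against the ECVs you expect is a quick way to spot one that is missing from the index.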

Breaking Changes

V2.0.0

  • Removed default terms file
  • Removed DRS version based on date
