
Utility scripts and tools for tsdat.


Tsdat Tools

This repository contains helpful scripts and notes for several tsdat-related tools.

Some tools are available as Jupyter notebooks, while others are provided as command-line utilities.

To install the command-line utilities, run:

pip install tsdat-tools

To use all the other tools, we recommend cloning this repository.

Data to Yaml

The goal of this tool is to reduce the tediousness of writing tsdat configuration files for data that you can already read and convert into an xr.Dataset object in tsdat. It generates two output files: dataset.yaml and retriever.yaml, which are used by tsdat to define metadata and how the input variables should be mapped to output variables.
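For reference, here is an abbreviated sketch of what these two files can look like. The field names and values below are illustrative only; consult the tsdat documentation for the exact schema supported by your tsdat version.

```yaml
# dataset.yaml -- defines the output dataset's metadata (illustrative sketch)
attrs:
  title: Example Buoy Data
  location_id: example_loc
data_vars:
  temperature:
    dims: [time]
    dtype: float32
    attrs:
      units: degC
```

```yaml
# retriever.yaml -- maps input variables to output variables (illustrative sketch)
classname: tsdat.DefaultRetriever
readers:
  .*:
    classname: tsdat.CSVReader
data_vars:
  temperature:
    .*:
      name: temp_in   # name of the variable in the raw input file
```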

If your file is in one of the following formats, this tool can generate these files for you out of the box:

  • netCDF: Files ending with .nc or .cdf will use the tsdat.NetCDFReader class
  • csv: Files ending with .csv will use the tsdat.CSVReader class
  • parquet: Files ending with .parquet or .pq or .pqt will use the tsdat.ParquetReader class
  • zarr: Files/folders ending with .zarr will use the tsdat.ZarrReader class
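The extension-to-reader mapping above can be sketched in a few lines of Python. This is an illustration of the selection rule described in the list, not the tool's actual implementation (the `EXTENSION_READERS` table and `pick_reader` function are hypothetical names):

```python
# Hypothetical sketch: choose a tsdat reader class name from a file's
# extension, following the mapping listed above.
from pathlib import Path

EXTENSION_READERS = {
    ".nc": "tsdat.NetCDFReader",
    ".cdf": "tsdat.NetCDFReader",
    ".csv": "tsdat.CSVReader",
    ".parquet": "tsdat.ParquetReader",
    ".pq": "tsdat.ParquetReader",
    ".pqt": "tsdat.ParquetReader",
    ".zarr": "tsdat.ZarrReader",
}

def pick_reader(datapath: str) -> str:
    """Return the reader class name for a supported file, or raise ValueError."""
    suffix = Path(datapath).suffix.lower()
    try:
        return EXTENSION_READERS[suffix]
    except KeyError:
        raise ValueError(f"No built-in reader for '{suffix}' files") from None

print(pick_reader("buoy_data.csv"))  # tsdat.CSVReader
```

For unsupported formats, you would need to write a custom tsdat reader before this tool can help.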

Usage

After installing tsdat-tools, you can run the tool with:

tsdat-tools data2yaml path/to/data/file --input-config path/to/current/dataset.yaml

Full usage instructions can be obtained using the --help flag:

>>> tsdat-tools data2yaml --help

Usage: tsdat-tools data2yaml [OPTIONS] DATAPATH

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    datapath   PATH  Path to the input data file that should be used to generate tsdat configurations. │
│                       [default: None]                                                                   │
│                       [required]                                                                        │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────╮
│ --outdir                               DIRECTORY                      The path to the directory where   │
│                                                                       the 'dataset.yaml' and            │
│                                                                       'retriever.yaml' files should be  │
│                                                                       written.                          │
│                                                                       [default: .]                      │
│ --input-config                         PATH                           Path to a dataset.yaml file to be │
│                                                                       used in addition to               │
│                                                                       configurations derived from the   │
│                                                                       input data file. Configurations   │
│                                                                       defined here take priority over   │
│                                                                       auto-detected properties in the   │
│                                                                       input file.                       │
│                                                                       [default: None]                   │
│ --help                                                                Show this message and exit.       │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯

This tool is designed to be run in the following workflow:

  1. Generate a new ingest/pipeline from the cookiecutter template (e.g., via the make cookies command)
  2. Put an example data file for your pipeline in the test/data/input folder
  3. Clean up the autogenerated dataset.yaml file.
    • Add metadata and remove any unused variables
    • Don't add additional variables yet; just make sure that the info in the current file is accurate
  4. Commit your changes in git (or otherwise back them up) so you can compare before and after the script runs.
  5. Run this script, passing it the path to your input data file and using the --input-config option to point to your cleaned dataset.yaml file. By default the new dataset.yaml is written to the current working directory (the output of pwd), but you can use the --outdir option to specify a different destination.
  6. Review the changes the script made to each file. Note that it is not capable of standardizing units or other metadata, so you will still need to clean those up manually.
  7. Continue with the rest of the ingest/pipeline development steps.
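The workflow above can be sketched as a shell session. The ingest name, file paths, and make cookies target below are hypothetical placeholders from the cookiecutter template; adjust them to your project:

```shell
# Sketch of the data2yaml workflow (paths are illustrative, not prescriptive)
make cookies                                   # 1. generate a new ingest from the template
cp ~/Downloads/buoy.csv test/data/input/       # 2. add an example input file
# 3. hand-edit dataset.yaml: add metadata, remove unused variables
git add -A && git commit -m "dataset.yaml before data2yaml"   # 4. back up
tsdat-tools data2yaml test/data/input/buoy.csv \
    --input-config pipelines/buoy/config/dataset.yaml \
    --outdir pipelines/buoy/config             # 5. regenerate the configs
git diff pipelines/buoy/config                 # 6. review what the script changed
```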

Excel to Yaml

Please consult the documentation in the excel2yaml/README.md file for more information about this tool.

NetCDF to CSV

Please consult the documentation in the netcdf2csv/README.md file for more information about this tool.
