Utility scripts and tools for tsdat.
# Tsdat Tools
This repository contains helpful scripts and notes for several tsdat-related tools.
Some tools are available as Jupyter notebooks, and others as command-line utilities.
To get access to the command-line utilities, just run:
```bash
pip install tsdat-tools
```
To use all the other tools, we recommend cloning this repository.
## Data to Yaml
The goal of this tool is to reduce the tediousness of writing tsdat configuration files for data that you can already
read and convert into an `xr.Dataset` object in tsdat. It generates two output files, `dataset.yaml` and
`retriever.yaml`, which tsdat uses to define metadata and to specify how input variables should be mapped to output
variables.
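The only real prerequisite is being able to get your data into an `xr.Dataset`. As a minimal sketch (the column names and values here are hypothetical), tabular data can be loaded with pandas and converted:

```python
import pandas as pd

# Hypothetical tabular data; in practice this would come from your input
# file, e.g. df = pd.read_csv("test/data/input/example.csv").
df = pd.DataFrame(
    {
        "time": pd.date_range("2024-01-01", periods=3, freq="h"),
        "temperature": [10.2, 10.8, 11.1],
    }
)

# Index by time so the conversion produces a "time" coordinate,
# then convert the DataFrame into an xr.Dataset.
ds = df.set_index("time").to_xarray()

print(list(ds.data_vars))  # -> ['temperature']
```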
If your file is in one of the following formats, this tool can already do this for you. Formats supported out of the box:

- **netCDF**: Files ending with `.nc` or `.cdf` will use the `tsdat.NetCDFReader` class
- **csv**: Files ending with `.csv` will use the `tsdat.CSVReader` class
- **parquet**: Files ending with `.parquet`, `.pq`, or `.pqt` will use the `tsdat.ParquetReader` class
- **zarr**: Files/folders ending with `.zarr` will use the `tsdat.ZarrReader` class
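The mapping above amounts to a simple suffix lookup. The table below restates it for illustration only; tsdat's actual file-type detection may be implemented differently:

```python
from pathlib import Path

# Extension -> reader class, restating the list above. Illustrative only;
# this is not tsdat's internal dispatch code.
READERS = {
    ".nc": "tsdat.NetCDFReader",
    ".cdf": "tsdat.NetCDFReader",
    ".csv": "tsdat.CSVReader",
    ".parquet": "tsdat.ParquetReader",
    ".pq": "tsdat.ParquetReader",
    ".pqt": "tsdat.ParquetReader",
    ".zarr": "tsdat.ZarrReader",
}

def reader_for(path: str) -> str:
    """Return the reader class name for a given data file path."""
    return READERS[Path(path).suffix]

print(reader_for("test/data/input/buoy.csv"))  # -> tsdat.CSVReader
```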
### Usage

Once the package is installed, you can run the tool with:

```bash
tsdat-tools data2yaml path/to/data/file --input-config path/to/current/dataset.yaml
```
Full usage instructions can be obtained using the `--help` flag:

```text
>>> tsdat-tools data2yaml --help
Usage: tsdat-tools data2yaml [OPTIONS] DATAPATH

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  datapath  PATH  Path to the input data file that should be used to generate tsdat configurations.    │
│                    [default: None]                                                                      │
│                    [required]                                                                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────╮
│ --outdir        DIRECTORY  The path to the directory where                                              │
│                            the 'dataset.yaml' and                                                       │
│                            'retriever.yaml' files should be                                             │
│                            written.                                                                     │
│                            [default: .]                                                                 │
│ --input-config  PATH       Path to a dataset.yaml file to be                                            │
│                            used in addition to                                                          │
│                            configurations derived from the                                              │
│                            input data file. Configurations                                              │
│                            defined here take priority over                                              │
│                            auto-detected properties in the                                              │
│                            input file.                                                                  │
│                            [default: None]                                                              │
│ --help                     Show this message and exit.                                                  │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
This tool is designed to be run in the following workflow:

1. Generate a new ingest/pipeline from the cookiecutter template (e.g., the `make cookies` command).
2. Put an example data file for your pipeline in the `test/data/input` folder.
3. Clean up the autogenerated `dataset.yaml` file:
    - Add metadata and remove any unused variables.
    - Don't add additional variables yet; just make sure that the info in the current file is accurate.
4. Commit your changes in `git`, or back up your changes so you can compare before and after the script runs.
5. Run this script, passing it the path to your input data file and using the `--input-config` option to tell it
   where your cleaned `dataset.yaml` file is. By default this will generate a new `dataset.yaml` file in the current
   working directory (the location of `pwd` on the command line), but you can also use the `--outdir` option to
   specify the path where it should write to.
6. Review the changes the script made to each file. Note that it is not capable of standardizing units or other
   metadata, so you will still need to clean those up manually.
7. Continue with the rest of the ingest/pipeline development steps.
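When reviewing the regenerated file, a quick programmatic check can catch missing metadata early. The dict below is a simplified, hypothetical stand-in for what `yaml.safe_load` would return for a generated `dataset.yaml`; the field names shown are illustrative:

```python
# Simplified, hypothetical structure of a generated dataset.yaml,
# shown as the dict that yaml.safe_load would return for it.
config = {
    "attrs": {"title": "Example Dataset", "location_id": "abc"},
    "coords": {
        "time": {
            "dims": ["time"],
            "dtype": "datetime64[ns]",
            "attrs": {"units": "Seconds since 1970-01-01 00:00:00"},
        }
    },
    "data_vars": {
        "temperature": {
            "dims": ["time"],
            "dtype": "float32",
            "attrs": {"units": "degC"},
        }
    },
}

# The script cannot standardize units or other metadata for you, so verify
# that every variable declares dims, dtype, and a units attribute.
for name, spec in config["data_vars"].items():
    missing = {"dims", "dtype", "attrs"} - spec.keys()
    assert not missing, f"{name} is missing {missing}"
    assert "units" in spec["attrs"], f"{name} has no units attribute"

print("checked:", sorted(config["data_vars"]))
```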
## Excel to Yaml

Please consult the documentation in the `excel2yaml/README.md` file for more information about this tool.
## NetCDF to CSV

Please consult the documentation in the `netcdf2csv/README.md` file for more information about this tool.