Tools to support interoperability and the adoption of standards for permafrost data files.
Project description
Introduction
The Permafrost File Interoperability Toolkit (PFIT) is designed to promote interoperability and the adoption of standards for permafrost data files. This package currently supports the NTGS ground temperature standard. It includes tools to check and manipulate ground temperature data.
File Data Checker
The FileDataChecker checks column names and values for CSV, XLS, and XLSX files. It logs issues with the files being read in if they do not conform to the NTGS standard.
The File Data Checker can be run by passing arguments through the command line, but it can also be imported for use as a module.
The following functions are available from the class:
@static pathExists(path: str)
Checks for the existence of a path and raises an exception if it does not exist.
@static createPathIfExists(path: str)
Creates a path if it does not exist and returns the path initially passed in otherwise.
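The two path helpers above can be sketched in a few lines. This is only an illustration of the described behaviour, not the package's code: the snake_case names are hypothetical, the source does not say which exception is raised (FileNotFoundError is an assumption), and treating the path as a directory is also an assumption.

```python
from pathlib import Path


def path_exists(path: str) -> str:
    """Return the path unchanged, or raise if nothing exists there."""
    if not Path(path).exists():
        # Assumption: the exception type is not stated in the description.
        raise FileNotFoundError(f"Path does not exist: {path}")
    return path


def create_path_if_missing(path: str) -> str:
    """Create the directory (and any parents) when absent; return the path either way."""
    Path(path).mkdir(parents=True, exist_ok=True)
    return path
```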
checkPath(pathLoc: str, isVerbose: bool, logPath: str)
- pathLoc - A string of a file path leading to the file to be checked. This can also be a zip file, which will be unzipped.
- isVerbose - A boolean value. If true, verbose logging to the console also occurs.
- logPath - A string of a file path leading either to a directory or to a specific file where the log file is created. This parameter can be None or an empty string, but a value must still be passed into the function.
Sets the logging level, creates the passed file paths if they do not exist, unzips the input if it is in ZIP format, and calls checkFile.
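The unzip step can be pictured with the standard library alone. The sketch below is an assumption about how such a step might look, not the package's implementation; the function name collect_data_files and the extract_dir parameter are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List

SUPPORTED = {".csv", ".xls", ".xlsx"}


def collect_data_files(path_loc: str, extract_dir: str) -> List[Path]:
    """Return the file(s) to check, extracting a ZIP archive first if needed."""
    source = Path(path_loc)
    # Test the suffix as well: XLSX files are ZIP containers internally,
    # so zipfile.is_zipfile() alone would also match them.
    if source.suffix.lower() == ".zip" and zipfile.is_zipfile(source):
        with zipfile.ZipFile(source) as archive:
            archive.extractall(extract_dir)
        return sorted(
            p for p in Path(extract_dir).rglob("*")
            if p.is_file() and p.suffix.lower() in SUPPORTED
        )
    return [source]
```

Each returned path could then be handed to a per-file check such as checkFile.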
checkFile(fileName: str)
Opens file with pandas and applies the error checks described below.
The following errors may be reported:
- Invalid Time - A time value is not a valid time in the format HH:MM:SS.
- Invalid Date - Date values should be formatted as YYYY-MM-DD.
- Unexpected Column - One of the first 6 column names is not in the expected list of column names, or is not in the correct order. Data files must begin with 6 columns named exactly as follows, in this order: project_name, site_id, latitude, longitude, date_YYYY-MM-DD, time_HH:MM:SS. If this warning occurs, the column names and order must be fixed first, because no other checking is done until they are.
- Unexpected Metre - All following metre columns after the first 6 column names should be formatted as "_m" only.
- No Measurements - No measurement columns are detected in the file.
- File Type - The file read in is not supported.
- Coordinate - A latitude or longitude value contains something that is not valid.
- Latitude - A latitude value is found that is not valid (Less than -90 or greater than 90).
- Longitude - A longitude value is found that is not valid (Less than -180 or greater than 180).
- Temperature - A temperature value is found that is not a valid temperature.
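Several of the checks listed above are easy to picture in code. The following is a minimal sketch using only the standard library, not the package's implementation: the function names are hypothetical, rows are assumed to be dictionaries of strings (as csv.DictReader produces), and the depth-column naming and temperature checks are left out because their exact rules are not spelled out here.

```python
from datetime import datetime
from typing import Dict, List

EXPECTED_COLUMNS = [
    "project_name", "site_id", "latitude", "longitude",
    "date_YYYY-MM-DD", "time_HH:MM:SS",
]


def check_header(columns: List[str]) -> List[str]:
    """Report header problems; a column mismatch stops all further checks."""
    if columns[:6] != EXPECTED_COLUMNS:
        return ["Unexpected Column"]
    if len(columns) == 6:
        return ["No Measurements"]
    return []


def check_row(row: Dict[str, str]) -> List[str]:
    """Report date, time, and coordinate problems for one data row."""
    issues = []
    try:
        datetime.strptime(row["date_YYYY-MM-DD"], "%Y-%m-%d")
    except ValueError:
        issues.append("Invalid Date")
    try:
        datetime.strptime(row["time_HH:MM:SS"], "%H:%M:%S")
    except ValueError:
        issues.append("Invalid Time")
    for name, limit, label in (("latitude", 90, "Latitude"),
                               ("longitude", 180, "Longitude")):
        try:
            value = float(row[name])
        except ValueError:
            issues.append("Coordinate")  # not a number at all
            continue
        if not -limit <= value <= limit:
            issues.append(label)  # numeric but out of range
    return issues
```

Note that strptime accepts non-padded values such as 2020-1-5, so a stricter check would need an additional pattern test.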
XLS and XLSX files are not recommended as they can be problematic when parsing date/time values. Please consider saving data in CSV format.
If you do decide to use XLS(X) files, ensure that the data is located in the first sheet, as this is the only sheet that is checked.
CSV Column Melter
The CSVColMelter accepts existing ground temperature data files that are in the wide format and converts them to the long CSV format by transposing the depth columns. Files must conform to the NTGS-style ground temperature file format. This can be verified with the FileDataChecker.
The CSV Column Melter can be run by passing arguments through the command line, but it can also be imported for use as a module.
The following functions are available from the class:
@static timezone_check(tz: str)
Converts the timezone value to a float and checks that it is within a reasonable range. Used when parsing command line arguments.
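A validator like this typically plugs into argparse through the type= parameter. The sketch below shows that pattern under stated assumptions: the --timezone flag name is hypothetical, and the bounds of -12 to +14 hours (the span of real-world UTC offsets) are a guess at what "reasonable range" means.

```python
import argparse


def timezone_check(tz: str) -> float:
    """Convert a UTC offset string to float and reject out-of-range values."""
    try:
        offset = float(tz)
    except ValueError:
        raise argparse.ArgumentTypeError(f"timezone must be a number, got {tz!r}")
    # Assumed bounds: real-world UTC offsets run from -12 to +14 hours.
    if not -12.0 <= offset <= 14.0:
        raise argparse.ArgumentTypeError(f"timezone {offset} is out of range")
    return offset


parser = argparse.ArgumentParser()
# Hypothetical flag name, used here only to show the type= hook.
parser.add_argument("--timezone", type=timezone_check, default=0.0)
args = parser.parse_args(["--timezone=-7"])
```

Raising ArgumentTypeError lets argparse print the message as a normal usage error instead of a traceback.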
@static pathExists(path: str)
Checks for the existence of a path and raises an exception if it does not exist.
getISOFormat(date: str or datetime.datetime, time: str or datetime.time)
Used in pandas value interpretation. Parses a date string as YYYY-MM-DD or datetime.datetime object and a time string as HH:MM:SS or a datetime.time object, returns a datetime.datetime object in ISO format.
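The date and time handling described above can be sketched with the datetime module. This is an illustration under assumptions rather than the package's code; the function name combine_date_time is hypothetical.

```python
from datetime import datetime, time
from typing import Union


def combine_date_time(d: Union[str, datetime], t: Union[str, time]) -> datetime:
    """Merge a date (string or datetime) and a time (string or time) into one datetime."""
    if isinstance(d, str):
        d = datetime.strptime(d, "%Y-%m-%d")
    if isinstance(t, str):
        t = datetime.strptime(t, "%H:%M:%S").time()
    # Keep only the calendar date from d and attach the parsed time.
    return datetime.combine(d.date(), t)
```

Calling .isoformat() on the result gives a string such as 2020-01-05T13:30:00.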
meltFile(filename: str, outLoc: str)
Opens the file for melting and, once the dataframe has been melted, writes the result to the specified output location (outLoc).
meltDataFrame(df: pandas.DataFrame)
Converts the dataframe read from the file from wide format to long format.
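The wide-to-long conversion corresponds to what pandas calls a melt. The sketch below uses DataFrame.melt directly and may differ from what the package does internally: the sample values are made up, and the depth column names (0.5_m, 1.0_m) and the output column names depth and temperature are assumptions for illustration.

```python
import pandas as pd

ID_COLUMNS = [
    "project_name", "site_id", "latitude", "longitude",
    "date_YYYY-MM-DD", "time_HH:MM:SS",
]

# Made-up wide-format data: one row per timestamp, one column per depth.
wide = pd.DataFrame({
    "project_name": ["demo", "demo"],
    "site_id": ["S1", "S1"],
    "latitude": [62.45, 62.45],
    "longitude": [-114.37, -114.37],
    "date_YYYY-MM-DD": ["2020-01-05", "2020-01-06"],
    "time_HH:MM:SS": ["12:00:00", "12:00:00"],
    "0.5_m": [-1.2, -1.4],
    "1.0_m": [-0.8, -0.9],
})

# Long format: one row per (timestamp, depth) pair.
long = wide.melt(id_vars=ID_COLUMNS, var_name="depth", value_name="temperature")
```

Here the two wide rows with two depth columns become four long rows, each carrying the six identifying columns plus depth and temperature.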
Conversion to NetCDF
NTGS_to_NetCDF converts NTGS-style CSV files into NetCDF (.nc). Currently a work in progress.
Download files
Source Distribution
File details
Details for the file pfit-0.2.1.tar.gz.
File metadata
- Download URL: pfit-0.2.1.tar.gz
- Upload date:
- Size: 29.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.1 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | f77837415a9d767abd052d153b2d1e4f5ab31e08da76dc9d545416a12355aae1
MD5 | e944d5706ce06074407b86a6b1a770fe
BLAKE2b-256 | c48a366c81cb4c1c3a51e4c6baf9c78f7501bca711e2790f9ef868522a9aff0c