A module for climate data correlation analysis
WeClim Correlations Module
The ClimateDataAnalysis module is a comprehensive tool designed for analyzing climate data. It supports loading, preprocessing, aggregating, and analyzing climate datasets, making it easier to identify correlations within climate variables.
Features
- Flexible Data Loading: Load climate datasets specified by the user.
- Preprocessing: Ensure datasets have a 'time' dimension and minimal missing data.
- Data Aggregation: Aggregate data over specified time frequencies.
- Correlation Analysis: Generate and visualize correlation matrices.
- Insight Extraction: Identify the highest and lowest correlations among variables.
Installation
To install the ClimateDataAnalysis module:
git clone https://github.com/shiv3679/weclimb_modules.git
cd weclimb_modules/correlation_module
pip install .
Ensure you have pip and git installed in your environment.
Quick Start
The following code snippet demonstrates how to use this module.
from weclimb_correlation_module import ClimateDataAnalysis
# Define datasets information
datasets_info = [
    {'path': 'path/to/your/dataset1.nc', 'variables': ['var1', 'var2'], 'levels': [100, 500, 850]},
    {'path': 'path/to/your/dataset2.nc', 'variables': ['var1', 'var2'], 'levels': None},
    # Add additional datasets as needed
]
analysis = ClimateDataAnalysis(datasets_info)
analysis.load_and_process_datasets() # Load and preprocess datasets
analysis.aggregate_over_time(freq='A') # Aggregate data annually
analysis.create_dataframe_from_aggregated_data() # Create DataFrame for analysis
analysis.plot_correlation_matrix() # Visualize correlation matrix
extreme_correlations = analysis.get_extreme_correlations() # Extract extreme correlations
# Optionally, unload datasets to free up memory
analysis.unload_datasets()
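This README does not describe how aggregate_over_time(freq='A') is implemented. As a rough illustration of what annual aggregation means in general, the sketch below averages a synthetic daily series within each calendar year using pandas. The helper aggregate_annually and the data are hypothetical and are not part of the module.

```python
import numpy as np
import pandas as pd

# Synthetic daily series spanning two calendar years (illustrative data only):
# every day in 2000 is 10.0 and every day in 2001 is 20.0.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
temperature = pd.Series(
    np.where(time.year == 2000, 10.0, 20.0),
    index=time,
    name="temperature",
)


def aggregate_annually(series: pd.Series) -> pd.Series:
    """Average a time-indexed series within each calendar year.

    Hypothetical helper: a generic stand-in for annual aggregation,
    not the module's own implementation.
    """
    return series.groupby(series.index.year).mean()


annual_mean = aggregate_annually(temperature)
print(annual_mean)
```

Grouping on the index year avoids frequency aliases altogether. If the module forwards freq to pandas (an assumption), note that pandas 2.2 and later deprecate 'A' and 'M' in favor of 'YE' and 'ME'.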
Method Overview
- load_and_process_datasets(): Loads datasets based on datasets_info provided during initialization. Each dataset is checked to ensure it has a 'time' dimension and does not exceed the allowed threshold for missing data.
- pre_process(dataset): Checks if the provided dataset has a 'time' dimension and minimal missing data. This method ensures that only valid datasets are processed further.
- aggregate_over_time(freq): Aggregates data over time for all specified variables and levels in all loaded datasets. The frequency of aggregation (freq) can be specified as 'M' for monthly, 'A' for annual, etc.
- create_dataframe_from_aggregated_data(): Converts the aggregated data into a pandas DataFrame. This DataFrame is then used for further analysis, such as correlation analysis.
- plot_correlation_matrix(): Generates and plots a correlation matrix using seaborn. This visualization helps in identifying potential relationships between different climate variables.
- get_correlation_matrix(): Retrieves the correlation matrix of the aggregated data. This method is useful for programmatically accessing correlation values.
- get_extreme_correlations(): Identifies and returns the highest and lowest correlation pairs from the correlation matrix. This method helps in pinpointing significant correlations that warrant further investigation.
- unload_datasets(): Clears loaded datasets from memory. This method is useful for freeing up resources after analysis is complete.
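The return format of get_extreme_correlations() is not documented here. To make the idea of "highest and lowest correlation pairs" concrete, the sketch below computes a Pearson correlation matrix with pandas and picks the extreme off-diagonal entries. The function extreme_correlations, the column names, and the values are hypothetical; the module may compute or return these differently.

```python
import numpy as np
import pandas as pd


def extreme_correlations(df: pd.DataFrame):
    """Return ((var_a, var_b), r) for the highest and the lowest
    off-diagonal Pearson correlation in df.

    Hypothetical helper, not the module's own method.
    """
    corr = df.corr()
    # Keep only the strict upper triangle so each pair appears once
    # and self-correlations on the diagonal are excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    highest = (pairs.idxmax(), float(pairs.max()))
    lowest = (pairs.idxmin(), float(pairs.min()))
    return highest, lowest


# Illustrative values only: b tracks a closely, c is a reversed.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, 2.0, 3.0, 5.0],
    "c": [4.0, 3.0, 2.0, 1.0],
})

highest, lowest = extreme_correlations(df)
print("highest:", highest)
print("lowest:", lowest)
```

Masking the upper triangle matters because a correlation matrix is symmetric with ones on the diagonal; without the mask, the "highest" correlation would always be a variable with itself.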
Contributing
We welcome contributions to the Correlations module. Feel free to fork the repository, make improvements, and submit pull requests.
License
This project is licensed under the GPL-3.0 License.