
A module for climate data correlation analysis


WeClimb Correlations Module

The weclimb_correlation_module package provides the ClimateDataAnalysis class, a tool for loading, preprocessing, aggregating, and analyzing climate datasets so that correlations among climate variables are easier to identify.

Features

  • Flexible Data Loading: Load climate datasets specified by the user.
  • Preprocessing: Check that each dataset has a 'time' dimension and stays within the allowed threshold for missing data.
  • Data Aggregation: Aggregate data over specified time frequencies.
  • Correlation Analysis: Generate and visualize correlation matrices.
  • Insight Extraction: Identify the highest and lowest correlations among variables.
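
The preprocessing rule listed above can be illustrated with a small, dependency-free sketch. It is not the module's implementation: the function names, the plain tuple/list inputs, and the 10% threshold are all assumptions made for illustration, since the actual threshold is not stated here.

```python
import math

# Hypothetical threshold: the module's real limit is not documented here.
MAX_MISSING_FRACTION = 0.1


def missing_fraction(values):
    """Return the share of entries that are None or NaN."""
    if not values:
        return 1.0  # treat an empty series as entirely missing
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def passes_preprocessing(dims, values, max_missing=MAX_MISSING_FRACTION):
    """Accept a variable only if it has a 'time' dimension and few gaps."""
    return "time" in dims and missing_fraction(values) <= max_missing


good = passes_preprocessing(("time", "lat", "lon"), [1.0, 2.0, 3.0, 4.0])
no_time = passes_preprocessing(("lat", "lon"), [1.0, 2.0, 3.0, 4.0])
too_sparse = passes_preprocessing(("time",), [1.0, None, float("nan"), 4.0])
print(good, no_time, too_sparse)
```

Only the first case passes: the second lacks a 'time' dimension and the third has half of its values missing.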

Installation

To install the ClimateDataAnalysis module:

git clone https://github.com/shiv3679/weclimb_modules.git
cd weclimb_modules/correlation_module
pip install .

Ensure you have pip and git installed in your environment.

Quick Start

The following code snippet demonstrates how to use this module.

from weclimb_correlation_module import ClimateDataAnalysis

# Define datasets information
datasets_info = [
    {'path': 'path/to/your/dataset1.nc', 'variables': ['var1', 'var2'], 'levels': [100, 500, 850]},
    {'path': 'path/to/your/dataset2.nc', 'variables': ['var1', 'var2'], 'levels': None},
    # Add additional datasets as needed
]

analysis = ClimateDataAnalysis(datasets_info)
analysis.load_and_process_datasets()  # Load and preprocess datasets
analysis.aggregate_over_time(freq='A')  # Aggregate data annually
analysis.create_dataframe_from_aggregated_data()  # Create DataFrame for analysis
analysis.plot_correlation_matrix()  # Visualize correlation matrix
extreme_correlations = analysis.get_extreme_correlations()  # Highest and lowest correlation pairs

# Optionally, unload datasets to free up memory
analysis.unload_datasets()
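
Because datasets_info is a plain list of dictionaries, it can be checked before it is passed to ClimateDataAnalysis. The helper below is a hypothetical sketch, not part of the module. It assumes the three keys shown in the Quick Start ('path', 'variables', 'levels') and treats a missing 'levels' key the same as None, which may differ from what the module accepts.

```python
def validate_datasets_info(datasets_info):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for i, entry in enumerate(datasets_info):
        path = entry.get("path")
        if not isinstance(path, str) or not path:
            problems.append(f"entry {i}: 'path' must be a non-empty string")
        variables = entry.get("variables")
        if not isinstance(variables, list) or not variables:
            problems.append(f"entry {i}: 'variables' must be a non-empty list")
        levels = entry.get("levels")  # assumption: absent means "no levels"
        if levels is not None and not isinstance(levels, list):
            problems.append(f"entry {i}: 'levels' must be a list or None")
    return problems


example_info = [
    {"path": "path/to/your/dataset1.nc", "variables": ["var1", "var2"], "levels": [100, 500, 850]},
    {"path": "path/to/your/dataset2.nc", "variables": [], "levels": None},
]
problems = validate_datasets_info(example_info)
for problem in problems:
    print(problem)
```

Here the second entry is reported because its 'variables' list is empty.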

Method Overview

  • load_and_process_datasets() : Loads datasets based on datasets_info provided during initialization. Each dataset is checked to ensure it has a 'time' dimension and does not exceed the allowed threshold for missing data.

  • pre_process(dataset) : Checks if the provided dataset has a 'time' dimension and minimal missing data. This method ensures that only valid datasets are processed further.

  • aggregate_over_time(freq) : Aggregates data over time for all specified variables and levels in all loaded datasets. The frequency of aggregation (freq) can be specified as 'M' for monthly, 'A' for annual, etc.

  • create_dataframe_from_aggregated_data() : Converts the aggregated data into a pandas DataFrame. This DataFrame is then used for further analysis, such as correlation analysis.

  • plot_correlation_matrix() : Generates and plots a correlation matrix using seaborn. This visualization helps in identifying potential relationships between different climate variables.

  • get_correlation_matrix() : Retrieves the correlation matrix of the aggregated data. This method is useful for programmatically accessing correlation values.

  • get_extreme_correlations() : Identifies and returns the highest and lowest correlation pairs from the correlation matrix. Use it to pinpoint the variable pairs that warrant further investigation.

  • unload_datasets() : Clears loaded datasets from memory. This method is useful for freeing up resources after analysis is complete.
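
The aggregate, correlate, and extract-extremes steps described above can be sketched without any dependencies. This is an illustration of the idea only: the function names, the (year, value) record layout, and the return shape are assumptions, the sample values are made up, and the module itself works with pandas and seaborn rather than plain lists.

```python
import math
from collections import defaultdict
from itertools import combinations


def annual_means(records):
    """Average (year, value) records per year, returned in year order."""
    buckets = defaultdict(list)
    for year, value in records:
        buckets[year].append(value)
    return [sum(v) / len(v) for _, v in sorted(buckets.items())]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def extreme_correlations(columns):
    """Return ((pair, r) highest, (pair, r) lowest) over all variable pairs."""
    pairs = {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(sorted(columns), 2)
    }
    highest = max(pairs.items(), key=lambda item: item[1])
    lowest = min(pairs.items(), key=lambda item: item[1])
    return highest, lowest


# Made-up sample records: two readings per year for each variable.
raw = {
    "temperature": [(2000, 14.0), (2000, 16.0), (2001, 15.0),
                    (2001, 17.0), (2002, 17.0), (2002, 19.0)],
    "humidity": [(2000, 58.0), (2000, 62.0), (2001, 61.0),
                 (2001, 65.0), (2002, 64.0), (2002, 68.0)],
    "pressure": [(2000, 1011.0), (2000, 1013.0), (2001, 1012.0),
                 (2001, 1014.0), (2002, 1008.0), (2002, 1010.0)],
}

# Step 1: aggregate each variable to one value per year.
aggregated = {name: annual_means(records) for name, records in raw.items()}

# Step 2: find the most positively and most negatively correlated pairs.
highest, lowest = extreme_correlations(aggregated)
print("highest:", highest)
print("lowest:", lowest)
```

With this sample data, humidity and temperature form the highest pair and pressure and temperature the lowest. With only three yearly values per variable the coefficients are illustrative, not statistically meaningful.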

Contributing

We welcome contributions to the Correlations module. Feel free to fork the repository, make improvements, and submit pull requests.

License

This project is licensed under the GPL-3.0 License.
