Efficient spatial and temporal aggregation of gridded climate data

These details have not been verified by PyPI

Project description

`aggfly`

NOTE: aggfly is still in development and may not be stable for new users. Please proceed with caution.

Overview: Why `aggfly`?

TODO: Brief introduction on the purpose of aggfly.

Installation

Required dependencies

Python (3.11.6 or later)

Instructions

Since aggfly relies on several packages with version restrictions, we recommend installing it inside a virtual environment, such as a conda environment (see instructions).

Use pip to install the package from PyPI:

pip install aggfly

Jupyter

You may want to use aggfly in batch jobs or in Jupyter sessions. If you have trouble accessing the environment from Jupyter, register the environment you created as a Jupyter kernel. In many cases, running the following commands is sufficient:

conda activate <environment_name>
conda install ipykernel
python -m ipykernel install --user --name <environment_name>
conda deactivate

You can then start the Jupyter session from your terminal.

Input datasets

Three raw inputs are used to produce a dataset of climatic information aggregated to a coarser spatial and temporal level:

Shapefile: The shapefile containing the boundaries of the target administrative regions. For example, a shapefile with the boundaries of the world's countries.
Climatic dataset: The raster dataset containing the fine-resolution information that you want to aggregate to a spatially and/or temporally coarser level. For example, an ERA5 raster with the hourly average temperature for each 0.25x0.25 degree grid cell for the whole world.
(Optional) Secondary weights dataset: The raster dataset containing the variable used to compute the weights for the weighted average of the climatic data over each administrative region.

Workflow

Here we present the workflow and the main functionality of the package. For a specific application, refer to the example notebook.

To correctly aggregate the raster data containing climatic information at the grid cell level, you will need to follow three steps:

Loading the shapefile containing the target administrative regions and the raster dataset to compute the area weights
Computing the weights to be used in the aggregation
Transforming and aggregating the climatic data spatially and temporally

Remember to set the project_dir at the start of your code, to avoid having to specify it in the inputs of every command:

project_dir = '/user/name/aggfly_repository'

1. Loading the shapefile and the raster dataset

The first step is to load the shapefile containing the target administrative regions to which you want to aggregate the climatic data, using the georegions_from_path() function:

import aggfly as af  # assumed import alias; the examples refer to the package as `af`

georegions = af.georegions_from_path(
    "~/data/shapefiles/county/cb_2018_us_county_500k.shp",
    regionid='GEOID'
)

Next, load a sample layer of the climatic raster dataset that you want to aggregate, using the dataset_from_path() function. This layer is used to compute the area weights (see step 2 for more details on weights):

# Open example dataset to construct weights
dataset = af.dataset_from_path(
    "/home3/dth2133/data/annual/tempPrecLand2017.zarr",
    var='t2m',
    name='era5',
    georegions=georegions,
    preprocess=lambda x: (x - 273.15),  # convert kelvin to degrees Celsius
)
dataset.da

Main arguments:

var: The selected variable to transform and aggregate.
preprocess: A function applied to the raw values of your data before they are aggregated. For instance, it can be used to convert kelvin to degrees Celsius or to shift every observation back by one hour.
georegions: The georegions object you have previously created.
name: The name you want to assign to this dataset.
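
As a minimal sketch of what a preprocess function can look like, the snippet below writes the kelvin-to-Celsius conversion from the example above as a named function. The name kelvin_to_celsius is hypothetical and not part of aggfly. It is demonstrated on plain floats, on the assumption that the same arithmetic applies elementwise to whatever array-like object is passed to preprocess.

```python
def kelvin_to_celsius(x):
    """Convert a temperature from kelvin to degrees Celsius.

    Works on any object that supports subtraction of a float
    (plain numbers here; array-like objects are an assumption).
    """
    return x - 273.15


# Quick check on plain floats.
freezing_point = kelvin_to_celsius(273.15)
warm_day = kelvin_to_celsius(300.0)
print(freezing_point)       # 0.0
print(round(warm_day, 2))   # 26.85
```

A named function can be passed in place of the lambda, for example preprocess=kelvin_to_celsius, which can make longer preprocessing steps easier to read and test.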

2. Computing the weights

We start with a brief explanation of why weights are an important component of the aggregation procedure, and then show how to compute them and which main options are available.

Why are weights important?

There are two categories of weights that you may use to spatially aggregate the climatic data:

Area weights are the standard weights: the weight assigned to a grid cell is the share of the administrative region's area that falls in that cell. It is important to compute the weighted average of the climatic data over each administrative region, rather than the unweighted one, for two main reasons. First, the cells of a regular latitude-longitude grid do not all have the same area: meridians converge toward the poles, so the linear distance spanned by one degree of longitude is largest at the equator and shrinks to zero at the poles. Second, the border of an administrative region may cut through some cells. In that case, we want the weight of the intersected cell to be proportional to the area of the cell that is covered by the administrative region.
Secondary weights are useful when we are interested in the average climate experienced by a particular subject. For example, if we are studying the effect of climate on human health, it may be appropriate to weight the climatic data by the number of people living in each grid cell. Alternatively, if we are interested in the response of agricultural productivity to climate change, we may want to use the share of land covered by crops - or by a specific crop - to compute the weight of each grid cell.
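
The two kinds of weights described above can be illustrated with a small, self-contained sketch. It does not use aggfly and is not its implementation. All names and numbers are hypothetical. The cell area is approximated as proportional to the cosine of the latitude, and the population inside a partially covered cell is assumed to be the cell's population count times the covered fraction.

```python
import math


def cell_area_weight(lat_deg):
    """Relative area of a regular lat/lon cell centred at lat_deg.

    On a sphere this is approximately proportional to cos(latitude).
    """
    return math.cos(math.radians(lat_deg))


def weighted_mean(values, weights):
    """Weighted average of values; weights need not sum to one."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical region touching three grid cells.
temps = [30.0, 20.0, 10.0]          # cell temperatures, degrees Celsius
lats = [0.0, 45.0, 60.0]            # cell-centre latitudes, degrees
overlap = [1.0, 1.0, 0.5]           # fraction of each cell inside the region
population = [100.0, 1000.0, 10.0]  # people per cell (secondary variable)

# Area weights: cell area times the fraction covered by the region.
area_w = [cell_area_weight(lat) * f for lat, f in zip(lats, overlap)]

# Secondary (population) weights: people assumed to live inside the region.
pop_w = [p * f for p, f in zip(population, overlap)]

unweighted = sum(temps) / len(temps)
area_mean = weighted_mean(temps, area_w)
pop_mean = weighted_mean(temps, pop_w)

print(f"unweighted mean:          {unweighted:.2f}")
print(f"area-weighted mean:       {area_mean:.2f}")
print(f"population-weighted mean: {pop_mean:.2f}")
```

The three averages differ: the area-weighted mean is pulled toward the large equatorial cell, while the population-weighted mean is pulled toward the most populated cell.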

Implementation without secondary weights

This is the standard case, in which area weights are computed with weights_from_objects() without specifying any secondary weights in the options.

# Calculate area weights.
weights = af.weights_from_objects(
    dataset,
    georegions,
    project_dir=project_dir
)
weights.calculate_weights()

Implementation with secondary weights

To calculate weights based on a secondary variable, first load the secondary variable dataset with one of secondary_weights_from_path(), pop_weights_from_path() or crop_weights_from_path(). Then compute the weights with weights_from_objects(), passing the secondary weights in the options.

secondary_weights = af.pop_weights_from_path("~/data/population/landscan-global-2016.tif")

# Calculate weights.
weights = af.weights_from_objects(
    dataset,
    georegions,
    secondary_weights=secondary_weights,
    project_dir=project_dir
)
weights.calculate_weights()

weights will now contain the array of weights to be used for the aggregation.

Main arguments:

georegions: The georegions object you have previously created.
dataset: The layer of the dataset that is used to obtain information on the structure of the grid in order to compute the weights.
project_dir: The project directory.
secondary_weights: The secondary_weights object you have previously created.

3. Transforming and aggregating

First load the full dataset that you want to aggregate, using the same procedure as in step 1 (where you loaded only a sample layer of the dataset). Then aggregate it with the aggregate_dataset() function.

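import numpy as np  # used below in the `tavg` power transform

# `year` must be defined beforehand, for example inside a loop over years
dataset = af.dataset_from_path(
    f"~//data/annual/tempPrecLand{year}.zarr",
    var='t2m',
    name='era5',
    georegions=georegions,
    preprocess=lambda x: (x - 273.15)
)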

output_df = af.aggregate_dataset(
    dataset=dataset, 
    weights=weights,
    tavg = [
        ('aggregate', {'calc':'mean', 'groupby':'date'}),
        ('transform', {'transform':'power', 'exp':np.arange(1,2)}),
        ('aggregate', {'calc':'sum', 'groupby':'year'})
    ],
    bins= [
        ('aggregate', {'calc':'mean', 'groupby':'date'}),
        ('aggregate', {'calc':'bins', 'groupby':'year', 'ddargs':[[25,99,0],[30,99,0]]})
    ],
    growing_dday = [
        ('aggregate', {'calc':'dd', 'groupby':'date', 'ddargs':[10,30,0]}),
        ('aggregate', {'calc':'sum', 'groupby':'year'}),
    ],
    heating_dday = [
        ('aggregate', {'calc':'dd', 'groupby':'date', 'ddargs':[-99,20,1]}),
        ('aggregate', {'calc':'sum', 'groupby':'year'}),
    ]
)

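Notice that the function first computes the aggregation across time in the way described by the lists of steps passed for each output variable (tavg, bins, growing_dday and heating_dday in the example above).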

Main arguments:

dataset: The complete raster that you have just loaded, which contains the climatic data you want to aggregate.
georegions: The georegions object you have previously created.
weights: The weights object you computed in step 2.

TO COMPLETE agg_dict (dict): A dictionary containing the arguments for creating TemporalAggregator objects. The keys of the dictionary are names, and the values are a list of either tuples or TemporalAggregator objects. If the list contains tuples, use them as arguments to instantiate a temporal aggregator.

Available transformations include:

mean: computes the average value within the time period specified by groupby.
min: computes the minimum value within the time period.
max: computes the maximum value within the time period.
sum: computes the sum over the time period.
dd:
bin:
exp: computes the polynomials of the specified degrees.
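
The chained steps used in the aggregate_dataset() example (a daily mean, then a transformation, then an annual total) can be sketched in plain Python. The snippet does not call aggfly, and all function names and data are hypothetical. Degree days follow a common definition (the excess of the daily mean over a lower threshold, capped at an upper threshold), and the bin is a simple count of days at or above a threshold. Whether this matches how aggfly interprets ddargs is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_means(hourly):
    """Average (timestamp, value) pairs by calendar date."""
    by_date = defaultdict(list)
    for ts, value in hourly:
        by_date[ts.date()].append(value)
    return {d: sum(v) / len(v) for d, v in sorted(by_date.items())}


def degree_days(temp, lower, upper):
    """Degree days for one day: excess above lower, capped at upper."""
    return min(max(temp - lower, 0.0), upper - lower)


def annual_sum(daily_values):
    """Sum a {date: value} mapping by calendar year."""
    totals = defaultdict(float)
    for d, value in daily_values.items():
        totals[d.year] += value
    return dict(totals)


# Two days of hypothetical hourly temperatures: 12 C on day one, 35 C on day two.
start = datetime(2017, 7, 1)
hourly = [(start + timedelta(hours=h), 12.0 if h < 24 else 35.0) for h in range(48)]

daily = daily_means(hourly)

# Growing degree days between 10 C and 30 C, summed over the year.
gdd = annual_sum({d: degree_days(t, 10.0, 30.0) for d, t in daily.items()})

# Number of days with a mean at or above 30 C, per year.
hot_days = annual_sum({d: 1.0 if t >= 30.0 else 0.0 for d, t in daily.items()})

# Second-degree polynomial term: sum of squared daily means, per year.
sum_sq = annual_sum({d: t ** 2 for d, t in daily.items()})

print(gdd, hot_days, sum_sq)
```

Applying the nonlinear transformation to the daily values before summing over the year, as in the chained steps above, gives a different result from transforming the annual mean, which is why the order of the steps matters.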

For a more detailed application of the aggregation, refer to the example notebook.

Release history

0.1.5

Oct 24, 2024

0.1.4

Jun 1, 2024

0.1.3

Jun 1, 2024

This version

0.1.2

Jun 1, 2024

0.1.2a0 pre-release

Jun 1, 2024

0.1.1

Jun 1, 2024

0.1.0

May 31, 2024

Download files

Source Distribution

aggfly-0.1.2.tar.gz (37.2 kB view details)

Uploaded Jun 1, 2024 Source

Built Distribution

aggfly-0.1.2-py3-none-any.whl (43.7 kB view details)

Uploaded Jun 1, 2024 Python 3

File details

Details for the file aggfly-0.1.2.tar.gz.

File metadata

Download URL: aggfly-0.1.2.tar.gz
Upload date: Jun 1, 2024
Size: 37.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.10.10 Linux/6.5.0-28-generic

File hashes

Hashes for aggfly-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`0be4286abe422a85607b4da869eca7487f77d185d540a692a437736d7e98d34f`
MD5	`0aa49329ded85a7578542d70e7f9341a`
BLAKE2b-256	`3b63c96b76a0121990d0c814c265ce6233736fc1465e16b9c42d5bbf4abfc561`

File details

Details for the file aggfly-0.1.2-py3-none-any.whl.

File metadata

Download URL: aggfly-0.1.2-py3-none-any.whl
Upload date: Jun 1, 2024
Size: 43.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.10.10 Linux/6.5.0-28-generic

File hashes

Hashes for aggfly-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`adcd59e9c7ecbc3f621c696422bc777f97002ad932a1933169f2f5cd041a8ce2`
MD5	`ef6b87de4eccd97fe961673b83c7dc44`
BLAKE2b-256	`8cbe0b62527b8e3d69e1e8bc2d373c8e549a70a857fdc2a6658b28462ece1e46`
