
pyqcoda

pyqcoda is a Python library for temporal disaggregation of daily precipitation into hourly time series using a combination of comonotonicity transformation and an iterative adjusted k-nearest neighbors (KNN) algorithm. It is tailored for hydrological and climate data processing tasks where hourly data is required but only daily observations are available.


🌧️ Overview

  • Input:

    • train_data.csv: Hourly precipitation data with columns datetime (hourly resolution) and precipitation (mm).
    • test_data.csv: Daily precipitation data with the same column names but daily resolution (datetime at 00:00:00 for each day).
    • (Optional) params.csv: Parameters for the semi-parametric Bernoulli-Gamma mode.
    • (Optional) seasons.csv: User-defined climatological seasons (the default seasons are DJF, MAM, JJA, SON).
  • Output:

    • A pandas DataFrame (or CSV) with hourly precipitation disaggregated from the daily values in test_data, using statistical patterns learned from train_data.
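
For reference, the two input layouts described above can be sketched as follows. This is a hypothetical minimal example (made-up values; column names as documented):

```python
import pandas as pd

# Hypothetical minimal train_data.csv: hourly datetimes, precipitation in mm
train = pd.DataFrame({
    "datetime": pd.date_range("2020-01-01 00:00", periods=48, freq="h"),
    "precipitation": [0.0, 0.2, 1.5, 0.0] * 12,
})
train.to_csv("train_data.csv", index=False)

# Hypothetical minimal test_data.csv: one row per day, datetime at 00:00:00
test = pd.DataFrame({
    "datetime": pd.date_range("2021-01-01", periods=3, freq="D"),
    "precipitation": [4.0, 0.0, 12.5],
})
test.to_csv("test_data.csv", index=False)
```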

✨ Features

  • Disaggregates daily totals into 24-hour precipitation series.
  • Preserves sub-daily maxima in reconstructed data.
  • Season-aware (DJF, MAM, JJA, SON) to capture seasonal variability.
  • Combines comonotonicity with KNN-based iterative adjustments.
  • Suitable for hydrological modeling and climate studies.
  • Optional semi-parametric Bernoulli-Gamma mode.
  • Optional enhanced autocorrelation refinement via permutations.
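
A rough intuition for the comonotonicity-plus-KNN combination: pick a training day that resembles the target day, reuse its hourly pattern, and rescale so the daily total is preserved. The sketch below is purely illustrative (a 1-nearest-neighbor match on the daily total, with synthetic data), not the library's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
train_days = rng.gamma(shape=1.5, scale=2.0, size=(30, 24))  # 30 days x 24 h
daily_totals = train_days.sum(axis=1)

target = 12.0                                            # daily total to disaggregate
nearest = int(np.argmin(np.abs(daily_totals - target)))  # 1-NN analog day
pattern = train_days[nearest]                            # borrow its hourly shape
hourly = pattern * (target / pattern.sum())              # rescale: total preserved
```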

📦 Installation

From PyPI (recommended)

pip install pyqcoda

From GitHub

git clone https://github.com/carloscorreag/pyqcoda.git
cd pyqcoda
pip install .

🚀 Usage examples

🔹 1. Standard mode (default)

import pandas as pd
from pyqcoda import pyqcoda

# 1. Load your training (hourly) and testing (daily) datasets
df_train = pd.read_csv("train_data.csv", index_col=0, parse_dates=True)
df_test = pd.read_csv("test_data.csv", index_col=0, parse_dates=True)

# 2. Instantiate pyqcoda and disaggregate
qc = pyqcoda()
simulated_series = qc.disaggregate(df_train, df_test)

# 3. Convert results to hourly DataFrame
df_hourly = qc.get_hourly_dataframe(simulated_series)

# 4. Save output
df_hourly.to_csv("disaggregated_output.csv")
print("Hourly disaggregated precipitation saved to disaggregated_output.csv")

🔹 2. Semi-parametric mode (load Bernoulli-Gamma parameters from CSV)

This mode uses fitted Bernoulli-Gamma distributions instead of the empirical transformation.

import pandas as pd
from pyqcoda import pyqcoda

# Load the training (hourly) and testing (daily) datasets as in the standard example
df_train = pd.read_csv("train_data.csv", index_col=0, parse_dates=True)
df_test = pd.read_csv("test_data.csv", index_col=0, parse_dates=True)

params_df = pd.read_csv("params.csv")

# Convert to dictionary required by pyqcoda
params = {}
for _, row in params_df.iterrows():
    season = row["season"]
    duration = int(row["duration"])

    params.setdefault(season, {})
    params[season][duration] = {
        "p0": row["p0"],
        "shape": row["shape"],
        "scale": row["scale"]
    }

qc = pyqcoda()
simulated_series = qc.disaggregate(
    df_train,
    df_test,
    semi_parametrical_mode=params
)

df_hourly = qc.get_hourly_dataframe(simulated_series)

df_hourly.to_csv("disaggregated_output.csv")
print("Hourly disaggregated precipitation saved to disaggregated_output.csv")

📄 Format of params.csv

The file must contain one row per combination of:

  • season (DJF, MAM, JJA, SON)
  • duration (1, 2, 6, 12, 24)

Example params.csv

season,duration,p0,shape,scale
DJF,24,0.3,2.1,5.0
DJF,1,0.5,1.2,2.0
DJF,2,0.45,1.5,2.5
DJF,6,0.4,2.0,3.0
DJF,12,0.35,2.3,4.0
MAM,24,0.25,2.5,4.5
MAM,1,0.4,1.8,2.2
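
Because every (season, duration) pair is expected, a quick completeness check can catch missing rows before disaggregating. A minimal sketch, here using an in-memory DataFrame holding only the first two example rows (in practice, use `pd.read_csv("params.csv")`):

```python
import itertools
import pandas as pd

params_df = pd.DataFrame(
    [("DJF", 24, 0.3, 2.1, 5.0), ("DJF", 1, 0.5, 1.2, 2.0)],
    columns=["season", "duration", "p0", "shape", "scale"],
)

seasons = ["DJF", "MAM", "JJA", "SON"]
durations = [1, 2, 6, 12, 24]

# Every (season, duration) combination must appear in the file
have = set(zip(params_df["season"], params_df["duration"].astype(int)))
missing = sorted(set(itertools.product(seasons, durations)) - have)
print(f"{len(missing)} missing (season, duration) rows")
```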

🔹 3. Custom seasons (user-defined climatological seasons)

By default, pyqcoda uses standard climatological seasons:

  • DJF (Dec–Jan–Feb)
  • MAM (Mar–Apr–May)
  • JJA (Jun–Jul–Aug)
  • SON (Sep–Oct–Nov)

However, users can define custom seasonal partitions using a CSV file, in a way fully consistent with the params.csv workflow.

import pandas as pd
from pyqcoda import pyqcoda

# Load the training (hourly) and testing (daily) datasets as in the standard example
df_train = pd.read_csv("train_data.csv", index_col=0, parse_dates=True)
df_test = pd.read_csv("test_data.csv", index_col=0, parse_dates=True)

seasons_df = pd.read_csv("seasons.csv")

# Seasons mapping
seasons = {}
for _, row in seasons_df.iterrows():
    season = row["season"]
    month = int(row["month"])

    seasons.setdefault(season, []).append(month)


qc = pyqcoda()
simulated_series = qc.disaggregate(
    df_train,
    df_test,
    seasons_dict=seasons  
)

df_hourly = qc.get_hourly_dataframe(simulated_series)

df_hourly.to_csv("disaggregated_output.csv")
print("Hourly disaggregated precipitation saved to disaggregated_output.csv")

📄 seasons.csv format

The file must define a mapping between:

  • season → custom season name
  • month → month number (1–12)

Each month must belong to exactly one season.

Example

season,month
WET,10
WET,11
WET,12
WET,1
WET,2
WET,3
DRY,4
DRY,5
DRY,6
DRY,7
DRY,8
DRY,9
  • All 12 months (1–12) must be assigned exactly once.
  • Season names in seasons.csv must match those used in params.csv if using semi-parametric mode. For example, if the user defines WET and DRY seasons, params.csv must contain them:
season,duration,p0,shape,scale
WET,24,0.5,2.1,5.0
WET,1,0.5,1.2,2.0
WET,2,0.5,1.5,2.5
WET,6,0.5,2.0,3.0
WET,12,0.5,2.3,4.0
DRY,24,0.25,2.5,4.5
DRY,1,0.25,1.3,2.5
DRY,2,0.25,1.7,3
DRY,6,0.25,1.9,4
DRY,12,0.25,2.2,4.2
  • Overlapping or missing months will raise an error.
  • This feature is fully optional: if seasons_dict=None, default climatological seasons are used.
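
Since overlapping or missing months raise an error, it can help to validate the mapping before calling disaggregate. A minimal sketch (the DataFrame mirrors the WET/DRY example above; in practice, use `pd.read_csv("seasons.csv")`):

```python
import pandas as pd

seasons_df = pd.DataFrame({
    "season": ["WET"] * 6 + ["DRY"] * 6,
    "month": [10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Each month 1-12 must appear exactly once across all seasons
months = sorted(seasons_df["month"].astype(int))
if months != list(range(1, 13)):
    raise ValueError("each month 1-12 must be assigned to exactly one season")

# Build the mapping expected by seasons_dict: {season_name: [months]}
seasons = {s: g["month"].astype(int).tolist()
           for s, g in seasons_df.groupby("season", sort=False)}
```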

🔹 4. Enhanced autocorrelation refinement (permutations mode)

pyqcoda includes an optional advanced refinement step designed to improve the temporal structure of the reconstructed hourly precipitation series, specifically targeting lag-1 autocorrelation.

This mode applies a local permutation-based optimization over short hourly windows while preserving:

  • Daily totals (P24)
  • Sub-daily maxima constraints (PMAX1H, PMAX2H, PMAX6H, PMAX12H)
  • Physical consistency rules

What this mode does

When enabled, the algorithm:

  1. Selects short rolling windows (typically 3–5 hours)
  2. Generates permutations of values within each window
  3. Evaluates each candidate series using:
    • Sub-daily maxima preservation
    • Constraint consistency
    • Lag-1 autocorrelation improvement
  4. Keeps the configuration that maximizes temporal coherence
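
The window-permutation idea can be sketched in isolation. The toy function below (illustrative only, not the library's implementation, and omitting the sub-daily maxima scoring) tries every ordering of a short window and keeps the one that maximizes lag-1 autocorrelation; because it only reorders values, the daily total and the set of hourly values are unchanged:

```python
from itertools import permutations
import numpy as np

def lag1_autocorr(x):
    """Lag-1 Pearson autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-1], x[1:]
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def refine_window(series, start, width=4):
    """Reorder series[start:start+width] to maximize lag-1 autocorrelation."""
    series = list(series)
    best, best_r = series, lag1_autocorr(series)
    for perm in permutations(series[start:start + width]):
        cand = series[:start] + list(perm) + series[start + width:]
        r = lag1_autocorr(cand)
        if r > best_r:
            best, best_r = cand, r
    return best

hourly = [0.0, 3.0, 0.5, 2.5, 0.2, 0.0]
refined = refine_window(hourly, start=1)
```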

How to use

Enable the mode by setting use_permutations=True in disaggregate:

import pandas as pd
from pyqcoda import pyqcoda

# Load the training (hourly) and testing (daily) datasets as in the standard example
df_train = pd.read_csv("train_data.csv", index_col=0, parse_dates=True)
df_test = pd.read_csv("test_data.csv", index_col=0, parse_dates=True)

qc = pyqcoda()

simulated_series = qc.disaggregate(
    df_train,
    df_test,
    use_permutations=True
)

df_hourly = qc.get_hourly_dataframe(simulated_series)

df_hourly.to_csv("disaggregated_output.csv")
print("Hourly disaggregated precipitation saved to disaggregated_output.csv")

🔧 Requirements

  • Python 3.7+
  • pandas ≥ 1.2.4
  • numpy ≥ 1.21.6
  • scikit-learn ≥ 1.0.2

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.


📖 Citation

Correa Guinea, C. (2025). pyqcoda: Temporal disaggregation of daily precipitation into hourly using Q-CODA. DOI:
