A Python package for quality control (QC) checks on BSRN station-to-archive files.
bsrn
This GitHub repository is dazhiyang/bsrn: the source code and development tooling for the bsrn Python package.
bsrn is a community-developed toolbox that provides a set of robust functions and classes for processing and analyzing solar radiation data. The core mission of bsrn is to provide an open, reliable, interoperable, and benchmark-standard set of tools tailored specifically for the Baseline Surface Radiation Network (BSRN).
It features automated quality control (QC), high-precision solar geometry, clear-sky modeling, clear-sky detection (CSD), cloud enhancement event (CEE) detection, irradiance separation, and comprehensive data retrieval and visualization capabilities.
Documentation (Read the Docs)
Getting Started
Installation
The core bsrn package is designed to be lightweight and fast. You can install it using pip:
From PyPI (stable release):
pip install bsrn
From GitHub (latest development version):
pip install git+https://github.com/dazhiyang/bsrn.git
Optional Visualization Tools
If you want to use the built-in plotting features (like data availability charts or clear-sky calendars), you will need to install the optional visualization dependencies (plotnine, matplotlib, and scipy):
pip install bsrn[viz]
Usage
For standard quality control and clear-sky modeling, simply import the base package:
import bsrn
# Access core modules like bsrn.qc, bsrn.modeling, bsrn.io, bsrn.archive
If you installed the [viz] extra and want to generate plots, you must explicitly import the visualization submodule:
import bsrn.visualization
# Access plotting tools like bsrn.visualization.calendar.plot_calendar()
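Because the plotting dependencies are optional, a script can check for them before importing bsrn.visualization. The following is a minimal sketch using only the standard library; the helper name has_module is hypothetical and is not part of bsrn.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(name) is not None


# The [viz] extra installs plotnine, matplotlib, and scipy.
viz_ready = all(has_module(m) for m in ("plotnine", "matplotlib", "scipy"))
if viz_ready:
    print("Plotting dependencies found; bsrn.visualization can be imported.")
else:
    print("Plotting dependencies missing; run: pip install bsrn[viz]")
```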
Quick Example: BSRNDataset (recommended)
import bsrn
# One station-month from a BSRN LR0100 archive (.dat.gz)
ds = bsrn.BSRNDataset.from_file("data/QIQ/qiq0125.dat.gz")
# Typical pipeline (each step mutates the cached frame and returns it)
ds.solpos() # solar geometry + extraterrestrial
ds.clear_sky(model="rest2") # ghi_clear / bni_clear / ... (REST2; MERRA-2 via Hugging Face)
ds.qc_test() # flag columns: 0 = pass, 1 = fail
ds.qc_mask() # NaN failed irradiance; drop flag columns
df = ds.data() # minute-resolution table for analysis or export
# Visualize directly from the dataset (requires bsrn[viz])
ds.plot.daily("2025-01-15") # UTC date inside the loaded month
ds.plot.table() # QC summary table
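The archive file name used above, qiq0125.dat.gz, follows the xxxmmyy.dat.gz pattern (station code, month, two-digit year). As a rough sketch, the name can be split with a regular expression. The function parse_archive_name is hypothetical and not part of bsrn, and the century pivot (years 92 to 99 map to the 1990s, since BSRN records start in 1992) is an assumption.

```python
import re
from pathlib import PurePosixPath

# Three-character station code, two-digit month, two-digit year.
_ARCHIVE_NAME = re.compile(r"([a-z0-9]{3})(\d{2})(\d{2})\.dat(?:\.gz)?")


def parse_archive_name(path: str) -> tuple[str, int, int]:
    """Return (station_code, year, month) from a station-to-archive file name."""
    name = PurePosixPath(path).name.lower()
    match = _ARCHIVE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an xxxmmyy.dat[.gz] file name: {name!r}")
    station = match.group(1).upper()
    month = int(match.group(2))
    yy = int(match.group(3))
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in file name: {name!r}")
    # Assumed pivot: 92..99 -> 1992..1999, otherwise 2000 + yy.
    year = 1900 + yy if yy >= 92 else 2000 + yy
    return station, year, month


print(parse_archive_name("data/QIQ/qiq0125.dat.gz"))  # ('QIQ', 2025, 1)
```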
Quick Example: Functional API
The same steps are available as standalone functions, useful for non-BSRN data or custom DataFrames:
import bsrn
from bsrn.io.retrieval import download_bsrn_stn, get_bsrn_file_inventory
from bsrn.physics.geometry import add_solpos_columns
from bsrn.modeling.clear_sky import add_clearsky_columns
from bsrn.qc.wrapper import run_qc
# 1. See what data is available
inventory = get_bsrn_file_inventory(["QIQ"], username="your_user", password="your_pass")
# 2. Download data for a station
download_bsrn_stn("QIQ", "data/QIQ", username="your_user", password="your_pass")
# 3. Read via BSRNDataset and get the DataFrame
ds = bsrn.BSRNDataset.from_file("data/QIQ/qiq0125.dat.gz")
df = ds.data()
# 4. Add solar position (recommended before time-averaging or clear-sky)
df = add_solpos_columns(df, "QIQ")
# 5. Add clear-sky reference columns (defaults to Ineichen)
df = add_clearsky_columns(df, "QIQ")
# 6. Run Quality Control (QC)
df = run_qc(df, "QIQ")
# 7. Add satellite-derived CAMS CRS all-sky columns
from bsrn.io.crs import add_crs_columns
df = add_crs_columns(df, "QIQ")
# 8. Visualize with plotnine
from bsrn.visualization.clearsky_models import plot_clearsky_models
plot_clearsky_models(df, "QIQ", date="2025-01-15", save_path="clearsky_qiq.pdf")  # date must fall inside the loaded month
Features
The QC implementation is based primarily on the BSRN Operations Manual (2018) and Forstinger et al. (2021); see the code for other references.
- Level 1 (Physically Possible): Absolute physical bounds for $G_h, B_n, D_h$, and $L_d$.
- Level 2 (Extremely Rare): Climatological limits for specific regimes.
- Level 3 (Comparison): Consistency checks ($G_h$ vs $B_n \cos Z + D_h$) with zenith-dependent thresholds.
- Level 4 (Diffuse Ratio): Diffuse-fraction and $k$ vs $k_t$ checks combining $G_h$, $D_h$, and extraterrestrial irradiance.
- Level 5 (K-Indices): Advanced clearness-index and $k_b$/$k_t$ index tests using clear-sky benchmarks and site elevation.
- Level 6 (Tracker-Off Detection): Identify tracking errors by comparing measured values with clear-sky and extraterrestrial irradiance.
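To illustrate the kind of test listed above, here is a minimal sketch of a Level 3 closure check for a single record. The function closure_flag is hypothetical, and the 8% and 15% tolerances, the 75 and 93 degree zenith bounds, and the 50 W/m^2 floor are assumptions taken from the commonly used BSRN recommended QC tests; the values applied in bsrn.qc.closure may differ.

```python
import math


def closure_flag(ghi, bni, dhi, zenith_deg):
    """Return 0 (pass), 1 (fail), or None (test not applicable).

    Compares measured GHI with the component sum BNI * cos(Z) + DHI.
    """
    component_sum = bni * math.cos(math.radians(zenith_deg)) + dhi
    # Assumed applicability limits: weak signals and Z >= 93 deg are skipped.
    if component_sum <= 50.0 or zenith_deg >= 93.0:
        return None
    # Assumed zenith-dependent tolerance: tighter when the sun is high.
    tolerance = 0.08 if zenith_deg < 75.0 else 0.15
    ratio = ghi / component_sum
    return 0 if abs(ratio - 1.0) <= tolerance else 1


print(closure_flag(ghi=800.0, bni=900.0, dhi=100.0, zenith_deg=40.0))
print(closure_flag(ghi=1000.0, bni=900.0, dhi=100.0, zenith_deg=40.0))
```

The flag convention (0 = pass, 1 = fail) follows the one used by ds.qc_test() in the quick example.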
Other important features include:
- Solar Geometry: Native NREL SPA implementation for high-precision solar position calculations.
- Clear-Sky Models: Ineichen (monthly Linke turbidity), McClear (CAMS SoDa API, from 2004 onward), and REST2 (MERRA-2 from Hugging Face).
- Satellite Data: Load CAMS solar radiation service (CRS) and National Solar Radiation Database (NSRDB) all-sky irradiance directly from Hugging Face into memory.
- Clear-Sky Detection (CSD): Reno, Ineichen, Lefevre, and BrightSun methods to identify clear-sky periods from irradiance time series.
- Cloud Enhancement Event (CEE) Detection: Killinger, Yang, and Gueymard methods to detect events when measured GHI significantly exceeds references.
- Irradiance Separation: Erbs, BRL, Engerer2, and Yang4 models to estimate diffuse fraction and DHI/BNI from GHI.
- Robust Retrieval: High-level API for FTP downloads from BSRN-AWI with exponential backoff retries (analysis functions assume one station-to-archive file at a time).
- Station-to-archive formatting: The bsrn.archive subpackage provides LR_SPECS, Fortran-style validators in validation.py (names referenced by each field's validate_func), and ASCII output via get_bsrn_format. Scalar/header fields on the Pydantic LR* models use a single lr_spec(lr_code, field_name, type, ...) annotation so metadata and post-parse checks stay in one place; LR0100/LR4000 minute columns use field_validator with yearMonth for vector length checks. Concrete types (LR0001 to LR4000CONST) live in records_models and are re-exported from bsrn.archive; get_azimuth_elevation is in archive_lr_formats (also re-exported).
- Visualization: Data availability heatmaps and k vs kt separation plots via the very pretty plotnine (which reminds me of the good old R days).
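As an illustration of the irradiance separation feature listed above, the sketch below implements the piecewise Erbs correlation between the clearness index $k_t$ and the diffuse fraction $k$, then derives DHI and BNI from GHI. The function names are hypothetical and are not the bsrn.modeling.separation API; the package's own implementation may handle edge cases differently.

```python
import math


def erbs_diffuse_fraction(kt: float) -> float:
    """Erbs et al. piecewise correlation: diffuse fraction k from clearness index kt."""
    if kt <= 0.22:
        return 1.0 - 0.09 * kt
    if kt <= 0.80:
        return (0.9511 - 0.1604 * kt + 4.388 * kt**2
                - 16.638 * kt**3 + 12.336 * kt**4)
    return 0.165


def separate_erbs(ghi: float, bni_extra: float, zenith_deg: float):
    """Return (dhi, bni) estimated from GHI for a sun above the horizon."""
    cos_z = math.cos(math.radians(zenith_deg))
    if cos_z <= 0.0:
        raise ValueError("sun is at or below the horizon")
    kt = ghi / (bni_extra * cos_z)   # clearness index
    k = erbs_diffuse_fraction(kt)    # diffuse fraction
    dhi = k * ghi
    bni = (ghi - dhi) / cos_z        # closure: ghi = bni * cos(Z) + dhi
    return dhi, bni


dhi, bni = separate_erbs(ghi=500.0, bni_extra=1361.1, zenith_deg=30.0)
print(round(dhi, 1), round(bni, 1))
```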
File Structure
[!NOTE] Not all files are uploaded with Git. Data files and intermediate outputs are excluded via .gitignore.
bsrn-qc/
├── pyproject.toml
├── LICENSE
├── README.md
├── .gitignore
├── .readthedocs.yaml              # Read the Docs build config
├── src/
│   └── bsrn/
│       ├── __init__.py
│       ├── dataset.py             # BSRNDataset: central monthly data object + pipeline methods
│       ├── constants.py           # Station database, Linke turbidity & physical constants
│       ├── archive/               # Station-to-archive logical records (WRMC-style LR layouts)
│       │   ├── __init__.py        # Re-exports LR* models, LR_SPECS, get_azimuth_elevation, ...
│       │   ├── specs.py           # LR_SPECS + station directory & A3-A7 code tables
│       │   ├── archive_lr_formats.py  # get_bsrn_format + get_azimuth_elevation (LR0004)
│       │   ├── records_base.py    # ArchiveRecordBase, make_archive_after_validator
│       │   ├── records_models.py  # lr_spec / lr_spec_field; LR0001 to LR4000CONST Pydantic models
│       │   ├── formatting.py      # Fortran-style field formatting mixin
│       │   └── validation.py      # BSRN archive field validators (LR_SPECS validate_func)
│       ├── io/
│       │   ├── reader.py          # Read xxxmmyy.dat.gz station-to-archive files
│       │   ├── retrieval.py       # FTP downloads with retries
│       │   ├── merra2.py          # MERRA-2 parquet fetch (Hugging Face to RAM)
│       │   ├── mcclear.py         # SoDa McClear client helpers
│       │   ├── crs.py             # SoDa CAMS solar radiation service (CRS) client helpers
│       │   ├── nsrdb.py           # NREL NSRDB all-sky data client helpers
│       │   └── writers.py         # Export results
│       ├── physics/
│       │   ├── spa.py             # Native NREL SPA (solar position algorithm)
│       │   └── geometry.py        # Solar position and extraterrestrial irradiance
│       ├── qc/
│       │   ├── ppl.py             # Physically possible limits (Level 1)
│       │   ├── erl.py             # Extremely rare limits (Level 2)
│       │   ├── closure.py         # Internal consistency checks (Level 3)
│       │   ├── diff_ratio.py      # Diffuse ratio checks (Level 4)
│       │   ├── k_index.py         # Radiometric index tests (Level 5)
│       │   ├── tracker.py         # Solar tracker off detection (Level 6)
│       │   └── wrapper.py         # High-level QC pipeline
│       ├── visualization/
│       │   ├── availability.py    # File coverage heatmaps (plotnine)
│       │   ├── qc_table.py        # QC result tables
│       │   ├── separation.py      # Decomposition visualization
│       │   └── timeseries.py      # Time series plots
│       ├── utils/
│       │   ├── calculations.py    # Supporting math
│       │   ├── quality.py         # Quality utilities
│       │   ├── clear_sky_detection.py  # Clear-sky detection (Reno, Ineichen, Lefevre, BrightSun)
│       │   └── cee_detection.py   # Cloud enhancement detection (Killinger, Yang, Gueymard)
│       └── modeling/
│           ├── clear_sky.py       # Ineichen clear-sky model
│           └── separation.py      # Irradiance separation (Erbs, BRL, Engerer2, Yang4)
└── docs/
    ├── conf.py                    # Sphinx config; source dir = docs/ (tutorials + sphinx/ RST)
    ├── index.rst                  # Site homepage (root index.html for Read the Docs)
    ├── requirements.txt           # Sphinx / Read the Docs dependencies
    ├── examples/                  # Examples landing page (index.rst) + optional scripts
    │   └── index.rst
    ├── tutorials/                 # Jupyter tutorials + index.rst (nbsphinx)
    │   ├── 1.data_downloading.ipynb
    │   ├── 2.quality_control.ipynb
    │   ├── 3.time_averaging.ipynb
    │   ├── 4.clear_sky_detection.ipynb
    │   └── 5.cloud_enhancement_event.ipynb
    └── sphinx/                    # RST (user_guide, api, _static); not the doc homepage
        ├── api/                   # API reference (io, qc, physics, ...)
        └── user_guide/            # installation, getting_started, package_overview, ...
Examples
Solar Position
import pandas as pd
from bsrn.physics.geometry import get_solar_position, get_bni_extra
times = pd.date_range("2024-07-01", periods=1440, freq="1min", tz="UTC")
solpos = get_solar_position(times, lat=47.80, lon=124.49, elev=170)
print(solpos[["zenith", "apparent_zenith", "azimuth"]].head())
Extraterrestrial Irradiance
from bsrn.physics.geometry import get_bni_extra
bni_extra = get_bni_extra(times) # Spencer (1971) method
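For readers unfamiliar with the Spencer (1971) method named in the comment above, the sketch below evaluates its Fourier series for the Earth-Sun distance correction and scales it by a solar constant. The function spencer_bni_extra is hypothetical, and the solar constant of 1361.1 W/m^2 is an assumption; get_bni_extra may use a different constant or day-angle convention.

```python
import math

SOLAR_CONSTANT = 1361.1  # W/m^2; assumed value, not confirmed by the package


def spencer_bni_extra(day_of_year: int) -> float:
    """Extraterrestrial beam normal irradiance (W/m^2) from Spencer's series."""
    # Day angle in radians, with day 1 = 1 January.
    gamma = 2.0 * math.pi * (day_of_year - 1) / 365.0
    # Eccentricity correction factor (r0 / r)^2.
    e0 = (1.000110
          + 0.034221 * math.cos(gamma)
          + 0.001280 * math.sin(gamma)
          + 0.000719 * math.cos(2.0 * gamma)
          + 0.000077 * math.sin(2.0 * gamma))
    return SOLAR_CONSTANT * e0


# The value peaks in early January (perihelion) and is lowest in early July.
print(round(spencer_bni_extra(1), 1), round(spencer_bni_extra(183), 1))
```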
Clear-Sky GHI (Ineichen)
from bsrn.modeling.clear_sky import add_clearsky_columns
# Automatically computes solar geometry if missing, but it is highly
# recommended to call `add_solpos_columns(df)` first for 1-minute data!
df = add_clearsky_columns(df, "QIQ")
# Adds columns: ghi_clear, bni_clear, dhi_clear
Clear-Sky GHI from McClear (CAMS)
from bsrn.modeling.clear_sky import add_clearsky_columns
# McClear data are available from 2004-01-01 onward.
df = add_clearsky_columns(
    df,
    station_code="QIQ",
    model="mcclear",
    mcclear_email="your_email@example.com",  # SoDa / CAMS account email
)
# Adds columns: ghi_clear, bni_clear, dhi_clear based on CAMS McClear
Clear-Sky GHI from REST2 (MERRA-2 via Hugging Face)
REST2 uses MERRA-2 atmospheric inputs only from the Hugging Face dataset dazhiyang/bsrn-merra2 (hourly Parquet files per station, station_code/*.parquet). The bsrn package fetches them into RAM (no disk cache) when model="rest2" is used.
from bsrn.modeling.clear_sky import add_clearsky_columns
# MERRA-2 is fetched from Hugging Face into RAM automatically.
df = add_clearsky_columns(df, station_code="QIQ", model="rest2")
# Adds columns: ghi_clear, bni_clear, dhi_clear based on REST2 + MERRA-2
The dataset README for Hugging Face is maintained in this repo at data/bsrn_static_assets/README.md (published to the Hub separately from PyPI).
All-Sky GHI from NSRDB (NREL via Hugging Face)
Similar to REST2, NSRDB all-sky data is fetched directly from the Hugging Face dataset dazhiyang/bsrn-nsrdb-conus (and other variants).
from bsrn.io.nsrdb import add_nsrdb_columns
# Fetch NSRDB all-sky GHI/DNI/DHI from Hugging Face
df = add_nsrdb_columns(df, station_code="QIQ", variant="conus")
# Adds columns: ghi_nsrdb, bni_nsrdb, dhi_nsrdb
Clear-Sky Detection
from bsrn.utils import detect_clearsky
# Requires GHI and clear-sky GHI (e.g. from add_clearsky_columns)
out = detect_clearsky("reno", ghi=df["ghi"], ghi_clear=df["ghi_clear"], times=df.index)
# out["is_clearsky"] is True/False/NA; out["cloud_flag"] is 0/1/NaN
# Other methods: "ineichen", "lefevre", "brightsun" (different inputs)
Cloud Enhancement Event (CEE) Detection
from bsrn.utils.cee_detection import detect_cee
# Killinger CEE detection: requires 1-min GHI, clear-sky GHI, zenith, and a 1-min index
out_cee_k = detect_cee(
    "killinger",
    ghi=df["ghi"],
    ghi_clear=df["ghi_clear"],
    zenith=df["zenith"],
    times=df.index,
)
# out_cee_*["is_enhancement"] is True/False/NA; out_cee_*["cee_flag"] is 0/1/NaN
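The basic idea behind cloud enhancement detection, measured GHI exceeding a clear-sky reference, can be sketched with a plain threshold on the clear-sky index. This is deliberately not the Killinger, Yang, or Gueymard criterion; the function simple_cee_flag and its 5% margin are hypothetical and only mirror the 0/1/NaN flag convention described above.

```python
import math


def simple_cee_flag(ghi: float, ghi_clear: float, margin: float = 0.05) -> float:
    """Return 1 (enhancement), 0 (no enhancement), or NaN (cannot be evaluated)."""
    if math.isnan(ghi) or math.isnan(ghi_clear) or ghi_clear <= 0.0:
        return math.nan
    # Enhancement when GHI exceeds the clear-sky reference by more than the margin.
    return 1 if ghi > (1.0 + margin) * ghi_clear else 0


samples = [(1100.0, 1000.0), (1000.0, 1000.0), (float("nan"), 1000.0)]
print([simple_cee_flag(g, c) for g, c in samples])
```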
Data Availability Heatmap
from bsrn.visualization.availability import plot_bsrn_availability
fig = plot_bsrn_availability(inventory_df, station_code="QIQ")
fig.save("availability.png", dpi=300)
Station-to-archive logical records (bsrn.archive)
Logical records are Pydantic v2 models (LR0001, ..., LR0100, LR4000, LR4000CONST, ...) defined in records_models and re-exported from bsrn.archive. The legacy umbrella type BSRNRecord is removed; use a concrete LR* model and call get_bsrn_format on the instance.
- LR_SPECS holds per-field format, missing tokens, defaults, and validate_func names.
- Scalars: validation runs through Pydantic AfterValidator, which calls the matching function in bsrn.archive.validation.
- LR0100 / LR4000 minute vectors: validators need yearMonth; those columns use a model-level field_validator instead.
from bsrn.archive import LR0001, LR_SPECS
# Required keys for LR0001 are listed in LR_SPECS["LR0001"]
out = LR0001(stationNumber=94, month=1, year=2024, version=1).get_bsrn_format()
For minute blocks, pass yearMonth="YYYY-MM" and pandas.Series or numpy.ndarray per column (see LR_SPECS["LR0100"] / ["LR4000"]), then LR0100(...).get_bsrn_format(changed=True) (and similarly for LR4000).
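The vector length check tied to yearMonth amounts to comparing each column's length with the number of minutes in that month. A minimal standard-library sketch follows; the helper names expected_minutes and has_full_month are hypothetical and are not the validators in bsrn.archive.

```python
import calendar


def expected_minutes(year_month: str) -> int:
    """Number of one-minute records in a calendar month given as 'YYYY-MM'."""
    year_text, month_text = year_month.split("-")
    days_in_month = calendar.monthrange(int(year_text), int(month_text))[1]
    return days_in_month * 24 * 60


def has_full_month(values, year_month: str) -> bool:
    """True when a minute column has exactly one value per minute of the month."""
    return len(values) == expected_minutes(year_month)


print(expected_minutes("2025-01"))  # 31 days * 1440 minutes = 44640
```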
Regression check (repository checkout): from the repo root, generate a monthly .dat and compare to the checked-in reference (should match byte-for-byte):
PYTHONPATH=src python tests/2025-01/Code/2.station_to_archive.py \
-o tests/2025-01/Output/qiq0125_run.dat --no-gzip
cmp tests/2025-01/Output/qiq0125_run.dat tests/2025-01/Output/qiq0125_ref.dat
Edit the CONFIG block at the top of 2.station_to_archive.py for station-specific paths and metadata; the script expects the minute table at tests/2025-01/Output/qiq0125.txt for the default QIQ January 2025 example.
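Where cmp is not available (for example on Windows), the same byte-for-byte comparison can be done from Python. The sketch below uses filecmp with shallow=False so that file contents, not just metadata, are compared; the temporary files stand in for the run and reference outputs, and files_identical is a hypothetical helper.

```python
import filecmp
import tempfile
from pathlib import Path


def files_identical(path_a, path_b) -> bool:
    """Compare two files by content, like `cmp` exiting with status 0."""
    return filecmp.cmp(path_a, path_b, shallow=False)


with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp) / "run.dat"
    ref = Path(tmp) / "ref.dat"
    other = Path(tmp) / "other.dat"
    run.write_bytes(b"line 1\nline 2\n")
    ref.write_bytes(b"line 1\nline 2\n")
    other.write_bytes(b"line 1\nline X\n")
    same = files_identical(run, ref)          # identical content
    different = files_identical(run, other)   # one byte differs

print(same, different)
```

In the repository, the two paths would be tests/2025-01/Output/qiq0125_run.dat and tests/2025-01/Output/qiq0125_ref.dat.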
License
MIT License. See LICENSE for details.