A growing toolkit for climate data quality checks, EDA, and analysis.

Climate Indepth Analysis

climate-indepth-analysis is a Python package for climate data quality checks, exploratory data analysis, and future climate analysis workflows.

This first release focuses on exploratory data analysis (EDA) of daily station climate data. It measures how complete a station dataset is by building a full, continuous daily calendar for every station before counting missing values. This matters because many climate station files omit dates entirely instead of storing them as rows with NaN values, so counting NaNs in the raw file alone understates the gaps.
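The calendar idea can be illustrated with a small pandas sketch. This is not the package's internal code. It is a minimal example on made-up data, assuming the default column names STATION, DATE, and PRCP, and it shows why a count taken on the raw file differs from a count taken on the full station-date calendar.

```python
import pandas as pd

# Made-up raw file: station A has no row for 1980-01-03,
# station B has no row for 1980-01-02.
raw = pd.DataFrame(
    {
        "STATION": ["A", "A", "A", "B", "B", "B"],
        "DATE": pd.to_datetime(
            ["1980-01-01", "1980-01-02", "1980-01-04",
             "1980-01-01", "1980-01-03", "1980-01-04"]
        ),
        "PRCP": [0.0, 1.2, None, 0.5, 0.0, 2.1],
    }
)

# Naive count: sees only the NaN that is physically stored in the file.
naive_missing = int(raw["PRCP"].isna().sum())

# Full calendar: every station crossed with every day in the period.
days = pd.date_range("1980-01-01", "1980-01-04", freq="D")
full_index = pd.MultiIndex.from_product(
    [raw["STATION"].unique(), days], names=["STATION", "DATE"]
)
full = raw.set_index(["STATION", "DATE"]).reindex(full_index)

# Calendar-based count: the stored NaN plus the two skipped dates.
calendar_missing = int(full["PRCP"].isna().sum())
missing_by_station = full["PRCP"].isna().groupby(level="STATION").sum()

print(naive_missing, calendar_missing)
print(missing_by_station)
```

In this toy case the raw file reports one missing value, while the calendar-based count reports three.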

Current features

  • Count the number of climate stations in a file
  • Support custom station ID columns such as STATION, station, ID, station_id, or site_no
  • Support custom date columns such as DATE, date, datetime, or time
  • Use default climate variables PRCP, TMAX, and TMIN
  • Allow users to choose other variables
  • Create a full station-date calendar before missing-data calculation
  • Detect missing dates that are absent from the raw file
  • Summarize missingness overall, by station, and by month
  • Report longest missing streaks for each station and variable
  • Provide descriptive statistics for selected variables
  • Run in Jupyter or from the command line
  • Keep all outputs in memory by default
  • Save output files only when the user requests it
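As an illustration of the "longest missing streak" feature, here is a minimal sketch in plain Python. The helper name longest_missing_streak and the sample flags are hypothetical and are not taken from the package. The sketch assumes one True/False flag per day of the full daily calendar for a single station and variable, so that a run of True values corresponds to consecutive days.

```python
from itertools import groupby


def longest_missing_streak(is_missing):
    """Return the length of the longest run of consecutive True values."""
    longest = 0
    for flag, run in groupby(is_missing):
        if flag:
            longest = max(longest, sum(1 for _ in run))
    return longest


# Hypothetical flags for one station and variable, one per calendar day
# (True means the value is missing on that day).
tmax_missing = [False, True, True, False, True, True, True, False]
streak = longest_missing_streak(tmax_missing)
print(streak)
```

If the flags came from the raw file instead of the full calendar, skipped dates would be invisible and the streak lengths would be too short.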

Planned direction

This package is designed to grow beyond EDA. Future releases may include tools for climate data download, cleaning, spatial aggregation, trend analysis, drought analysis, precipitation indices, temperature extremes, and visualization.

Installation

Install from PyPI:

pip install climate-indepth-analysis

For local development from this folder:

pip install -e .

Jupyter usage

from climate_indepth_analysis import run_eda

results = run_eda(
    input_path="my_climate_data.csv",
    station_col="STATION",
    date_col="DATE",
    needed_cols=["PRCP", "TMAX", "TMIN"],
    start_date="1980-01-01",
    end_date="2025-12-31",
    save_outputs=False,
)

results["overall"]
results["station_summary"].head()
results["monthly"].head()

Save outputs only when needed

By default, the package does not write any files. To save the summary tables and a text report, set save_outputs=True.

results = run_eda(
    input_path="my_climate_data.csv",
    station_col="ID",
    date_col="date",
    save_outputs=True,
    output_dir="my_climate_eda_results",
)

The full station-date calendar can be very large. It is not saved unless you also set save_full_calendar=True.

results = run_eda(
    input_path="my_climate_data.csv",
    save_outputs=True,
    output_dir="my_climate_eda_results",
    save_full_calendar=True,
)

Command-line usage

Run without saving files:

climate-indepth-analysis --input my_climate_data.csv

Run with custom station and date columns:

climate-indepth-analysis --input my_climate_data.csv --station-col ID --date-col date

Save output files to a custom folder:

climate-indepth-analysis --input my_climate_data.csv --save-outputs --output-dir my_climate_eda_results

You can also use the shorter command:

climate-eda --input my_climate_data.csv
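To make the relationship between the flags and the run_eda() parameters concrete, here is a hypothetical argparse sketch. It is not the package's actual parser. Only the five flags used in the commands above come from this README, the defaults are copied from the Default settings section, and the helper name build_parser and the help text are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="climate-eda")
    parser.add_argument("--input", required=True, help="Path to the station data file")
    parser.add_argument("--station-col", default="STATION")
    parser.add_argument("--date-col", default="DATE")
    # Nothing is saved unless this flag is given.
    parser.add_argument("--save-outputs", action="store_true")
    parser.add_argument("--output-dir", default="climate_indepth_analysis_output")
    return parser


args = build_parser().parse_args(
    ["--input", "my_climate_data.csv", "--station-col", "ID", "--date-col", "date"]
)
print(args.station_col, args.date_col, args.save_outputs)
```

argparse converts hyphens to underscores, so --station-col becomes args.station_col, which lines up with the station_col keyword argument.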

Default settings

station_col = "STATION"
date_col = "DATE"
needed_cols = ["PRCP", "TMAX", "TMIN"]
start_date = "1980-01-01"
end_date = "2025-12-31"
save_outputs = False
output_dir = "climate_indepth_analysis_output"

Output dictionary

run_eda() returns a dictionary with these tables:

results["diagnostics"]
results["inventory"]
results["overall"]
results["station_summary"]
results["monthly"]
results["variable_stats"]
results["full_calendar"]

Files saved when save_outputs=True

00_cleanliness_report.txt
01_file_calendar_diagnostics.csv
02_station_inventory_coverage.csv
03_overall_missing_summary.csv
04_station_missing_summary.csv
05_monthly_missing_summary_long.csv
06_variable_descriptive_stats.csv

If save_full_calendar=True, the package also saves:

07_full_station_date_calendar.parquet

If Parquet support is unavailable, it falls back to CSV.
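One way such a fallback can work is sketched below with pandas. The helper name save_table is hypothetical and this is not the package's code. The sketch relies on pandas raising ImportError from DataFrame.to_parquet when no Parquet engine (pyarrow or fastparquet) is installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_table(df, output_dir, stem):
    """Write df as Parquet if an engine is installed, otherwise as CSV."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    try:
        path = out / f"{stem}.parquet"
        df.to_parquet(path, index=False)
    except ImportError:
        # No Parquet engine available: fall back to CSV.
        path = out / f"{stem}.csv"
        df.to_csv(path, index=False)
    return path


# Made-up two-row table written to a temporary folder.
demo = pd.DataFrame({"STATION": ["A", "B"], "PRCP": [0.0, 1.2]})
with tempfile.TemporaryDirectory() as tmp:
    saved = save_table(demo, tmp, "07_full_station_date_calendar")
    saved_suffix = saved.suffix
    saved_exists = saved.exists()

print(saved_suffix)
```

Which suffix is printed depends on whether a Parquet engine is installed in the environment.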

Build and upload to PyPI

Install build tools:

python -m pip install --upgrade build twine

Build the package:

python -m build

Check the package:

python -m twine check dist/*

Upload to TestPyPI first:

python -m twine upload --repository testpypi dist/*

Upload to PyPI:

python -m twine upload dist/*

License

MIT License.
