A growing toolkit for climate data quality checks, EDA, and analysis.
Climate Indepth Analysis
climate-indepth-analysis is a Python package for climate data quality checks, exploratory data analysis, and future climate analysis workflows.
This first release focuses on EDA for daily station climate data. It measures how complete a station dataset is by building a full, continuous daily calendar for every station before counting missing values. This matters because many climate station files omit entire dates rather than storing them with NaN values, so a plain NaN count on the raw file understates how much data is missing.
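The calendar idea can be illustrated with a small standard-library sketch. This is not the package's implementation: the function name full_calendar and the toy records are hypothetical, and the real package may well use a different data structure. It only shows why a skipped date has to be made explicit before missing values are counted.

```python
from datetime import date, timedelta


def full_calendar(records, start, end):
    """Return {(station, day): value_or_None} for every station and every day.

    Hypothetical helper for illustration; records are (station, day, value) tuples.
    """
    stations = {station for station, _, _ in records}
    observed = {(station, day): value for station, day, value in records}
    n_days = (end - start).days + 1
    calendar = {}
    for station in sorted(stations):
        for offset in range(n_days):
            day = start + timedelta(days=offset)
            # Dates absent from the raw file become explicit None entries.
            calendar[(station, day)] = observed.get((station, day))
    return calendar


# Toy raw file: station A skips 1980-01-02 entirely; station B stores a blank value.
raw = [
    ("A", date(1980, 1, 1), 0.0),
    ("A", date(1980, 1, 3), 2.5),
    ("B", date(1980, 1, 1), 1.2),
    ("B", date(1980, 1, 2), None),
    ("B", date(1980, 1, 3), 0.4),
]

calendar = full_calendar(raw, date(1980, 1, 1), date(1980, 1, 3))
missing_in_raw = sum(1 for _, _, value in raw if value is None)
missing_in_calendar = sum(1 for value in calendar.values() if value is None)
print(missing_in_raw, missing_in_calendar)  # 1 2
```

Counting blanks in the raw rows finds one missing value, while the full calendar finds two, because the date that station A skipped is now an explicit gap.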
Current features
- Count the number of climate stations in a file
- Support custom station ID columns such as STATION, station, ID, station_id, or site_no
- Support custom date columns such as DATE, date, datetime, or time
- Use default climate variables PRCP, TMAX, and TMIN
- Allow users to choose other variables
- Create a full station-date calendar before missing-data calculation
- Detect missing dates that are absent from the raw file
- Summarize missingness overall, by station, and by month
- Report longest missing streaks for each station and variable
- Provide descriptive statistics for selected variables
- Run in Jupyter or from the command line
- Keep all outputs in memory by default
- Save output files only when the user requests it
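As an illustration of the "longest missing streak" feature listed above, the following is a minimal sketch of how such a streak can be computed over one station's daily series. The function name longest_missing_streak is hypothetical, and the package's own logic may differ; the sketch assumes the series already covers every calendar day, with None or NaN marking a missing value.

```python
def longest_missing_streak(values):
    """Length of the longest run of consecutive missing values (None or NaN)."""
    longest = current = 0
    for value in values:
        # NaN is the only float that is not equal to itself.
        if value is None or value != value:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Toy daily TMAX series for one station on a full calendar.
tmax = [31.0, None, None, 29.5, None, None, None, 30.1]
print(longest_missing_streak(tmax))  # 3
```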
Planned direction
This package is designed to grow beyond EDA. Future releases may include tools for climate data download, cleaning, spatial aggregation, trend analysis, drought analysis, precipitation indices, temperature extremes, and visualization.
Installation
Install from PyPI:
pip install climate-indepth-analysis
For local development from this folder:
pip install -e .
Jupyter usage
from climate_indepth_analysis import run_eda
results = run_eda(
input_path="my_climate_data.csv",
station_col="STATION",
date_col="DATE",
needed_cols=["PRCP", "TMAX", "TMIN"],
start_date="1980-01-01",
end_date="2025-12-31",
save_outputs=False,
)
results["overall"]
results["station_summary"].head()
results["monthly"].head()
Save outputs only when needed
By default, the package saves no files. To save the summary tables and a text report, set save_outputs=True.
results = run_eda(
input_path="my_climate_data.csv",
station_col="ID",
date_col="date",
save_outputs=True,
output_dir="my_climate_eda_results",
)
The full station-date calendar can be very large. It is not saved unless you also set save_full_calendar=True.
results = run_eda(
input_path="my_climate_data.csv",
save_outputs=True,
output_dir="my_climate_eda_results",
save_full_calendar=True,
)
Command-line usage
Run without saving files:
climate-indepth-analysis --input my_climate_data.csv
Run with custom station and date columns:
climate-indepth-analysis --input my_climate_data.csv --station-col ID --date-col date
Save output files to a custom folder:
climate-indepth-analysis --input my_climate_data.csv --save-outputs --output-dir my_climate_eda_results
You can also use the shorter command:
climate-eda --input my_climate_data.csv
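For readers curious how flags like these map onto Python values, here is a hedged argparse sketch that mirrors only the options shown above, with defaults taken from the Default settings section. It is an assumption about how such a command-line interface could be parsed, not the package's actual parser; build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags; not the package's own code."""
    parser = argparse.ArgumentParser(prog="climate-indepth-analysis")
    parser.add_argument("--input", required=True, help="path to the station data file")
    parser.add_argument("--station-col", default="STATION")
    parser.add_argument("--date-col", default="DATE")
    # store_true keeps saving off unless the flag is passed explicitly.
    parser.add_argument("--save-outputs", action="store_true")
    parser.add_argument("--output-dir", default="climate_indepth_analysis_output")
    return parser


args = build_parser().parse_args(
    ["--input", "my_climate_data.csv", "--station-col", "ID", "--date-col", "date"]
)
print(args.station_col, args.date_col, args.save_outputs)  # ID date False
```

argparse converts dashes in flag names to underscores, so --station-col is read as args.station_col, which lines up with the station_col keyword used in the Jupyter examples.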
Default settings
station_col = "STATION"
date_col = "DATE"
needed_cols = ["PRCP", "TMAX", "TMIN"]
start_date = "1980-01-01"
end_date = "2025-12-31"
save_outputs = False
output_dir = "climate_indepth_analysis_output"
Output dictionary
run_eda() returns a dictionary with these tables:
results["diagnostics"]
results["inventory"]
results["overall"]
results["station_summary"]
results["monthly"]
results["variable_stats"]
results["full_calendar"]
Files saved when save_outputs=True
00_cleanliness_report.txt
01_file_calendar_diagnostics.csv
02_station_inventory_coverage.csv
03_overall_missing_summary.csv
04_station_missing_summary.csv
05_monthly_missing_summary_long.csv
06_variable_descriptive_stats.csv
If save_full_calendar=True, the package also saves:
07_full_station_date_calendar.parquet
If Parquet support is unavailable, it falls back to CSV.
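One way such a fallback can be decided is to check whether a Parquet engine is importable before choosing the file extension. The sketch below is an assumption, not the package's code: calendar_file_name is a hypothetical helper, and it presumes the usual pandas Parquet engines, pyarrow or fastparquet.

```python
import importlib.util


def calendar_file_name(stem="07_full_station_date_calendar"):
    """Pick .parquet when a Parquet engine is installed, otherwise .csv."""
    has_engine = any(
        importlib.util.find_spec(name) is not None
        for name in ("pyarrow", "fastparquet")
    )
    return f"{stem}.parquet" if has_engine else f"{stem}.csv"


print(calendar_file_name())
```

The result depends on the environment, so the printed name is either the Parquet or the CSV variant of the same file stem.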
Build and upload to PyPI
Install build tools:
python -m pip install --upgrade build twine
Build the package:
python -m build
Check the package:
python -m twine check dist/*
Upload to TestPyPI first:
python -m twine upload --repository testpypi dist/*
Upload to PyPI:
python -m twine upload dist/*
License
MIT License.
File details
Details for the file climate_indepth_analysis-0.1.0.tar.gz.
File metadata
- Download URL: climate_indepth_analysis-0.1.0.tar.gz
- Upload date:
- Size: 14.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0743fc6519ee8ed5fcf0bf0cec9adf6c893e9d7e50d2432b52f0aee61959c280 |
| MD5 | 8ffe8ae7f5b40704adc150767b54e5dc |
| BLAKE2b-256 | b6135bcb3e6f5ffdb1857184bdaf95652a54b391eaca6768809b4ca44519672d |
File details
Details for the file climate_indepth_analysis-0.1.0-py3-none-any.whl.
File metadata
- Download URL: climate_indepth_analysis-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d92cf424f19a9cdc2aea922dd35afa63c6b2b19e3c2c18b16a7ee3aace17ae5 |
| MD5 | 50d3c0bd1dd66296d176632fa41bd106 |
| BLAKE2b-256 | d16fa6843391cdbac2789ad86947f3c373fa7815d5ed886744c1daea520452f3 |