Python package for Google's Groundsource flash flood dataset — 2.6M events, 150+ countries, 2000–2026
groundsource
Python package for Google's Groundsource flash flood dataset.
Google used Gemini to extract 2.6 million flash flood events from news articles across 150+ countries (2000-2026). The raw data is a 667MB Parquet file with undocumented WKB geometries and no location labels. This package decodes the geometries, tags every event with country and continent, and provides a clean search and analysis API.
from groundsource import FloodDB
db = FloodDB() # auto-downloads + enriches on first run
floods = db.search(country="India", year_range=(2020, 2025))
Installation
pip install groundsource
Requirements: Python 3.9+, pandas, pyarrow, geopandas, shapely, matplotlib
On first run, the package downloads the dataset from Zenodo (~667MB), decodes 2.6M WKB polygons, and performs a spatial join against Natural Earth boundaries. This takes 2-3 minutes. The enriched result is cached locally, so subsequent loads are near-instant.
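To make the decoding step concrete, here is a minimal, dependency-free sketch of what reading one WKB polygon and computing its centroid involves. This is not the package's implementation (the requirements list shapely for that job). It assumes each geometry is a plain 2D WKB Polygon with no SRID or Z/M extensions, and `parse_wkb_polygon` and `ring_centroid` are hypothetical names used only for illustration.

```python
import struct


def parse_wkb_polygon(buf: bytes):
    """Parse a 2D WKB Polygon into a list of rings of (x, y) tuples."""
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    endian = "<" if buf[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", buf, 1)
    if geom_type != 3:
        raise ValueError(f"expected WKB Polygon (type 3), got type {geom_type}")
    (n_rings,) = struct.unpack_from(endian + "I", buf, 5)
    offset = 9
    rings = []
    for _ in range(n_rings):
        (n_points,) = struct.unpack_from(endian + "I", buf, offset)
        offset += 4
        # Each point is two 8-byte doubles: x (longitude), y (latitude).
        coords = struct.unpack_from(f"{endian}{2 * n_points}d", buf, offset)
        offset += 16 * n_points
        rings.append(list(zip(coords[0::2], coords[1::2])))
    return rings


def ring_centroid(ring):
    """Area-weighted centroid of a closed ring (shoelace formula)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate ring with zero area")
    return cx / (3 * area2), cy / (3 * area2)


# Synthetic example: a 2x2 square encoded as little-endian WKB.
square = (
    struct.pack("<BII", 1, 3, 1)  # byte order, type = Polygon, 1 ring
    + struct.pack("<I", 5)        # 5 points (first and last are equal)
    + struct.pack("<10d", 0, 0, 2, 0, 2, 2, 0, 2, 0, 0)
)
exterior = parse_wkb_polygon(square)[0]
print(ring_centroid(exterior))  # (1.0, 1.0)
```

In practice a geometry library handles multi-part geometries, holes, and extended WKB variants that this sketch ignores.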
Usage
Search
from groundsource import FloodDB
db = FloodDB()
# By country (supports common aliases: "USA", "UK", "UAE", etc.)
db.search(country="India")
db.search(country="USA", year_range=(2020, 2025))
# By city (98 major cities built-in, default 100km radius)
db.search(city="Houston", radius_km=50)
# By continent or bounding box
db.search(continent="Asia")
db.search(bbox=[0, 95, 25, 120]) # [min_lat, min_lon, max_lat, max_lon]
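The radius and bounding-box filters can be pictured as simple tests on each event's centroid. The sketch below is an assumption about how such filters might work, not the package's code: `haversine_km`, `within_radius`, and `within_bbox` are hypothetical helpers, the events are synthetic, and the Houston coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(events, center_lat, center_lon, radius_km=100.0):
    """Keep events whose centroid lies within radius_km of the center."""
    return [
        e for e in events
        if haversine_km(center_lat, center_lon, e["centroid_lat"], e["centroid_lon"]) <= radius_km
    ]


def within_bbox(events, bbox):
    """Keep events inside [min_lat, min_lon, max_lat, max_lon].

    Does not handle boxes that cross the antimeridian.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        e for e in events
        if min_lat <= e["centroid_lat"] <= max_lat
        and min_lon <= e["centroid_lon"] <= max_lon
    ]


# Synthetic events, for illustration only.
events = [
    {"uuid": "a", "centroid_lat": 29.90, "centroid_lon": -95.40},  # near Houston
    {"uuid": "b", "centroid_lat": 32.78, "centroid_lon": -96.80},  # near Dallas
    {"uuid": "c", "centroid_lat": 13.75, "centroid_lon": 100.50},  # near Bangkok
]

near_houston = within_radius(events, 29.76, -95.37, radius_km=50)
in_box = within_bbox(events, [0, 95, 25, 120])
print([e["uuid"] for e in near_houston])  # ['a']
print([e["uuid"] for e in in_box])        # ['c']
```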
Trend Analysis
db.trend(country="India") # yearly event counts
db.growth(country="India") # growth rate between two periods
db.compare(["USA", "UK", "India", "Indonesia"]) # side-by-side comparison
db.top_countries(20) # ranked by total events
db.country_growth_ranking(20) # ranked by growth acceleration
db.bias_check() # global yearly counts for bias analysis
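The exact formula behind db.growth() is not documented here, so the following is only one plausible reading of "growth rate between two periods": the relative change in mean yearly event count. `yearly_counts` and `period_growth` are hypothetical names, and the input years are synthetic.

```python
from collections import Counter


def yearly_counts(years):
    """Count events per year, returned in chronological order."""
    return dict(sorted(Counter(years).items()))


def period_growth(counts, early, late):
    """Relative change in mean yearly count between two inclusive year ranges."""
    def mean_count(period):
        start, end = period
        n_years = end - start + 1
        return sum(counts.get(y, 0) for y in range(start, end + 1)) / n_years

    base = mean_count(early)
    if base == 0:
        raise ValueError("no events in the early period; growth is undefined")
    return mean_count(late) / base - 1.0


# Synthetic event years, for illustration only.
years = [2010] * 2 + [2011] * 2 + [2020] * 6 + [2021] * 10
counts = yearly_counts(years)
print(counts)                                              # {2010: 2, 2011: 2, 2020: 6, 2021: 10}
print(period_growth(counts, (2010, 2011), (2020, 2021)))   # 3.0, i.e. +300%
```

Because the counts come from news coverage, a growth figure computed this way mixes real change with changes in reporting volume.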
Built-in Charts
db.plot_hockey_stick(save_path="hockey_stick.png")
db.plot_bias(save_path="bias.png")
db.plot_top_countries(save_path="top_countries.png")
db.plot_country_growth(save_path="growth.png")
Raw DataFrame Access
df = db.to_dataframe()
# Columns: uuid, area_km2, start_date, end_date, centroid_lon, centroid_lat,
# country, iso_a3, continent, year
What This Package Does
The raw Parquet from Zenodo has 5 columns with no documentation:
| Raw Column | Type | Issue |
|---|---|---|
| uuid | string | ID only |
| area_km2 | float | Usable as-is |
| geometry | WKB binary | Requires shapely to decode |
| start_date | string | Not parsed as datetime |
| end_date | string | Not parsed as datetime |
This package enriches each event with:
| Added Column | Source |
|---|---|
| centroid_lon, centroid_lat | Decoded from WKB polygons |
| country, iso_a3 | Spatial join against Natural Earth |
| continent | Natural Earth |
| year | Extracted from start_date |
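The date handling amounts to parsing the string columns and reading the year off start_date. A minimal sketch follows, assuming the raw strings are ISO 8601 dates (YYYY-MM-DD); the actual format in the Parquet file is not stated here. `enrich_dates` is a hypothetical helper and the record is synthetic.

```python
from datetime import date


def enrich_dates(record):
    """Parse start_date/end_date strings and add a derived year field.

    Assumes ISO 8601 date strings such as "2024-07-30".
    """
    start = date.fromisoformat(record["start_date"])
    end = date.fromisoformat(record["end_date"])
    return {**record, "start_date": start, "end_date": end, "year": start.year}


# Synthetic record, for illustration only.
raw = {"uuid": "example", "area_km2": 12.5,
       "start_date": "2024-07-30", "end_date": "2024-08-02"}
event = enrich_dates(raw)
print(event["year"])  # 2024
```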
Reporting Bias
The dataset shows 498 events in 2000 and 402,012 in 2024. This does not mean floods increased 807x. The data is extracted from news articles, and digital news coverage grew dramatically over this period. Any trend analysis should account for this reporting bias. Use db.bias_check() and db.plot_bias() to visualize this.
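The 807x figure, and one crude way to look past it, can be worked through in a few lines. The global counts below are the two values quoted above; the per-country counts are synthetic, and `yearly_share` is a hypothetical helper. Dividing by the global total is only a rough adjustment: it assumes news coverage grew at a similar pace everywhere, which is unlikely to hold exactly.

```python
events_2000 = 498
events_2024 = 402_012

ratio = events_2024 / events_2000
print(f"{ratio:.0f}x")  # 807x


def yearly_share(country_counts, global_counts):
    """Share of each year's global events that fall in one country."""
    return {
        year: country_counts.get(year, 0) / total
        for year, total in global_counts.items()
        if total > 0
    }


global_counts = {2000: events_2000, 2024: events_2024}
country_counts = {2000: 50, 2024: 40_201}  # synthetic, for illustration only

shares = yearly_share(country_counts, global_counts)
for year, share in shares.items():
    print(year, f"{share:.1%}")
# The raw count rises from 50 to 40,201, yet the share stays near 10%.
```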
[Figure: Top Countries by Events Detected]
Dataset
- Source: Google Groundsource
- Download: Zenodo (CC BY 4.0)
- Records: 2,646,302 events across 175 countries, 2000-2026
- Method: Gemini parsed ~5M news articles
- Accuracy (per Google): 60% of events correct in both location and timing; 82% practically useful
License
MIT. The underlying dataset is licensed CC BY 4.0 by Google.
Citation
Google Research. Groundsource: Turning News Reports into Data with Gemini. Zenodo, 2026. DOI: 10.5281/zenodo.18647054