Python package for Google's Groundsource flash flood dataset — 2.6M events, 150+ countries, 2000–2026

Project description

groundsource

Python package for Google's Groundsource flash flood dataset.

Google used Gemini to extract 2.6 million flash flood events from news articles across 150+ countries (2000-2026). The raw data is a 667 MB Parquet file with undocumented columns, WKB-encoded geometries, and no location labels. This package decodes the geometries, tags every event with its country and continent, and provides a clean search and analysis API.

from groundsource import FloodDB

db = FloodDB()  # auto-downloads + enriches on first run
floods = db.search(country="India", year_range=(2020, 2025))

Installation

pip install groundsource

Requirements: Python 3.9+, pandas, pyarrow, geopandas, shapely, matplotlib

On first run, the package downloads the dataset from Zenodo (~667 MB), decodes the 2.6M WKB polygons, and performs a spatial join against Natural Earth boundaries. This takes 2-3 minutes. The enriched result is cached locally, so subsequent loads are near-instant.

Usage

Search

from groundsource import FloodDB
db = FloodDB()

# By country (supports common aliases: "USA", "UK", "UAE", etc.)
db.search(country="India")
db.search(country="USA", year_range=(2020, 2025))

# By city (98 major cities built-in, default 100km radius)
db.search(city="Houston", radius_km=50)

# By continent or bounding box
db.search(continent="Asia")
db.search(bbox=[0, 95, 25, 120])  # [min_lat, min_lon, max_lat, max_lon]
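
The city and bounding-box searches above can be pictured as simple filters over event centroids. The following is a minimal sketch, not the package's actual implementation: it assumes a city search measures great-circle (haversine) distance from a city centre to each event centroid, and that the bounding box follows the [min_lat, min_lon, max_lat, max_lon] order shown above. The helper names and the sample events are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(events, center_lat, center_lon, radius_km):
    """Keep events whose centroid lies within radius_km of the centre."""
    return [
        e for e in events
        if haversine_km(center_lat, center_lon, e["centroid_lat"], e["centroid_lon"]) <= radius_km
    ]


def within_bbox(events, bbox):
    """Keep events whose centroid falls inside [min_lat, min_lon, max_lat, max_lon]."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        e for e in events
        if min_lat <= e["centroid_lat"] <= max_lat and min_lon <= e["centroid_lon"] <= max_lon
    ]


# Hypothetical events, for illustration only (not taken from the dataset).
HOUSTON = (29.76, -95.37)  # approximate city centre
events = [
    {"uuid": "a", "centroid_lat": 29.80, "centroid_lon": -95.40},  # a few km from Houston
    {"uuid": "b", "centroid_lat": 32.78, "centroid_lon": -96.80},  # Dallas area, far outside 50 km
]

nearby = within_radius(events, *HOUSTON, radius_km=50)
in_box = within_bbox(events, [29, -96, 30, -95])
```

Filtering on centroids is cheap but approximate: a large flood polygon can overlap the search area even when its centroid falls outside it.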

Trend Analysis

db.trend(country="India")                        # yearly event counts
db.growth(country="India")                       # growth rate between two periods
db.compare(["USA", "UK", "India", "Indonesia"])  # side-by-side comparison
db.top_countries(20)                             # ranked by total events
db.country_growth_ranking(20)                    # ranked by growth acceleration
db.bias_check()                                  # global yearly counts for bias analysis
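
The README does not spell out how growth between two periods is defined. As one plausible reading, the sketch below compares the mean yearly event count of a late period against an early one. The function names, the period boundaries, and the sample years are all hypothetical, and db.growth() may compute something different.

```python
from collections import Counter


def yearly_counts(years):
    """Count events per year, returned as a dict sorted by year."""
    return dict(sorted(Counter(years).items()))


def period_growth(counts, early, late):
    """Ratio of mean yearly count in `late` to that in `early`.

    Both periods are inclusive (start_year, end_year) tuples.
    Returns None when the early period has no events.
    """
    def mean(period):
        start, end = period
        n_years = end - start + 1
        return sum(counts.get(y, 0) for y in range(start, end + 1)) / n_years

    early_mean = mean(early)
    if early_mean == 0:
        return None
    return mean(late) / early_mean


# Hypothetical `year` values for a handful of events.
years = [2010, 2010, 2011, 2011, 2020, 2020, 2020, 2021, 2021, 2021, 2021, 2021]
counts = yearly_counts(years)
ratio = period_growth(counts, early=(2010, 2011), late=(2020, 2021))
```

Averaging over multi-year periods dampens single-year spikes, but it does nothing to correct for the growth in news coverage described under Reporting Bias.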

Built-in Charts

db.plot_hockey_stick(save_path="hockey_stick.png")
db.plot_bias(save_path="bias.png")
db.plot_top_countries(save_path="top_countries.png")
db.plot_country_growth(save_path="growth.png")

Raw DataFrame Access

df = db.to_dataframe()
# Columns: uuid, area_km2, start_date, end_date, centroid_lon, centroid_lat,
#           country, iso_a3, continent, year

What This Package Does

The raw Parquet from Zenodo has 5 columns with no documentation:

Raw Column   Type         Issue
uuid         string       ID only
area_km2     float        Usable as-is
geometry     WKB binary   Requires shapely to decode
start_date   string       Not parsed as datetime
end_date     string       Not parsed as datetime
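
To show what decoding a WKB geometry involves, here is a standard-library sketch that parses a plain 2D WKB Polygon and computes the centroid of its exterior ring. The package itself relies on shapely for this. The code below is only an illustration of the byte layout, and it assumes no SRID or Z/M extensions and ignores interior rings (holes). The function names and the sample square are hypothetical.

```python
import struct


def parse_wkb_polygon(wkb):
    """Return the rings of a plain 2D WKB Polygon as lists of (x, y) tuples."""
    endian = "<" if wkb[0] == 1 else ">"  # first byte: 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type != 3:  # 3 is the WKB code for Polygon
        raise ValueError(f"expected Polygon (type 3), got type {geom_type}")
    (n_rings,) = struct.unpack_from(endian + "I", wkb, 5)
    offset = 9
    rings = []
    for _ in range(n_rings):
        (n_points,) = struct.unpack_from(endian + "I", wkb, offset)
        offset += 4
        flat = struct.unpack_from(f"{endian}{2 * n_points}d", wkb, offset)
        offset += 16 * n_points  # two 8-byte doubles per point
        rings.append(list(zip(flat[0::2], flat[1::2])))
    return rings


def ring_centroid(ring):
    """Area-weighted centroid of a closed ring (shoelace formula)."""
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate ring with zero area")
    return cx / (3 * twice_area), cy / (3 * twice_area)


# Hypothetical polygon: a 2 x 2 square, encoded as little-endian WKB.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]
wkb = (
    struct.pack("<BII", 1, 3, 1)  # byte order, geometry type, ring count
    + struct.pack("<I", len(square))
    + b"".join(struct.pack("<2d", x, y) for x, y in square)
)

rings = parse_wkb_polygon(wkb)
centroid = ring_centroid(rings[0])  # (x, y); for geographic data x is conventionally longitude
```

In practice shapely's WKB loader handles every geometry type and extension, which is why the package depends on it rather than on hand-written parsing.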

This package enriches each event with:

Added Column                 Source
centroid_lon, centroid_lat   Decoded from WKB polygons
country, iso_a3              Spatial join against Natural Earth
continent                    Natural Earth
year                         Extracted from start_date
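
The year column is derived from the start_date string. A minimal sketch of that step follows, assuming the raw strings are ISO formatted (YYYY-MM-DD). The raw file is undocumented, so that format is an assumption, and the helper name and the sample dates are hypothetical.

```python
from datetime import date


def parse_event_dates(start_date, end_date):
    """Parse ISO date strings and derive the year and inclusive duration."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    return {
        "start": start,
        "end": end,
        "year": start.year,  # year is taken from the start date
        "duration_days": (end - start).days + 1,  # count both endpoints
    }


# Hypothetical event spanning a month boundary.
event = parse_event_dates("2024-07-30", "2024-08-01")
```

Taking the year from the start date means an event that spans New Year is attributed to the year in which it began.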

Reporting Bias

The dataset shows 498 events in 2000 and 402,012 in 2024. This does not mean floods increased roughly 807-fold. The events are extracted from news articles, and digital news coverage grew dramatically over this period, so any trend analysis should account for this reporting bias. Use db.bias_check() to inspect the global yearly counts and db.plot_bias() to visualize them.
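
The arithmetic behind that figure, and one simple way to take reporting growth into account, can be sketched as follows. Dividing a country's yearly count by the global count for the same year is just one possible normalization, not something the package is known to do, and it does not correct for regional differences in news coverage. The country counts below are hypothetical. Only the two global totals come from the dataset figures quoted above.

```python
EVENTS_2000 = 498
EVENTS_2024 = 402_012

# Raw ratio between the two global totals quoted above (about 807).
raw_ratio = EVENTS_2024 / EVENTS_2000


def share_of_global(country_counts, global_counts):
    """Express each yearly country count as a fraction of the global count."""
    return {
        year: country_counts[year] / global_counts[year]
        for year in country_counts
        if global_counts.get(year)  # skip years with no global events
    }


# Hypothetical yearly counts for one country.
country_counts = {2000: 5, 2024: 4020}
global_counts = {2000: EVENTS_2000, 2024: EVENTS_2024}

shares = share_of_global(country_counts, global_counts)
```

In this made-up case the raw count rises from 5 to 4,020, yet the country's share of global events stays at about 1% in both years, which is the pattern expected when the increase is driven mainly by coverage.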

[Figure: Bias Analysis]

Top Countries by Events Detected

[Figure: Top Countries]

Dataset

  • Source: Google Groundsource
  • Download: Zenodo (CC BY 4.0)
  • Records: 2,646,302 events across 175 countries, 2000-2026
  • Method: Gemini parsed ~5M news articles
  • Accuracy: 60% of events correct in both location and timing; 82% accurate enough to be practically useful (per Google)

License

MIT. The underlying dataset is licensed CC BY 4.0 by Google.

Citation

Google Research. Groundsource: Turning News Reports into Data with Gemini. Zenodo, 2026. DOI: 10.5281/zenodo.18647054


Download files

Source Distribution

groundsource-0.1.1.tar.gz (223.5 kB view details)

Uploaded Source

Built Distribution

groundsource-0.1.1-py3-none-any.whl (221.8 kB view details)

Uploaded Python 3

File details

Details for the file groundsource-0.1.1.tar.gz.

File metadata

  • Download URL: groundsource-0.1.1.tar.gz
  • Upload date:
  • Size: 223.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for groundsource-0.1.1.tar.gz
Algorithm Hash digest
SHA256 e56d2fb96c1127ff60281a83939ba264db686f5e9bf4559be1ee6d2f3c11006b
MD5 ae5b182c1898cde6906fdc4d7718105a
BLAKE2b-256 ca6a86d06b24cf1d137d69b38a8c26b5497b034718ddfae010a1a916e937a35f

File details

Details for the file groundsource-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: groundsource-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 221.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for groundsource-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9feecc67c108107a110718b4880ffef8a7fd2c09de105d354f79f00df6faed4c
MD5 0bce90a221c7439a637292354d17ab72
BLAKE2b-256 5958f2cfbce5bf09814df67761d502df4fcb746834312c513e9ce5f784030557
