
Python client library for GDELT (Global Database of Events, Language, and Tone)

gdelt-py

A comprehensive Python client library for the GDELT (Global Database of Events, Language, and Tone) project.

Features

  • Unified Interface: Single client covering all 6 REST APIs, 3 database tables, and the NGrams dataset
  • Version Normalization: Transparent handling of GDELT v1/v2 differences with normalized output
  • Resilience: Automatic fallback to BigQuery when APIs fail or are rate limited
  • Modern Python: 3.11+, Async-first, Pydantic models, type hints throughout
  • Streaming: Generator-based iteration for large datasets with memory efficiency
  • Developer Experience: Clear errors, progress indicators, comprehensive lookups
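
The resilience feature above describes falling back to BigQuery when a REST API fails or is rate limited. The general pattern can be sketched as follows. This is a minimal illustration, not the library's implementation: `fetch_from_api`, `fetch_from_bigquery`, `fetch_with_fallback`, and `RateLimitError` are hypothetical stand-ins.

```python
import asyncio


class RateLimitError(Exception):
    """Hypothetical error raised when the primary source throttles requests."""


async def fetch_from_api(query: str) -> list[str]:
    # Stand-in for a REST call; it is always throttled in this sketch.
    raise RateLimitError("429 Too Many Requests")


async def fetch_from_bigquery(query: str) -> list[str]:
    # Stand-in for the fallback source.
    return [f"bigquery row for {query!r}"]


async def fetch_with_fallback(query: str) -> tuple[str, list[str]]:
    """Try the primary source first; fall back only on known failure types."""
    try:
        return "api", await fetch_from_api(query)
    except (RateLimitError, ConnectionError):
        return "bigquery", await fetch_from_bigquery(query)


source, rows = asyncio.run(fetch_with_fallback("protests"))
print(source, rows)
```

Catching only specific exception types keeps programming errors (for example a `TypeError`) visible instead of silently routing them to the fallback.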

Installation

# Basic installation
pip install gdelt-py

# With BigQuery support (quotes keep shells such as zsh from expanding the brackets)
pip install "gdelt-py[bigquery]"

# With all optional dependencies
pip install "gdelt-py[bigquery,pandas]"

Quick Start

import asyncio
from datetime import date, timedelta

from py_gdelt import GDELTClient
from py_gdelt.filters import (
    BroadcastNGramsFilter,
    DateRange,
    EventFilter,
    GEGFilter,
    GQGFilter,
    VGKGFilter,
)


async def main() -> None:
    async with GDELTClient() as client:
        # Query recent events
        yesterday = date.today() - timedelta(days=1)
        event_filter = EventFilter(
            date_range=DateRange(start=yesterday, end=yesterday),
            actor1_country="USA",
        )

        result = await client.events.query(event_filter)
        print(f"Found {len(result)} events")

        # Query Visual GKG (image analysis)
        vgkg_filter = VGKGFilter(
            date_range=DateRange(start=yesterday),
            domain="cnn.com",
        )
        images = await client.vgkg.query(vgkg_filter)

        # Query TV NGrams (word frequencies from TV)
        tv_filter = BroadcastNGramsFilter(
            date_range=DateRange(start=yesterday),
            station="CNN",
            ngram_size=1,
        )
        ngrams = await client.tv_ngrams.query(tv_filter)

        # Query Graph Datasets (quotes, entities, frontpage links)
        gqg_filter = GQGFilter(date_range=DateRange(start=yesterday))
        quotes = await client.graphs.query_gqg(gqg_filter)

        # Stream entities one at a time instead of loading them all
        geg_filter = GEGFilter(date_range=DateRange(start=yesterday))
        async for entity in client.graphs.stream_geg(geg_filter):
            print(f"{entity.name}: {entity.entity_type}")


# "async with" is only valid inside a coroutine, so run everything through asyncio
asyncio.run(main())

Data Sources Covered

File-Based Endpoints

  • Events - Structured event data (who did what to whom, when, where)
  • Mentions - Article mentions of events over time
  • GKG - Global Knowledge Graph (themes, entities, tone, quotations)
  • NGrams - Word and phrase occurrences in articles (Jan 2020+)
  • VGKG - Visual GKG (image annotations via Cloud Vision API)
  • TV-GKG - Television GKG (closed caption analysis from TV broadcasts)
  • TV NGrams - Word frequencies from TV closed captions
  • Radio NGrams - Word frequencies from radio transcripts
  • Graph Datasets - GQG, GEG, GFG, GGG, GEMG, GAL (see below)

REST APIs

  • DOC 2.0 - Full-text article search and discovery
  • GEO 2.0 - Geographic analysis and mapping
  • Context 2.0 - Sentence-level contextual search
  • TV 2.0 - Television news closed caption search
  • TV AI 2.0 - AI-enhanced visual TV search (labels, OCR, faces)
  • LowerThird 🏗️ - TV chyron/lower-third text search
  • TVV 🏗️ - TV Visual channel inventory
  • GKG GeoJSON v1 🏗️ - Legacy geographic GKG API

Graph Datasets

  • GQG - Global Quotation Graph (extracted quotes with context)
  • GEG - Global Entity Graph (NER via Cloud NLP API)
  • GFG - Global Frontpage Graph (homepage link tracking)
  • GGG - Global Geographic Graph (location co-mentions)
  • GDG 🏗️ - Global Difference Graph (article change detection)
  • GEMG - Global Embedded Metadata Graph (meta tags, JSON-LD)
  • GRG 🏗️ - Global Relationship Graph (subject-verb-object triples)
  • GAL - Article List (lightweight article metadata)

Lookup Tables

  • CAMEO - Event classification codes and Goldstein scale
  • Themes - GKG theme taxonomy
  • Countries - Country code conversions (FIPS ↔ ISO)
  • Ethnic/Religious Groups - Group classification codes
  • GCAM 🏗️ - 2,300+ emotional/thematic dimensions
  • Image Tags 🏗️ - Cloud Vision labels for DOC API
  • Languages 🏗️ - Supported language codes

Data Source Matrix

Data Type           | API         | BigQuery | Raw Files | Time Range          | Fallback
Articles (fulltext) | DOC 2.0     | -        | -         | Rolling 3 months    | -
Article geography   | GEO 2.0     | -        | -         | Rolling 7 days      | -
Sentence context    | Context 2.0 | -        | -         | Rolling 72 hours    | -
TV captions         | TV 2.0      | -        | -         | Jul 2009+           | -
TV visual/AI        | TV AI 2.0   | -        | -         | Jul 2010+           | -
TV chyrons 🏗️       | LowerThird  | -        | -         | Aug 2017+           | -
Events v2           | -           |          |           | Feb 2015+           |
Events v1           | -           |          |           | 1979 - Feb 2015     |
Mentions            | -           |          |           | Feb 2015+           |
GKG v2              | -           |          |           | Feb 2015+           |
GKG v1              | -           |          |           | Apr 2013 - Feb 2015 |
Web NGrams          | -           |          |           | Jan 2020+           |
VGKG                | -           |          |           | Dec 2015+           |
TV-GKG              | -           |          |           | Jul 2009+           |
TV NGrams           | -           | -        |           | Jul 2009+           | -
Radio NGrams        | -           | -        |           | 2017+               | -
GQG                 | -           | -        |           | Jan 2020+           | -
GEG                 | -           | -        |           | Jul 2016+           | -
GFG                 | -           | -        |           | Mar 2018+           | -
GGG                 | -           | -        |           | Jan 2020+           | -
GEMG                | -           | -        |           | Jan 2020+           | -
GAL                 | -           | -        |           | Jan 2020+           | -

🏗️ = Work in progress - coming in future releases
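
The matrix lists Events v1 as covering 1979 to Feb 2015 and Events v2 as covering Feb 2015 onward, so a date range that straddles the switchover spans both versions. The sketch below shows one way such a range could be split. The cutoff of 2015-02-01 is an assumption chosen for illustration (the matrix gives only the month), and `V2_START` and `split_by_version` are hypothetical names, not part of the library.

```python
from datetime import date, timedelta

# Assumed cutoff for illustration only; the matrix says just "Feb 2015".
V2_START = date(2015, 2, 1)


def split_by_version(start: date, end: date) -> dict[str, tuple[date, date]]:
    """Split an inclusive date range into the parts served by v1 and v2."""
    if start > end:
        raise ValueError("start must not be after end")
    parts: dict[str, tuple[date, date]] = {}
    if start < V2_START:
        # v1 covers everything up to the day before the cutoff.
        parts["v1"] = (start, min(end, V2_START - timedelta(days=1)))
    if end >= V2_START:
        parts["v2"] = (max(start, V2_START), end)
    return parts


print(split_by_version(date(2015, 1, 15), date(2015, 3, 15)))
```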

Key Concepts

Async-First Design

All I/O operations are async by default for optimal performance:

async with GDELTClient() as client:
    articles = await client.doc.query(doc_filter)

Synchronous wrappers are available for compatibility:

with GDELTClient() as client:
    articles = client.doc.query_sync(doc_filter)
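
One practical benefit of an async design is that independent queries can overlap their network waits. The sketch below shows the pattern with `asyncio.gather`, using a stand-in coroutine instead of real client calls; `fake_query` is hypothetical.

```python
import asyncio
import time


async def fake_query(name: str, delay: float) -> str:
    # Stand-in for an awaitable client call; sleeps instead of doing I/O.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list[str]:
    # Both coroutines wait concurrently, so the total time is roughly
    # the longest single delay rather than the sum of the delays.
    return await asyncio.gather(
        fake_query("doc", 0.2),
        fake_query("events", 0.2),
    )


started = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - started
print(results, f"{elapsed:.2f}s")
```

`asyncio.gather` returns results in the order the awaitables were passed, regardless of which one finishes first.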

Streaming for Efficiency

Process large datasets without loading everything into memory:

async with GDELTClient() as client:
    async for event in client.events.stream(event_filter):
        process(event)  # Memory-efficient
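
To illustrate why streaming keeps memory use flat, here is a self-contained sketch built on an async generator. `fake_event_stream` is a hypothetical stand-in for a streaming client method, and the record fields are invented for the example. Only one record is held at a time while a running aggregate is updated.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_event_stream(n: int) -> AsyncIterator[dict]:
    """Yield n synthetic records one at a time."""
    for i in range(n):
        await asyncio.sleep(0)  # hand control back to the loop, as real I/O would
        yield {"id": i, "goldstein_scale": float(i % 5) - 2.0}


async def mean_goldstein(n: int) -> float:
    """Compute a mean without ever materializing the full list of records."""
    total = 0.0
    count = 0
    async for event in fake_event_stream(n):
        total += event["goldstein_scale"]
        count += 1
    return total / count if count else 0.0


print(asyncio.run(mean_goldstein(3)))
```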

Type Safety

Pydantic models throughout with full type hints:

event: Event = result[0]
assert event.goldstein_scale is not None  # Attribute access is type-checked; a score of 0.0 still passes

Configuration

Flexible configuration via environment variables, TOML files, or programmatic settings:

from pathlib import Path

settings = GDELTSettings(
    timeout=60,
    max_retries=5,
    cache_dir=Path("/custom/cache"),
)

async with GDELTClient(settings=settings) as client:
    ...

Documentation

Full documentation available at: https://rbozydar.github.io/py-gdelt/

Contributing

Contributions are welcome! See the Contributing Guide for details.

License

MIT License - see LICENSE file for details.
