Python client library for GDELT (Global Database of Events, Language, and Tone)
gdelt-py
A comprehensive Python client library for the GDELT (Global Database of Events, Language, and Tone) project.
Features
- Unified Interface: Single client covering all 6 REST APIs, 3 database tables, and NGrams dataset
- Version Normalization: Transparent handling of GDELT v1/v2 differences with normalized output
- Resilience: Automatic fallback to BigQuery when APIs fail or hit rate limits
- Modern Python: Python 3.11+, async-first, Pydantic models, type hints throughout
- Streaming: Generator-based iteration for large datasets with memory efficiency
- Developer Experience: Clear errors, progress indicators, comprehensive lookups
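The resilience feature above can be pictured with a small, library-independent sketch. Everything here (`SourceUnavailable`, `with_fallback`, and the two stand-in fetchers) is hypothetical and only illustrates the try-primary-then-fall-back pattern; it is not how gdelt-py implements it internally.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class SourceUnavailable(Exception):
    """Hypothetical error for a failed or rate-limited primary source."""


async def with_fallback(
    primary: Callable[[], Awaitable[T]],
    fallback: Callable[[], Awaitable[T]],
) -> T:
    """Try the primary source; if it is unavailable, use the fallback."""
    try:
        return await primary()
    except SourceUnavailable:
        return await fallback()


async def fetch_from_files() -> list[str]:
    # Stand-in for a raw-file download that hits a rate limit.
    raise SourceUnavailable("HTTP 429")


async def fetch_from_bigquery() -> list[str]:
    # Stand-in for the equivalent BigQuery query.
    return ["event-1", "event-2"]


rows = asyncio.run(with_fallback(fetch_from_files, fetch_from_bigquery))
print(rows)
```

Catching only a dedicated exception type keeps programming errors (for example a `TypeError`) from silently triggering the fallback.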
Installation
# Basic installation
pip install gdelt-py
# With BigQuery support (quoted so shells such as zsh do not expand the brackets)
pip install "gdelt-py[bigquery]"
# With all optional dependencies
pip install "gdelt-py[bigquery,pandas]"
Quick Start
import asyncio
from datetime import date, timedelta

from py_gdelt import GDELTClient
from py_gdelt.filters import (
    BroadcastNGramsFilter,
    DateRange,
    EventFilter,
    GEGFilter,
    GQGFilter,
    VGKGFilter,
)


async def main() -> None:
    async with GDELTClient() as client:
        # Query recent events
        yesterday = date.today() - timedelta(days=1)
        event_filter = EventFilter(
            date_range=DateRange(start=yesterday, end=yesterday),
            actor1_country="USA",
        )
        result = await client.events.query(event_filter)
        print(f"Found {len(result)} events")

        # Query Visual GKG (image analysis)
        vgkg_filter = VGKGFilter(
            date_range=DateRange(start=yesterday),
            domain="cnn.com",
        )
        images = await client.vgkg.query(vgkg_filter)

        # Query TV NGrams (word frequencies from TV)
        tv_filter = BroadcastNGramsFilter(
            date_range=DateRange(start=yesterday),
            station="CNN",
            ngram_size=1,
        )
        ngrams = await client.tv_ngrams.query(tv_filter)

        # Query Graph Datasets (quotes, entities, frontpage links)
        gqg_filter = GQGFilter(date_range=DateRange(start=yesterday))
        quotes = await client.graphs.query_gqg(gqg_filter)

        geg_filter = GEGFilter(date_range=DateRange(start=yesterday))
        async for entity in client.graphs.stream_geg(geg_filter):
            print(f"{entity.name}: {entity.entity_type}")


# "async with" is only valid inside a coroutine, so run the example via asyncio
asyncio.run(main())
Data Sources Covered
File-Based Endpoints
- Events - Structured event data (who did what to whom, when, where)
- Mentions - Article mentions of events over time
- GKG - Global Knowledge Graph (themes, entities, tone, quotations)
- NGrams - Word and phrase occurrences in articles (Jan 2020+)
- VGKG - Visual GKG (image annotations via Cloud Vision API)
- TV-GKG - Television GKG (closed caption analysis from TV broadcasts)
- TV NGrams - Word frequencies from TV closed captions
- Radio NGrams - Word frequencies from radio transcripts
- Graph Datasets - GQG, GEG, GFG, GGG, GEMG, GAL (see below)
REST APIs
- DOC 2.0 - Full-text article search and discovery
- GEO 2.0 - Geographic analysis and mapping
- Context 2.0 - Sentence-level contextual search
- TV 2.0 - Television news closed caption search
- TV AI 2.0 - AI-enhanced visual TV search (labels, OCR, faces)
- LowerThird 🏗️ - TV chyron/lower-third text search
- TVV 🏗️ - TV Visual channel inventory
- GKG GeoJSON v1 🏗️ - Legacy geographic GKG API
Graph Datasets
- GQG - Global Quotation Graph (extracted quotes with context)
- GEG - Global Entity Graph (NER via Cloud NLP API)
- GFG - Global Frontpage Graph (homepage link tracking)
- GGG - Global Geographic Graph (location co-mentions)
- GDG 🏗️ - Global Difference Graph (article change detection)
- GEMG - Global Embedded Metadata Graph (meta tags, JSON-LD)
- GRG 🏗️ - Global Relationship Graph (subject-verb-object triples)
- GAL - Article List (lightweight article metadata)
Lookup Tables
- CAMEO - Event classification codes and Goldstein scale
- Themes - GKG theme taxonomy
- Countries - Country code conversions (FIPS ↔ ISO)
- Ethnic/Religious Groups - Group classification codes
- GCAM 🏗️ - 2,300+ emotional/thematic dimensions
- Image Tags 🏗️ - Cloud Vision labels for DOC API
- Languages 🏗️ - Supported language codes
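To show why the country lookup matters, here is a minimal, library-independent sketch of a FIPS ↔ ISO conversion. The dictionary holds only a handful of well-known pairs, and the function names are hypothetical rather than gdelt-py's lookup API. Note that the two schemes genuinely conflict: FIPS `CH` is China, while ISO alpha-2 `CH` is Switzerland.

```python
# A few FIPS 10-4 -> ISO 3166-1 alpha-3 pairs; a real table is much larger.
FIPS_TO_ISO3 = {
    "US": "USA",  # United States
    "UK": "GBR",  # United Kingdom
    "GM": "DEU",  # Germany
    "JA": "JPN",  # Japan
    "CH": "CHN",  # China (not Switzerland)
    "SZ": "CHE",  # Switzerland
}
# Build the reverse mapping once so both directions are O(1) lookups.
ISO3_TO_FIPS = {iso3: fips for fips, iso3 in FIPS_TO_ISO3.items()}


def fips_to_iso3(code: str) -> str:
    """Convert a FIPS country code to ISO alpha-3 (case-insensitive)."""
    return FIPS_TO_ISO3[code.upper()]


def iso3_to_fips(code: str) -> str:
    """Convert an ISO alpha-3 country code to FIPS (case-insensitive)."""
    return ISO3_TO_FIPS[code.upper()]


print(fips_to_iso3("gm"), iso3_to_fips("CHE"))
```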
Data Source Matrix
| Data Type | API | BigQuery | Raw Files | Time Range | Fallback |
|---|---|---|---|---|---|
| Articles (fulltext) | DOC 2.0 | - | - | Rolling 3 months | - |
| Article geography | GEO 2.0 | - | - | Rolling 7 days | - |
| Sentence context | Context 2.0 | - | - | Rolling 72 hours | - |
| TV captions | TV 2.0 | - | - | Jul 2009+ | - |
| TV visual/AI | TV AI 2.0 | - | - | Jul 2010+ | - |
| TV chyrons 🏗️ | LowerThird | - | - | Aug 2017+ | - |
| Events v2 | - | ✓ | ✓ | Feb 2015+ | ✓ |
| Events v1 | - | ✓ | ✓ | 1979 - Feb 2015 | ✓ |
| Mentions | - | ✓ | ✓ | Feb 2015+ | ✓ |
| GKG v2 | - | ✓ | ✓ | Feb 2015+ | ✓ |
| GKG v1 | - | ✓ | ✓ | Apr 2013 - Feb 2015 | ✓ |
| Web NGrams | - | ✓ | ✓ | Jan 2020+ | ✓ |
| VGKG | - | ✓ | ✓ | Dec 2015+ | ✓ |
| TV-GKG | - | ✓ | ✓ | Jul 2009+ | ✓ |
| TV NGrams | - | - | ✓ | Jul 2009+ | - |
| Radio NGrams | - | - | ✓ | 2017+ | - |
| GQG | - | - | ✓ | Jan 2020+ | - |
| GEG | - | - | ✓ | Jul 2016+ | - |
| GFG | - | - | ✓ | Mar 2018+ | - |
| GGG | - | - | ✓ | Jan 2020+ | - |
| GEMG | - | - | ✓ | Jan 2020+ | - |
| GAL | - | - | ✓ | Jan 2020+ | - |
🏗️ = Work in progress - coming in future releases
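The rolling windows in the matrix mean that some REST APIs cannot serve older dates at all. The sketch below checks a date against those windows before a query is attempted. `ROLLING_WINDOWS` and `is_queryable` are hypothetical names, and "3 months" is approximated as 90 days, which is an assumption.

```python
from datetime import date, timedelta

# Window lengths taken from the matrix above.
ROLLING_WINDOWS = {
    "doc": timedelta(days=90),       # "Rolling 3 months", approximated as 90 days
    "geo": timedelta(days=7),        # "Rolling 7 days"
    "context": timedelta(hours=72),  # "Rolling 72 hours"
}


def is_queryable(api: str, day: date, today: date) -> bool:
    """Return True if `day` lies inside the API's rolling window ending at `today`."""
    earliest = today - ROLLING_WINDOWS[api]
    return earliest <= day <= today


today = date(2025, 1, 31)
print(is_queryable("geo", date(2025, 1, 25), today))
print(is_queryable("context", date(2025, 1, 25), today))
```

A check like this is cheap to run up front and gives a clearer error than an empty API response for an out-of-range date.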
Key Concepts
Async-First Design
All I/O operations are async by default for optimal performance:
async with GDELTClient() as client:
    articles = await client.doc.query(doc_filter)
Synchronous wrappers are available for compatibility:
with GDELTClient() as client:
    articles = client.doc.query_sync(doc_filter)
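One common way to offer a synchronous wrapper next to an async method is to drive the coroutine with `asyncio.run()`. The sketch below uses a hypothetical `DocEndpoint` class to show the idea; whether gdelt-py's `query_sync` works this way is an assumption.

```python
import asyncio


class DocEndpoint:
    """Hypothetical endpoint with an async method and a sync wrapper."""

    async def query(self, term: str) -> list[str]:
        await asyncio.sleep(0)  # stands in for a network round trip
        return [f"article about {term}"]

    def query_sync(self, term: str) -> list[str]:
        # asyncio.run() starts a fresh event loop, so this must not be
        # called from code that is already running inside one.
        return asyncio.run(self.query(term))


articles = DocEndpoint().query_sync("climate")
print(articles)
```

The trade-off is that a wrapper like this is convenient in scripts but raises `RuntimeError` when called from inside a running event loop, where the async method should be awaited directly.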
Streaming for Efficiency
Process large datasets without loading everything into memory:
async with GDELTClient() as client:
    async for event in client.events.stream(event_filter):
        process(event)  # Memory-efficient
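The memory benefit comes from the async-generator pattern itself, which can be sketched without the library. `stream_records` and `total` are hypothetical names, and the nested lists stand in for files that would be downloaded one at a time.

```python
import asyncio
from collections.abc import AsyncIterator


async def stream_records(chunks: list[list[int]]) -> AsyncIterator[int]:
    """Yield records one by one, handing control back between chunks."""
    for chunk in chunks:
        await asyncio.sleep(0)  # stands in for downloading one file
        for record in chunk:
            yield record


async def total(chunks: list[list[int]]) -> int:
    running = 0
    async for record in stream_records(chunks):
        running += record  # process each record, then let it go
    return running


result = asyncio.run(total([[1, 2], [3], [4, 5]]))
print(result)
```

Because the consumer keeps only a running total, peak memory depends on the size of one chunk rather than on the size of the whole dataset.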
Type Safety
Pydantic models throughout with full type hints:
event: Event = result[0]
assert event.goldstein_scale # Type-checked
Configuration
Flexible configuration via environment variables, TOML files, or programmatic settings:
from pathlib import Path

settings = GDELTSettings(
    timeout=60,
    max_retries=5,
    cache_dir=Path("/custom/cache"),
)
async with GDELTClient(settings=settings) as client:
    ...
Documentation
Full documentation available at: https://rbozydar.github.io/py-gdelt/
Contributing
Contributions are welcome! See Contributing Guide for details.
License
MIT License - see LICENSE file for details.