Geographic and census data for Guatemala — the tigris equivalent for Guatemalan researchers

🦜 GeoQuetzal

Geographic and census data for Guatemala — the first library of its kind for Central America.

GeoQuetzal gives Guatemalan researchers programmatic access to administrative boundaries and census microdata, following the same philosophy as tigris/tidycensus for the US and geobr for Brazil.

import geoquetzal as gq

deptos = gq.departamentos()
deptos.plot(edgecolor="white", figsize=(8, 8))

Why GeoQuetzal?

Working with Guatemalan geographic and census data typically means downloading shapefiles from GADM, cleaning up inconsistent name spellings, downloading census CSVs from INE, and working out how to join the two. The names rarely line up: GADM spells "Quetzaltenango" as "Quezaltenango" and runs "San Marcos" together as "SanMarcos".

GeoQuetzal handles all of that. One function call gives you clean, analysis-ready data with consistent INE names and numeric codes that join reliably.
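
To illustrate why raw names are fragile join keys, here is a minimal sketch of accent- and spacing-insensitive matching using only the standard library. `normalize_name` is a hypothetical helper written for this example; it is not GeoQuetzal's implementation. Normalization of this kind cannot reconcile a genuinely different spelling such as "Quezaltenango", which is one reason numeric codes are the safer key.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Hypothetical helper: fold case, strip accents, drop spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_accents.casefold() if ch.isalnum())


# Accent and spacing differences collapse to the same key.
print(normalize_name("Sacatepéquez") == normalize_name("Sacatepequez"))  # True
print(normalize_name("SanMarcos") == normalize_name("San Marcos"))       # True

# A real spelling difference survives normalization.
print(normalize_name("Quezaltenango") == normalize_name("Quetzaltenango"))  # False
```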

Installation

pip install geoquetzal

# With plotting support (matplotlib + folium)
pip install geoquetzal[plotting]

# Everything (adds contextily for basemaps)
pip install geoquetzal[all]

Requirements: Python 3.9+, geopandas, pandas, requests, pyarrow.
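
If it is unclear which optional extras ended up installed, a quick standard-library check along these lines may help. This is only a sketch: `available_extras` is a hypothetical helper, and the package names are the ones listed in the install comments above.

```python
from importlib.util import find_spec

# Optional packages named in the install extras above.
OPTIONAL = ("matplotlib", "folium", "contextily")


def available_extras(packages=OPTIONAL):
    """Map each package name to True if it can be imported in this environment."""
    return {name: find_spec(name) is not None for name in packages}


for name, ok in available_extras().items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```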

Datasets

| Dataset | Records | Variables | Download size | Source |
| --- | --- | --- | --- | --- |
| Boundaries | 22 deptos / 340 municipios | geometry + codes | ~2 MB | GADM v4.1 |
| Emigración | 242,203 | 11 | ~1.6 MB | INE Censo 2018 |
| Hogares | 3,275,931 | 37 | ~38 MB | INE Censo 2018 |
| Vivienda | ~3,300,000 | 11 | ~30 MB | INE Censo 2018 |
| Personas | 14,901,286 | 84 | ~333 MB | INE Censo 2018 |

All census datasets are hosted as Parquet files on GitHub Releases and downloaded on demand, one file per departamento. After the first download, data loads from a local cache instead of being downloaded again.

Quick Start

Administrative Boundaries

import geoquetzal as gq

# Country outline
gq.country()

# All 22 departamentos
deptos = gq.departamentos()

# By name or code (accent-insensitive)
gq.departamentos("Sacatepequez")     # accent-insensitive ✓
gq.departamentos("Sacatepéquez")     # exact spelling ✓
gq.departamentos(3)                  # INE code ✓

# By region
gq.departamentos(region="V - Central")

# Municipios (~340)
gq.municipios("Sacatepequez")                # all municipios in a departamento
gq.municipios(name="Antigua Guatemala")      # single municipio by name
gq.municipios(name=301)                      # single municipio by code

# Guatemala City zone-level polygons
gq.municipios("Guatemala", zonas=True)       # 22 rows, one per zona

Census Microdata

All four census datasets follow the same pattern — filter by departamento or municipio, optionally attach geometry:

from geoquetzal.emigracion import emigracion
from geoquetzal.hogares import hogares
from geoquetzal.vivienda import vivienda
from geoquetzal.personas import personas

# Load all records (downloads all 22 departamento files on first use)
df = emigracion()                                   # ~242K emigrant records
df = hogares()                                      # ~3.3M households
df = vivienda()                                     # ~3.3M housing units
df = personas()                                     # ~14.9M people (~333 MB download)

# Filter by departamento (only downloads that departamento's file)
df = hogares(departamento="Huehuetenango")
df = hogares(departamento=13)

# Filter by municipio
df = hogares(municipio="Antigua Guatemala")
df = hogares(municipio=301)

# Attach geometry for mapping
gdf = hogares(departamento="Petén", geometry="municipio")
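
The calls above accept either names or INE codes. As a minimal sketch, assuming the usual INE convention that a municipio code is the departamento code times 100 plus the municipio's number within it (consistent with 3 for Sacatepéquez and 301 for Antigua Guatemala above), a code can be split like this. `split_muni_code` is a hypothetical helper, not part of GeoQuetzal.

```python
def split_muni_code(codigo_muni: int) -> tuple[int, int]:
    """Hypothetical helper: return (departamento code, municipio number)."""
    # 22 departamentos, so plausible codes run from 101 to 2299.
    if not 101 <= codigo_muni <= 2299:
        raise ValueError(f"not a plausible municipio code: {codigo_muni}")
    return divmod(codigo_muni, 100)


print(split_muni_code(301))   # (3, 1)
print(split_muni_code(1301))  # (13, 1)
```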

Explore Variables

Every dataset module includes a describe() function:

from geoquetzal.hogares import describe

describe()            # summary table of all 37 variables
describe("PCH4")      # water source — values and labels
describe("PCH15")     # receives remittances

Variable Highlights

Emigración: sex (PEI3), age at departure (PEI4), year left (PEI5)

Hogares: water source (PCH4), sanitation (PCH5), electricity (PCH8), appliances such as radio, TV, fridge, internet, and car (PCH9_A to PCH9_M), cooking fuel (PCH14), remittances (PCH15)

Vivienda: housing type (PCV1), wall material (PCV2), roof (PCV3), floor (PCV5)

Personas: sex (PCP6), age (PCP7), ethnicity (PCP12: Maya/Garífuna/Xinka/Ladino), Mayan linguistic community (PCP13), mother tongue (PCP15), disability (PCP16_A to PCP16_F), education (PCP17_A), literacy (PCP22), tech access, covering cellphone, computer, and internet (PCP26_A to PCP26_C), employment (PCP27), marital status (PCP34), fertility (PCP35 to PCP39)
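
Some highlights are lettered column families, such as the PCH9 appliance flags (the choropleth example further down uses PCH9_I). A small sketch for building such a column list follows. It assumes the family runs over every letter from suffix A to suffix M; `lettered_columns` is a hypothetical helper, and describe() remains the way to confirm which columns actually exist.

```python
from string import ascii_uppercase


def lettered_columns(prefix: str, first: str, last: str) -> list[str]:
    """Hypothetical helper: expand ("PCH9", "A", "M") to PCH9_A ... PCH9_M."""
    start, stop = ascii_uppercase.index(first), ascii_uppercase.index(last)
    return [f"{prefix}_{letter}" for letter in ascii_uppercase[start : stop + 1]]


appliance_cols = lettered_columns("PCH9", "A", "M")
print(len(appliance_cols))         # 13
print("PCH9_I" in appliance_cols)  # True
```

The resulting list could then be used to select columns, for example `df[appliance_cols]`, provided every generated name is present in the DataFrame.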

Mapping Patterns

Static Choropleth (matplotlib)

import geoquetzal as gq
from geoquetzal.hogares import hogares

df = hogares(departamento="Sacatepequez")
pct_internet = (
    df.groupby("MUNICIPIO")["PCH9_I"]
    .apply(lambda x: (x == 1).mean() * 100)
    .round(1)
    .reset_index(name="pct")
)

munis = gq.municipios("Sacatepequez")
result = munis.merge(pct_internet, left_on="codigo_muni", right_on="MUNICIPIO")
result.plot(column="pct", cmap="YlGnBu", legend=True, edgecolor="white")

Interactive Map (folium)

result.explore(
    column="pct",
    tooltip=["municipio", "pct"],
    tiles="CartoDB positron",
)

Animated Choropleth (Plotly)

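import json

import plotly.express as px

import geoquetzal as gq

# Build a GeoJSON whose feature ids are INE departamento codes
deptos = gq.departamentos()
geojson = json.loads(deptos.to_json())
for f in geojson["features"]:
    f["id"] = f["properties"]["codigo_depto"]

# agg_df is your own aggregated table, one row per departamento and year,
# with the columns "codigo_depto", "year", and "value"
fig = px.choropleth(
    agg_df,                         # your aggregated data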
    geojson=geojson,
    locations="codigo_depto",
    color="value",
    animation_frame="year",
)
fig.update_geos(fitbounds="locations", visible=False)
fig.show()

Key rule: Always aggregate first with pandas, then merge geometry onto the 22 or 340 summary rows. Never use geometry= on large microdata — it attaches a polygon to every row and is very slow.
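
A rough back-of-the-envelope calculation shows the scale of the problem. It uses the record counts from the Datasets table and the 340 municipios, and it assumes polygons are attached at municipio level; the figures are averages, not measurements.

```python
# Row counts taken from the Datasets table; 340 is the number of municipios.
ROWS = {"emigracion": 242_203, "hogares": 3_275_931, "personas": 14_901_286}
N_MUNICIPIOS = 340

for dataset, n_rows in ROWS.items():
    copies = n_rows / N_MUNICIPIOS
    print(f"{dataset}: ~{copies:,.0f} copies of each polygon on average")
```

For Personas this works out to tens of thousands of copies of every municipio polygon, against a single copy each when geometry is merged onto the aggregated summary.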

Coordinate Reference Systems

from geoquetzal.crs import to_gtm, to_utm16n, suggest_crs

deptos = gq.departamentos()
suggest_crs(deptos)              # prints recommendations

deptos_gtm = to_gtm(deptos)     # Guatemala Transverse Mercator (national standard)
deptos_utm = to_utm16n(deptos)   # UTM Zone 16N (good for area/distance)

| CRS | Code | Use case |
| --- | --- | --- |
| WGS 84 | EPSG:4326 | Default from GADM, web maps |
| Guatemala TM (GTM) | ESRI:103598 | National standard, official maps |
| UTM Zone 16N | EPSG:32616 | Area and distance calculations |
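
The zone number in a UTM projection follows from longitude. Below is a minimal sketch of the standard zone formula with hypothetical helper names; it is not part of geoquetzal.crs. It suggests that points west of 90°W fall in zone 15N rather than 16N, so it may be worth checking which zone covers a study area before relying on a single projection for area or distance work.

```python
import math


def utm_zone(lon_deg: float) -> int:
    """Standard UTM zone number (1-60) for a longitude in degrees east."""
    return min(int(math.floor((lon_deg + 180.0) / 6.0)) + 1, 60)


def wgs84_utm_north_epsg(lon_deg: float) -> int:
    """EPSG code of the WGS 84 / UTM zone for the northern hemisphere."""
    return 32600 + utm_zone(lon_deg)


# Approximate longitudes, for illustration only.
print(utm_zone(-90.5), wgs84_utm_north_epsg(-90.5))  # 15 32615 (roughly Guatemala City)
print(utm_zone(-89.0), wgs84_utm_north_epsg(-89.0))  # 16 32616
```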

How Data Works

Boundaries are downloaded from GADM v4.1 on first call and cached locally. GeoQuetzal automatically resolves the many spelling differences between GADM and INE names using numeric codes.

Census microdata is partitioned by departamento into Parquet files and hosted on GitHub Releases. When you request a single departamento, only that file is downloaded (~1–15 MB). Requesting all of Guatemala downloads all 22 files. Everything is cached after the first download.
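
The download-once behaviour can be pictured with a small sketch. Everything here is an assumption made for illustration: the file naming, the `cached_file` helper, and the `fake_fetch` stand-in for a network download are hypothetical and do not describe GeoQuetzal's internals.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_file(codigo_depto: int, cache_dir: Path,
                fetch: Callable[[int], bytes]) -> Path:
    """Return the local path for one departamento, fetching only if absent."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"hogares_{codigo_depto:02d}.parquet"  # hypothetical name
    if not path.exists():
        path.write_bytes(fetch(codigo_depto))
    return path


calls = []


def fake_fetch(codigo_depto: int) -> bytes:
    """Stand-in for a network download; records each call."""
    calls.append(codigo_depto)
    return b"placeholder bytes"


with tempfile.TemporaryDirectory() as tmp:
    cached_file(13, Path(tmp), fake_fetch)  # first request triggers a fetch
    cached_file(13, Path(tmp), fake_fetch)  # second request is served from disk
    print(calls)  # [13]
```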

Joins between census data and boundaries always use INE numeric codes (codigo_depto, codigo_muni), never names — because GADM and INE spell names differently.
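
A toy comparison makes the point concrete. The household counts are made up, the boundary names use the GADM spellings quoted earlier, and the codes 9 and 12 are assumed to be the INE departamento codes for Quetzaltenango and San Marcos.

```python
# Made-up household counts, for illustration only.
census_rows = [
    {"codigo_depto": 9, "name": "Quetzaltenango", "households": 100},
    {"codigo_depto": 12, "name": "San Marcos", "households": 200},
]
# Names spelled the way GADM is described as spelling them.
boundary_rows = [
    {"codigo_depto": 9, "name": "Quezaltenango"},
    {"codigo_depto": 12, "name": "SanMarcos"},
]

by_name = {row["name"]: row for row in boundary_rows}
by_code = {row["codigo_depto"]: row for row in boundary_rows}

name_matches = [r for r in census_rows if r["name"] in by_name]
code_matches = [r for r in census_rows if r["codigo_depto"] in by_code]
print(len(name_matches), len(code_matches))  # 0 2
```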

GADM Matching & Diagnostics

GADM v4.1 has 354 polygons for Guatemala (vs INE's 340 municipios). The extras include lake polygons (Lago de Atitlán, Lago de Amatitlán) and Guatemala City split into 22 zone polygons. GeoQuetzal handles all of this automatically.

To check the current matching status:

from geoquetzal.geography import diagnose_matching
results = diagnose_matching()

A few newer municipios (Sipacate, Raxruhá, Petatán, etc.) don't exist in GADM v4.1 and won't have boundary polygons until GADM updates.
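
Conceptually, a matching check comes down to two set differences. The sketch below uses only names mentioned in this section and is not the output of diagnose_matching(), whose return format is not described here.

```python
# Names taken from the text above; the real lists are much longer.
ine_municipios = {"Antigua Guatemala", "Sipacate", "Raxruhá", "Petatán"}
gadm_polygons = {"Antigua Guatemala", "Lago de Atitlán", "Lago de Amatitlán"}

missing_in_gadm = sorted(ine_municipios - gadm_polygons)
extra_in_gadm = sorted(gadm_polygons - ine_municipios)

print("No polygon yet:", missing_in_gadm)
print("Not a municipio:", extra_in_gadm)
```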

Data Sources & Attribution

Boundaries: GADM v4.1.
Census microdata: INE Guatemala, Censo 2018.

Contributing

GeoQuetzal is open source under the MIT license. Contributions are welcome — especially around new datasets, documentation, and example notebooks.

git clone https://github.com/geoquetzal/geoquetzal.git
cd geoquetzal
pip install -e ".[dev,plotting]"

Author

Created by Jorge Yass — online lecturer at Universidad del Valle de Guatemala (UVG) and PhD student at Iowa State University.

Inspired by mentoring a Data Science for Public Good team and the realization that Guatemala (and Central America) had no equivalent to tigris, tidycensus, or geobr.

License

MIT. Census data is public information from INE Guatemala. GADM boundary data is subject to GADM's license (free for academic/non-commercial use).
