The geospatial toolkit for redistricting data
maup
maup is the geospatial toolkit for redistricting data. The package streamlines the basic workflows that arise when working with blocks, precincts, and districts, such as:
- Assigning precincts to districts,
- Aggregating block data to precincts,
- Disaggregating data from precincts down to blocks, and
- Prorating data when units do not nest neatly.
The project's priorities are to be efficient by using spatial indices whenever possible and to integrate well with the existing ecosystem around pandas, geopandas and shapely. The package is distributed under the MIT License.
Installation
To install from PyPI, run pip install maup from your terminal.
If you are using Anaconda, we recommend installing geopandas first by running conda install -c conda-forge geopandas and then running pip install maup.
Examples
Here are some basic situations where you might find maup helpful. For these examples, let's assume that you have some shapefiles with data at varying scales, and that you've used geopandas.read_file to read those shapefiles into three GeoDataFrames:
- blocks: Census blocks with demographic data.
- precincts: Precinct geometries with election data but no demographic data.
- districts: Legislative district geometries with no data attached.
Assigning precincts to districts
The assign function in maup takes two sets of geometries called sources and targets and returns a pandas Series. The Series maps each geometry in sources to the geometry in targets that covers it. (Here, geometry A covers geometry B if every point of B and its boundary lies in A or its boundary.) If a source geometry is not covered by one single target geometry, it is assigned to the target geometry that covers the largest portion of its area.
from maup import assign
assignment = assign(precincts, districts)
# Add the assigned districts as a column of the `precincts` GeoDataFrame:
precincts["DISTRICT"] = assignment
As an aside, you can use that assignment object to create a gerrychain Partition representing the division of the precincts into legislative districts:
from gerrychain import Graph, Partition
graph = Graph.from_geodataframe(precincts)
legislative_districts = Partition(graph, assignment)
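To make the "largest portion of its area" rule concrete, here is a minimal plain-Python sketch that uses axis-aligned rectangles in place of real geometries. It is not how maup is implemented; the helper names (overlap_area, assign_by_largest_overlap) and the toy coordinates are made up for illustration, and leaving a source with no overlap as None is a choice of this sketch.

```python
# Toy illustration only: rectangles stand in for real geometries, and none of
# these helper names come from maup.

def overlap_area(a, b):
    """Area of the intersection of two rectangles (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def assign_by_largest_overlap(sources, targets):
    """Map each source key to the target key that overlaps it the most."""
    assignment = {}
    for source_key, source in sources.items():
        best = max(targets, key=lambda t: overlap_area(source, targets[t]))
        # Sketch-specific choice: a source that overlaps no target stays None.
        has_overlap = overlap_area(source, targets[best]) > 0
        assignment[source_key] = best if has_overlap else None
    return assignment


toy_districts = {"D1": (0, 0, 5, 10), "D2": (5, 0, 10, 10)}
toy_precincts = {
    "P1": (0, 0, 4, 4),      # entirely inside D1
    "P2": (6, 2, 9, 8),      # entirely inside D2
    "P3": (4, 4, 7, 6),      # straddles the boundary, mostly in D2
    "P4": (20, 20, 21, 21),  # outside both districts
}
toy_assignment = assign_by_largest_overlap(toy_precincts, toy_districts)
print(toy_assignment)
```

P3 overlaps D1 with area 2 and D2 with area 4, so the sketch assigns it to D2.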
Aggregating block data to precincts
If you want to aggregate columns called "TOTPOP", "NH_BLACK", and "NH_WHITE" from blocks up to precincts, you can run:
from maup import assign
variables = ["TOTPOP", "NH_BLACK", "NH_WHITE"]
assignment = assign(blocks, precincts)
precincts[variables] = blocks[variables].groupby(assignment).sum()
If you want to move data from one set of geometries to another but your source and target geometries do not nest neatly (i.e. have overlaps), see Prorating data when units do not nest neatly.
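For readers less familiar with pandas, the groupby-and-sum step in the aggregation example can be sketched in plain Python. The block IDs, precinct IDs, and population values below are invented for illustration, and the dictionary toy_block_to_precinct merely plays the role of the Series returned by assign.

```python
from collections import defaultdict

# Toy values for illustration only; the block and precinct IDs are made up.
toy_block_totpop = {"b1": 120, "b2": 80, "b3": 200, "b4": 50}
# Plays the role of assign(blocks, precincts): block ID -> precinct ID.
toy_block_to_precinct = {"b1": "p1", "b2": "p1", "b3": "p2", "b4": "p2"}

# Similar in spirit to blocks["TOTPOP"].groupby(assignment).sum():
# add each block's value to the running total of its assigned precinct.
toy_precinct_totpop = defaultdict(int)
for block_id, population in toy_block_totpop.items():
    toy_precinct_totpop[toy_block_to_precinct[block_id]] += population

print(dict(toy_precinct_totpop))  # {'p1': 200, 'p2': 250}
```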
Disaggregating data from precincts down to blocks
It's common to have data at a coarser scale and want to disaggregate or prorate it down to finer-scaled geometries. For example, let's say we want to prorate some election data in columns "PRESD16" and "PRESR16" from our precincts GeoDataFrame down to our blocks GeoDataFrame.
The first crucial step is to decide how we want to distribute a precinct's data to the blocks within it. Since we're prorating election data, it makes sense to use a block's total population or voting-age population. Here's how we might prorate by population ("TOTPOP"):
from maup import assign
election_columns = ["PRESD16", "PRESR16"]
assignment = assign(blocks, precincts)
# We prorate the vote totals according to each block's share of the overall
# precinct population. This assumes `precincts` already has a "TOTPOP" column,
# e.g. aggregated up from blocks as in the previous section.
weights = blocks.TOTPOP / assignment.map(precincts.TOTPOP)
# Series.map looks up one column at a time, so prorate each election column in
# turn and add it as a column on the `blocks` GeoDataFrame:
for column in election_columns:
    blocks[column] = assignment.map(precincts[column]) * weights
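The arithmetic behind population-weighted proration can be checked by hand on a small case. The sketch below uses invented numbers for one precinct containing three blocks; it illustrates the weighting idea only and does not call maup.

```python
# Toy numbers for illustration only: one precinct containing three blocks.
toy_precinct_votes = {"PRESD16": 300, "PRESR16": 200}
toy_block_pop = {"b1": 50, "b2": 30, "b3": 20}
toy_precinct_pop = sum(toy_block_pop.values())  # 100

# Each block's weight is its share of the precinct's population.
toy_weights = {b: pop / toy_precinct_pop for b, pop in toy_block_pop.items()}

# Each block receives that share of every election column.
toy_prorated = {
    b: {col: votes * w for col, votes in toy_precinct_votes.items()}
    for b, w in toy_weights.items()
}
print(toy_prorated["b1"])  # {'PRESD16': 150.0, 'PRESR16': 100.0}

# Sanity check: the prorated values add back up to the precinct totals.
for col, votes in toy_precinct_votes.items():
    total = sum(row[col] for row in toy_prorated.values())
    assert abs(total - votes) < 1e-9
```

Because the weights within a precinct sum to 1, proration redistributes the precinct's totals without creating or losing votes.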
Warning about areal interpolation
We strongly urge you not to prorate by area! The area of a census block is not a good predictor of its population. In fact, the correlation goes in the other direction: larger census blocks tend to be less populous than smaller ones.
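A small made-up case shows how far the two weightings can diverge. The numbers below are hypothetical and chosen only to illustrate the point: a large, sparsely populated block next to a small, dense one.

```python
# Hypothetical numbers chosen only to illustrate the point.
toy_votes = 1000
toy_blocks = {
    "rural": {"area_km2": 9.0, "TOTPOP": 100},
    "urban": {"area_km2": 1.0, "TOTPOP": 900},
}


def prorate_by(key):
    """Split toy_votes across the toy blocks in proportion to block[key]."""
    total = sum(block[key] for block in toy_blocks.values())
    return {
        name: toy_votes * block[key] / total
        for name, block in toy_blocks.items()
    }


by_area = prorate_by("area_km2")
by_population = prorate_by("TOTPOP")
print(by_area)        # {'rural': 900.0, 'urban': 100.0}
print(by_population)  # {'rural': 100.0, 'urban': 900.0}
```

In this toy case, weighting by area hands 900 votes to a block with only 100 residents.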
Prorating data when units do not nest neatly
Suppose you have a shapefile of precincts with some election results data and you want to join that data onto a different, more recent precincts shapefile. The two sets of precincts will have overlaps, and will not nest neatly like the blocks and precincts did in the above examples. (Not that blocks and precincts always nest neatly...)
We can use intersections to break the two sets of precincts into pieces that nest neatly into both sets. Then we can disaggregate from the old precincts onto these pieces, and aggregate up from the pieces to the new precincts. This move is a bit complicated, so maup has a function called prorate that does just that.
We'll use our same blocks GeoDataFrame to estimate the populations of the pieces for the purposes of proration.
from maup import assign, intersections, prorate
columns = ["SEND12", "SENR12"]
pieces = intersections(old_precincts, new_precincts)
# Use blocks to estimate the population of each piece
weights = blocks["TOTPOP"].groupby(assign(blocks, pieces)).sum()
# Prorate the old precincts' data onto the new precincts, weighting each
# piece by its estimated population
new_precincts[columns] = prorate(
    pieces,
    old_precincts[columns],
    weight_by=weights
)
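The two-step idea described above (disaggregate onto pieces, then aggregate up) can be traced by hand on toy data. The sketch below is plain Python with invented precinct names and numbers; it mirrors the reasoning only and makes no claim about how prorate is implemented.

```python
from collections import defaultdict

# Toy numbers for illustration only. Each piece is keyed by the
# (old precinct, new precinct) pair whose overlap it represents.
toy_piece_pop = {
    ("old_A", "new_X"): 60,
    ("old_A", "new_Y"): 40,
    ("old_B", "new_Y"): 100,
}
toy_old_votes = {"old_A": 500, "old_B": 300}

# Step 1: disaggregate each old precinct's votes onto its pieces,
# in proportion to each piece's share of the old precinct's population.
toy_old_pop = defaultdict(int)
for (old, _new), pop in toy_piece_pop.items():
    toy_old_pop[old] += pop
toy_piece_votes = {
    (old, new): toy_old_votes[old] * pop / toy_old_pop[old]
    for (old, new), pop in toy_piece_pop.items()
}

# Step 2: aggregate the pieces up to the new precincts.
toy_new_votes = defaultdict(float)
for (_old, new), votes in toy_piece_votes.items():
    toy_new_votes[new] += votes

print(dict(toy_new_votes))  # {'new_X': 300.0, 'new_Y': 500.0}
```

The 800 votes in the old precincts are all accounted for in the new ones: new_Y receives 200 from old_A and all 300 from old_B.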
Modifiable areal unit problem
The name of this package comes from the modifiable areal unit problem (MAUP): the same spatial data will look different depending on how you divide up the space. Since maup is all about changing the way your data is aggregated and partitioned, we have named it after the MAUP to encourage thoughtful and responsible use of the toolkit.
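As a small illustration of the MAUP, the sketch below groups the same six units into districts in two different ways and gets two different outcomes. The vote counts and both plans are hypothetical, and the sketch ignores real-world constraints such as contiguity.

```python
# Hypothetical data: votes for party D out of 100 cast in each of six units.
toy_votes_d = [90, 90, 40, 40, 40, 40]


def seats_won(plan):
    """Count districts where party D holds more than half of the votes cast."""
    return sum(
        sum(toy_votes_d[i] for i in district) > 50 * len(district)
        for district in plan
    )


plan_one = [(0, 1), (2, 3), (4, 5)]  # D totals per district: 180, 80, 80
plan_two = [(0, 2), (1, 3), (4, 5)]  # D totals per district: 130, 130, 80
print(seats_won(plan_one), seats_won(plan_two))  # 1 2
```

The underlying votes are identical in both plans; only the grouping changes.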