A package for simplifying intersectional analysis

Intersectionalipy

Background

As debates about race, privilege, and cultural power continue to rage through our society, intersectionality has become simultaneously one of the most important and least well understood paradigms of thought in the modern political environment. The term itself, originally coined by Kimberlé Williams Crenshaw, refers to the ways in which our overlapping identities influence our experiences of privilege and oppression.

At its core, intersectionality is a framework for describing how people's different identities interact and amplify each other in ways that are not simply described by single categories like race or gender. For instance, a Black woman might experience the condition of her race¹ differently than a Black man would. The theory aims to describe the different "vectors of oppression" that a person might experience as a result of their identities. At a community level, this results in the idea of a "matrix of domination" that describes the relationships between different strata of society.

In order to formalize this theory mathematically, we must clear up some potentially confusing terminology. We can consider the notion of oppression to be a function $O(i_1, \dots, i_n)$ over a set of identity values $i_k$ drawn from a set of identity spaces $I_k$. While the output space of the oppression function may itself be multidimensional (economic, social, political, etc.), the term "vectors of oppression" is a bit of a misnomer, as oppression does not properly constitute a vector space (for instance, addition of oppressions is not guaranteed to be commutative). Indeed, much of the work of intersectionality involves considering maps between different identities (not to be confused with the standard identity map concept in mathematics, e.g. the identity matrix in the general linear group $GL(n, \mathbb{R})$) to understand the oppression a person might experience if their identity were to be different (i.e., $O(i_1', \dots, i_n')$ rather than $O(i_1, \dots, i_n)$). In this formalism, we also realize that the "matrix of domination" is not properly a matrix, as it does not describe a linear transformation between identity spaces (it is more properly a bivector of domination).

While it might seem a trivial observation that people from different backgrounds experience the world differently, the theory of intersectionality makes straightforward but strong mathematical claims that have significant implications in the realm of statistical analysis. Namely, intersectionality asserts that oppression cannot generally be expressed as a linear combination of functions over the individual subspaces (that is, functions of the form $\sum_k a_k f_k(i_k)$, where $i_k$ is a person's value in the $k$-th identity space $I_k$). Rather, oppression must be considered as a general, possibly non-linear, function over the entire identity space, $I_1 \times \cdots \times I_n$.
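The claim above can be checked on a toy case. The following is a minimal sketch, assuming two hypothetical binary identities and a made-up outcome that is non-zero only for one combination; it compares the least-squares fit of an additive model with that of a model that has one indicator per combination.

```python
import numpy as np

# Hypothetical data: two binary identities and an outcome that is
# non-zero only when both identities take the value 1.
a = np.array([0, 0, 1, 1])
b = np.array([0, 1, 0, 1])
y = np.array([0.0, 0.0, 0.0, 1.0])

# Additive model: intercept plus one term per identity space.
X_additive = np.column_stack([np.ones(4), a, b])

# Intersectional model: one indicator per cell of the product space.
cell = 2 * a + b
X_product = np.eye(4)[cell]

def squared_residual(X, y):
    """Sum of squared residuals of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))

additive_error = squared_residual(X_additive, y)
product_error = squared_residual(X_product, y)
```

In this toy case the additive model leaves a non-zero residual, while the product-space indicators reproduce the outcome exactly.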

This leads to natural, practical consequences in the practices of statistics and econometrics. Most directly, it is common practice to run regressions which attempt to control for the effects of different identities through the use of indicator/dummy variables (or one-hot encodings in the machine learning parlance). Typically these indicator variables are produced at the level of a single identity space $I_k$ (race alone, or gender alone). Such a formalism precludes the ability to capture the kind of non-linear interactions between identities that are the focus of intersectional analysis. Instead, intersectionality prescribes that these indicator variables should be constructed on the Cartesian product of the identity spaces.
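To make the two constructions concrete, here is a rough sketch in plain pandas (not intersectionalipy itself). The toy data and the `" & "` label format are assumptions made only for illustration.

```python
from itertools import product

import pandas as pd

# Hypothetical toy data with two identity columns.
df = pd.DataFrame({
    "race": ["white", "Black", "Asian", "Black"],
    "gender": ["female", "male", "female", "female"],
})

# Standard approach: a separate set of indicators for each identity space.
per_space = pd.get_dummies(df[["race", "gender"]], dtype=int)

# Intersectional approach: one indicator per element of the Cartesian
# product of the identity spaces, including combinations not observed.
combos = [
    f"{r} & {g}"
    for r, g in product(sorted(df["race"].unique()), sorted(df["gender"].unique()))
]
observed = df["race"] + " & " + df["gender"]
product_space = pd.get_dummies(
    pd.Series(pd.Categorical(observed, categories=combos), index=df.index),
    dtype=int,
)
```

With three races and two genders, the per-space encoding has 3 + 2 columns, while the product encoding has 3 × 2 columns, exactly one of which is set in each row.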

An aside: this distinction is especially crucial for regression and other linear approaches that are dominant in the social sciences. However, in the discipline of machine learning (ML), more flexible models (like random forests and neural networks) are common, and these have the ability, in principle, to express arbitrary non-linear relationships. Thus, a valid ML approach is to use a non-linear model with the standard indicator identity variables. (We should be careful to distinguish this ML approach from another "ML" approach, that of Marxism-Leninism, which would impose that oppression be a univariate function of class², itself possibly a function of identity.)

While the construction of indicator variables from the Cartesian product space is a powerful prescription, it doesn't come for free: namely, this construction invites the curse of dimensionality. If we incorporate even a relatively small number of identity spaces like race, gender, and sexual orientation, we could easily end up with 100 indicator variables. Unless we are dealing with a large number of observations in our dataset, this in turn can lead to overfitting, or in the extreme case to more fitting parameters than observations and thus no unique solution. Therefore, intersectional analysis must carefully balance the inclusion of these interaction terms with the concern of statistically significant, generalizable results.
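The growth in the number of indicators is simple arithmetic. The sketch below uses hypothetical category counts, chosen only so that the product comes to 100; real datasets will differ.

```python
from math import prod

# Hypothetical number of categories in each identity space.
identity_sizes = {"race": 5, "gender": 4, "sexual orientation": 5}

# Per-space indicators grow with the sum of the sizes,
# product-space indicators grow with their product.
n_per_space = sum(identity_sizes.values())
n_product = prod(identity_sizes.values())

def observations_per_indicator(n_observations, n_indicators):
    """Average number of observations available per indicator variable."""
    return n_observations / n_indicators

# With a hypothetical 500 observations, each product-space indicator
# is supported by only a handful of rows on average.
avg_support = observations_per_indicator(500, n_product)
```

A low average like this is a warning sign that many cells will be sparsely populated or empty.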

Intersectionalipy is a package for Python that aims to make this kind of intersectional analysis easier for researchers.

¹Not to be confused with a race condition, which is rarely a concern in synchronous Python code due to the global interpreter lock.

²Function of class here meaning class in the political sense; intersectionalipy does not register any methods to existing Python classes.

Usage

Intersectionalipy provides a clear, straightforward API that aims to simplify performing intersectional analysis in the scientific Python stack. The package is built to work with pandas DataFrames, which can be plugged directly into statistical analysis packages like statsmodels or scikit-learn.

Intersectionalipy operates on DataFrames where some of the columns encode categorical identity information. Users pass in the DataFrame and the names of those identity columns, and intersectionalipy returns a new DataFrame where all identity columns have been replaced by a complete set of identity indicator variables.

For example:

>>> import intersectionalipy as i14py, pandas as pd
>>> df = pd.DataFrame({
...     'data': [0.2, 0.7, 0.9],
...     'gender': ['female', 'male', 'female'],
...     'race': ['white', 'Black', 'Asian'],
...     'sexuality': ['gay', 'straight', 'queer'],
... })
>>> df
   data  gender   race sexuality
0   0.2  female  white       gay
1   0.7    male  Black  straight
2   0.9  female  Asian     queer
>>> i14py.intersectionalize(df, ['race', 'sexuality', 'gender'])
   data  (Asian, queer, female)  (Black, straight, male)  (white, gay, female)
0   0.2                       0                        0                     1
1   0.7                       0                        1                     0
2   0.9                       1                        0                     0
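For readers who want to see the transformation spelled out, the following is a rough sketch that reproduces the output above with plain pandas. It is not the package's actual implementation: the function name `intersectionalize_sketch` is hypothetical, and it assumes the indicator columns are labelled with strings of the form `(a, b, c)`.

```python
import pandas as pd

def intersectionalize_sketch(df, identity_cols):
    """Replace identity columns with one indicator per observed combination.

    Hypothetical stand-in for illustration, not intersectionalipy's own code.
    """
    # Build one string label per row from the identity values, in the given order.
    rows = df[identity_cols].astype(str).itertuples(index=False, name=None)
    labels = pd.Series(["(" + ", ".join(r) + ")" for r in rows], index=df.index)
    # One 0/1 column per distinct label.
    indicators = pd.get_dummies(labels, dtype=int)
    return pd.concat([df.drop(columns=identity_cols), indicators], axis=1)

df = pd.DataFrame({
    "data": [0.2, 0.7, 0.9],
    "gender": ["female", "male", "female"],
    "race": ["white", "Black", "Asian"],
    "sexuality": ["gay", "straight", "queer"],
})
result = intersectionalize_sketch(df, ["race", "sexuality", "gender"])
```

Only combinations that actually occur in the data get a column here, which matches the three indicator columns shown in the example output.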

If using all identity columns together results in too many indicator variables for a meaningful regression, researchers can choose to limit the set of identity interactions they explore through multiple calls to intersectionalipy:

df = i14py.intersectionalize(df, ['race', 'religion'])
df = i14py.intersectionalize(df, ['sexuality', 'gender'])

Practitioners may want to conduct a power analysis (as in statistical power, though the regression itself may be an analysis of cultural and political power) to decide on the right level of intersectionality.

NB: When running a regression on categorical values with indicator variables, it is typical practice to use $N - 1$ indicator regressors for $N$ categories, which avoids perfect collinearity between the indicators and the intercept. This is typically done by choosing one category to leave out; the coefficients on all other indicator variables are then interpreted relative to that left-out category. Because the sensible choice of a base category often depends on the detailed nature of the problem, intersectionalipy leaves this choice to the researcher and returns all indicators.
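Dropping a base category can be done with ordinary pandas once the indicators are in hand. The sketch below uses a hand-built, hypothetical indicator DataFrame (with string column labels assumed) and checks the rank of the design matrix with and without the base column.

```python
import numpy as np
import pandas as pd

# Hypothetical indicator columns: six observations in three categories.
indicators = pd.DataFrame(
    np.eye(3, dtype=int)[[0, 1, 2, 0, 1, 2]],
    columns=["(Asian, female)", "(Black, male)", "(white, female)"],
)

def with_intercept(frame):
    """Design matrix with a leading column of ones for the intercept."""
    return np.column_stack([np.ones(len(frame)), frame.to_numpy()])

# With every indicator kept, the indicators sum to the intercept column,
# so the four-column design matrix has rank three.
design_all = with_intercept(indicators)
rank_all = np.linalg.matrix_rank(design_all)

# Dropping a researcher-chosen base category removes the redundancy.
base = "(white, female)"
reduced = indicators.drop(columns=[base])
design_reduced = with_intercept(reduced)
rank_reduced = np.linalg.matrix_rank(design_reduced)
```

After the drop, each remaining coefficient is read as a difference from the base category, here the hypothetical `(white, female)` group.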

Installation

pip install intersectionalipy

