A Python package for accessing police data on traffic stops, arrests, use of force, and more

Project description

OpenPoliceData

OpenPoliceData is a Python package for police data analysis that provides easy access to incident-level data from police departments around the United States for traffic stops, pedestrian stops, use of force, and other types of police interactions.

Installation

The source code is available at https://github.com/openpolicedata/openpolicedata.

OpenPoliceData can be installed from the Python Package Index (PyPI):

pip install openpolicedata

Additionally, geopandas can be installed so that downloaded data tables containing geographic data are returned as geopandas GeoDataFrames instead of pandas DataFrames. Using conda to install geopandas is recommended.
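As a quick check before loading geographic datasets, the following minimal sketch reports whether geopandas is importable in the current environment. It uses only the Python standard library; the helper name has_geopandas is hypothetical and not part of OpenPoliceData.

```python
import importlib.util


def has_geopandas() -> bool:
    """Return True if geopandas can be imported in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("geopandas") is not None


# Which kind of table to expect when a dataset contains geographic data.
expected = "geopandas GeoDataFrame" if has_geopandas() else "pandas DataFrame"
print(f"Tables with geographic data should come back as a {expected}")
```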

Examples

Jupyter notebooks demonstrating example usage of OpenPoliceData can be found in the notebooks folder.

Contributing

If you're interested in helping out, see our Contributing Guide.

Documentation

datasets_query(source_name=None, state=None, agency=None, table_type=None)

Query the datasets that are available. Various filters can be applied. By default, all datasets are returned.

> import openpolicedata as opd
> datasets = opd.datasets_query(state="California")
> datasets.head()
State       SourceName   Agency       TableType      Year
California  Anaheim      Anaheim      TRAFFIC STOPS  MULTI
California  Bakersfield  Bakersfield  TRAFFIC STOPS  MULTI
California  California   MULTI        STOPS          2018
California  California   MULTI        STOPS          2019
California  California   MULTI        STOPS          2020

(only 1st 5 columns shown above)

datasets is a pandas DataFrame. The first 5 datasets available from California include traffic stops data from multiple years from Anaheim and Bakersfield, and data from every agency in California for all types of police stops for the years 2018, 2019, and 2020.

Source(source_name, state=None)

Create a data source. A data source allows the user to easily import or export police data. It provides access to all datasets available from a source. source_name should match a value of SourceName for an available dataset. An optional state parameter resolves ambiguities when the same source name is used in multiple states (for example, when multiple states have a State Police source).

> src = opd.Source(source_name="Virginia")
> src.datasets
State     SourceName  Agency  TableType  Year
Virginia  Virginia    MULTI   STOPS      MULTI

(only 1st 5 columns shown above)

There is 1 dataset available from the state of Virginia that contains data from every agency in Virginia for all types of police stops for multiple years.

get_tables_types()

Show all types of data available from a source.

> src.get_tables_types()
['STOPS']

get_years(table_type=None)

Show years available for one or more datasets. Results can be filtered to only show years for a specific type of data.

> src.get_years(table_type="STOPS")
[2020, 2021, 2022]
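Checking the available years before requesting data can avoid a failed download. The sketch below relies only on the get_years signature documented above; missing_years and StubSource are hypothetical names introduced for illustration, and StubSource stands in for a real opd.Source so the example runs without network access.

```python
def missing_years(src, wanted, table_type=None):
    """Return the years in `wanted` that `src` does not list as available."""
    available = set(src.get_years(table_type=table_type))
    return [year for year in wanted if year not in available]


class StubSource:
    """Hypothetical stand-in for opd.Source so the sketch runs offline."""

    def get_years(self, table_type=None):
        return [2020, 2021, 2022]  # values copied from the example above


print(missing_years(StubSource(), [2019, 2021, 2023], table_type="STOPS"))
# prints [2019, 2023]
```

With a real source, StubSource() would be replaced by an object such as src from the examples above.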

get_agencies(table_type=None, year=None, partial_name=None)

Show agencies (police departments) that have data available. This is typically a single agency unless the data is from a state. Results can be filtered to only show agencies for a specific type of data and/or year. partial_name can be used to find only agencies containing a substring. This is useful for finding the exact name of a police department.

> agencies = src.get_agencies(partial_name="Arlington")
> print(agencies)
['Arlington County Police Department', "Arlington County Sheriff's Office"]
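Because a partial name can match several agencies, it may help to insist on a single match before loading data. This is a minimal sketch built on the get_agencies signature documented above; resolve_agency and StubSource are hypothetical, and the stub assumes a case-sensitive substring match, which may differ from the real behavior.

```python
def resolve_agency(src, partial_name, table_type=None, year=None):
    """Return the one agency matching partial_name, or raise if ambiguous."""
    matches = src.get_agencies(
        table_type=table_type, year=year, partial_name=partial_name
    )
    if len(matches) == 1:
        return matches[0]
    raise ValueError(f"{partial_name!r} matched {len(matches)} agencies: {matches}")


class StubSource:
    """Hypothetical stand-in for opd.Source so the sketch runs offline."""

    _agencies = [
        "Arlington County Police Department",
        "Arlington County Sheriff's Office",
    ]

    def get_agencies(self, table_type=None, year=None, partial_name=None):
        if partial_name is None:
            return list(self._agencies)
        # Assumption: case-sensitive substring match.
        return [name for name in self._agencies if partial_name in name]


print(resolve_agency(StubSource(), "Sheriff"))
# prints Arlington County Sheriff's Office
```

Calling resolve_agency(StubSource(), "Arlington") raises ValueError because two agencies match.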

load_from_url(year, table_type=None, agency=None, pbar=True)

Import data from the source. Data can be requested for a single year (e.g. 2020) or a range of years (e.g. [2020, 2022]). If more than one data type is available, table_type must be specified. Optionally, for datasets containing data from multiple agencies (police departments), agency can be used to request data for a single agency. Set pbar to False to hide the progress bar while loading.

> agency = "Arlington County Police Department"
> tbl = src.load_from_url(year=2021, table_type="STOPS", agency=agency)
> tbl.table.head(n=3)
incident_date  agency_name                         agency        reason_for_stop      race                       ethnicity
2021-01-01     Arlington County Police Department  ARLINGTON CO  OTHER                WHITE                      HISPANIC
2021-01-01     Arlington County Police Department  ARLINGTON CO  EQUIPMENT VIOLATION  WHITE                      NON-HISPANIC
2021-01-01     Arlington County Police Department  ARLINGTON CO  TRAFFIC VIOLATION    BLACK OR AFRICAN AMERICAN  NON-HISPANIC

(only 1st 6 columns shown above)

The result of load_from_url is a Table object. The table contained in the Table object is either a geopandas GeoDataFrame or a pandas DataFrame, depending on whether the returned data contains geographic data.
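When each year should be kept as a separate Table rather than requested as one range, a loop over load_from_url is one option. The sketch below uses only the parameters documented above; load_each_year and StubSource are hypothetical names, and the stub returns a plain string in place of a Table so the example runs offline.

```python
def load_each_year(src, years, table_type, agency=None):
    """Return a dict mapping each year to the result loaded for that year."""
    return {
        # pbar=False hides the progress bar for each individual request.
        year: src.load_from_url(
            year=year, table_type=table_type, agency=agency, pbar=False
        )
        for year in years
    }


class StubSource:
    """Hypothetical stand-in for opd.Source so the sketch runs offline."""

    def load_from_url(self, year, table_type=None, agency=None, pbar=True):
        return f"{table_type} table for {agency}, {year}"


tables = load_each_year(
    StubSource(), [2020, 2021], "STOPS", agency="Arlington County Police Department"
)
print(sorted(tables))
# prints [2020, 2021]
```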

to_csv(output_dir=None, filename=None)

Export the table to CSV. The default output directory is the current directory. The default filename is generated automatically, which lets the user easily re-import the table into a new Table object.

> tbl.to_csv()

load_from_csv(year, output_dir=None, table_type=None, agency=None)

Import a table from a previously exported CSV. The directory to look in defaults to the current directory. The CSV file must use the automatically generated filename (see to_csv). year, table_type, and agency are defined the same as for load_from_url.

> new_src = opd.Source(source_name="Virginia")
> new_t = new_src.load_from_csv(year=2021, agency=agency)
> new_t.table.head(n=3)
incident_date  agency_name                         agency        reason_for_stop      race                       ethnicity
2021-01-01     Arlington County Police Department  ARLINGTON CO  OTHER                WHITE                      HISPANIC
2021-01-01     Arlington County Police Department  ARLINGTON CO  EQUIPMENT VIOLATION  WHITE                      NON-HISPANIC
2021-01-01     Arlington County Police Department  ARLINGTON CO  TRAFFIC VIOLATION    BLACK OR AFRICAN AMERICAN  NON-HISPANIC

(only 1st 6 columns shown above)
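The export and re-import steps work together only if both use the same directory and the default filename. A minimal sketch of that round trip follows, using the to_csv and load_from_csv signatures documented above. export_and_reload, StubTable, and StubSource are hypothetical; the stubs merely record their arguments so the example runs without downloading data.

```python
def export_and_reload(src, tbl, year, table_type=None, agency=None, output_dir=None):
    """Write tbl to CSV, then read it back through src from the same directory."""
    # Keep the default filename so load_from_csv can locate the file again.
    tbl.to_csv(output_dir=output_dir)
    return src.load_from_csv(
        year=year, output_dir=output_dir, table_type=table_type, agency=agency
    )


class StubTable:
    """Hypothetical stand-in for a Table; records where it was exported."""

    def __init__(self):
        self.exported_to = None

    def to_csv(self, output_dir=None, filename=None):
        self.exported_to = output_dir


class StubSource:
    """Hypothetical stand-in for opd.Source; echoes the arguments it received."""

    def load_from_csv(self, year, output_dir=None, table_type=None, agency=None):
        return {"year": year, "output_dir": output_dir, "agency": agency}


stub_table = StubTable()
reloaded = export_and_reload(
    StubSource(),
    stub_table,
    year=2021,
    agency="Arlington County Police Department",
    output_dir="data",
)
print(reloaded["output_dir"] == stub_table.exported_to)
# prints True
```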

See the OpenPoliceData wiki for further documentation.
