Explore, load, and get documentation for Colorado crime data.
crime
Easily load online crime datasets. Explore the available datasets from inside a Python notebook, with descriptive cell outputs showing general info about each dataset and documentation for each column.
Install & Use
pip install crime
import crime as cr
Later, run pip install -U crime every few days to make sure you've got the latest version.
Note: this library should work with any recent Python version, but it has only been tested with 3.9.
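To check which Python version you're running and which release of crime you have installed (for example, after an upgrade), the standard library can report both. This is a small helper of our own, not part of crime; the name environment_report is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def environment_report(package: str = "crime") -> Tuple[str, Optional[str]]:
    """Return the running Python version and the installed version of `package`.

    The package version is None when the package isn't installed.
    """
    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    try:
        package_version = version(package)  # reads installed distribution metadata
    except PackageNotFoundError:
        package_version = None
    return python_version, package_version


if __name__ == "__main__":
    py, pkg = environment_report()
    print(f"Python {py}; crime {pkg or 'not installed'}")
```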
How does it work?
The crime package pre-defines nicknames and ids for a collection of Socrata datasets for you to pick from. This info isn't stored in the package itself, but in a JSON file on GitHub, which can be updated at any time without changing the code. Every time you import crime, a GitHub API request is made to retrieve this configuration, so you'll need an internet connection. Calling cr.sources() without parameters just returns this info, without making any additional requests.
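The configuration is essentially a mapping from nickname to dataset id and Socrata domain. Assuming it has the same shape as the dictionary accepted by cr.set_sources (shown later in this README), here's a minimal sketch of parsing it once and listing the nicknames. The JSON string is a hypothetical stand-in for the file on GitHub, not its actual contents, and the network request itself is left out.

```python
import json

# Hypothetical stand-in for the configuration file; the real file may contain
# more fields. The shape mirrors the dictionary passed to cr.set_sources().
CONFIG_JSON = """
{
  "district_arrests": {"id": "2e5i-5hfy", "base_url": "data.colorado.gov"},
  "district_crime": {"id": "ya69-n6ta", "base_url": "data.colorado.gov"}
}
"""


def parse_sources(raw: str) -> dict:
    """Parse the config once; later lookups reuse this dict without new requests."""
    parsed = json.loads(raw)
    for name, entry in parsed.items():
        if not {"id", "base_url"}.issubset(entry):
            raise ValueError(f"source {name!r} is missing 'id' or 'base_url'")
    return parsed


sources = parse_sources(CONFIG_JSON)
print(sorted(sources))  # ['district_arrests', 'district_crime']
```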
In addition to letting you load or preview any of these datasets, crime's most important feature is its ability to show a detailed description of each dataset, with full documentation of every column. When you run cr.sources('dataset_name'), an API request is made to Socrata to get the metadata for that dataset. The most useful information is formatted and printed to your screen. An example of this output for one dataset appears in the Getting Started section below.
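Each source is identified by a Socrata domain (base_url) and a dataset id. To illustrate where the metadata lives, here's a sketch that builds the developer "foundry" page URL (the same pattern as the link in the sample output below) and a metadata endpoint URL. As far as we know, /api/views/<id>.json is Socrata's dataset-metadata endpoint; whether crime requests exactly this URL is an assumption.

```python
def foundry_url(base_url: str, dataset_id: str) -> str:
    """Developer documentation page for a dataset on Socrata's foundry site."""
    return f"https://dev.socrata.com/foundry/{base_url}/{dataset_id}"


def metadata_url(base_url: str, dataset_id: str) -> str:
    """Dataset metadata endpoint (assumed, see the note above)."""
    return f"https://{base_url}/api/views/{dataset_id}.json"


print(foundry_url("data.colorado.gov", "ae3x-wvn9"))
# https://dev.socrata.com/foundry/data.colorado.gov/ae3x-wvn9
print(metadata_url("data.colorado.gov", "ae3x-wvn9"))
# https://data.colorado.gov/api/views/ae3x-wvn9.json
```

To actually fetch the metadata you could open the URL with urllib.request.urlopen and decode the response with json.load; that part needs an internet connection.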
Caching: any dataset you load fully is stored in memory, so the next time you request it within the same Jupyter notebook session, it is available immediately.
Getting Started
Use cr.help() for a quick intro.
Let's look at the crime data available
cr.sources() # returns a DataFrame
You'll get a DataFrame with basic info on all the sources. Its index, Name, holds the nickname you'll use to refer to each dataset from now on.
To examine a source, pass the dataset's name to sources(). This makes an API request to fetch all of its metadata.
Let's see the details on crime_vs_incarceration. All the info below comes from Socrata's API.
cr.sources('crime_vs_incarceration')
Total Crime Rate vs Incarceration Rate Chart
https://dev.socrata.com/foundry/data.colorado.gov/ae3x-wvn9
Total Crime includes: Violent crimes- Murder and non-negligent manslaughter,
forcible rape, robbery, and aggravated assault. Property crimes - Burglary,
larceny/theft, and motor vehicle theft. National or state offense totals are
based on data from all reporting agencies and estimates for unreported areas.
Rates are the number of reported offenses per 100,000 population. These
figures are based on end of calendar year populations.
COLUMNS:
-------
Year
Field: year
Type: text
Null: 0
Count: 31
Population
Field: population
Type: number
Null: 0
Count: 31
Avg: 4019137.064516129
Max: 5187582
Min: 3045000
Sum: 124593249
Violent Crime Total
Field: violent_crime_total
Type: number
Null: 0
Count: 31
Avg: 16445.54838709677
Max: 20229
Min: 13811
Sum: 509812
(output is truncated to save space)
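The column summaries above follow a simple pattern. As a sketch, assuming each column's metadata has already been reduced to a plain dictionary (the keys below are hypothetical, not Socrata's or crime's actual field names), formatting one entry could look like this. The numbers come from the Population column above; note that Avg is simply Sum / Count (124593249 / 31).

```python
def format_column(col: dict) -> str:
    """Render one column summary in the layout shown above."""
    lines = [
        col["name"],
        f"Field: {col['field']}",
        f"Type: {col['type']}",
        f"Null: {col['null']}",
        f"Count: {col['count']}",
    ]
    # Numeric columns also carry aggregate statistics, printed in this order.
    for key in ("avg", "max", "min", "sum"):
        if key in col:
            lines.append(f"{key.capitalize()}: {col[key]}")
    return "\n".join(lines)


population = {
    "name": "Population",
    "field": "population",
    "type": "number",
    "null": 0,
    "count": 31,
    "max": 5187582,
    "min": 3045000,
    "sum": 124593249,
}
population["avg"] = population["sum"] / population["count"]
print(format_column(population))
```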
Here's what you'll see for text/categorical columns...
Race
Field: race
Type: text
Null: 30
Count: 209078
ITEMS:
White (164276)
Black (39469)
Asian/Pacific Islander (2216)
Unknown (1901)
American Indian/Alaskan Native (1216)
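For text columns, ITEMS is a frequency count of the most common values. In the Race example above, the item counts add up to exactly the Count value (209078), so Count appears to exclude the 30 nulls; the sketch below follows that convention. It uses collections.Counter, and the toy values are made up for illustration, not taken from the arrest data.

```python
from collections import Counter
from typing import Iterable, Optional


def summarize_text_column(values: Iterable[Optional[str]], top: int = 5) -> str:
    """Null count, non-null count, and the `top` most frequent values."""
    values = list(values)
    nulls = sum(v is None for v in values)
    counts = Counter(v for v in values if v is not None)
    lines = [f"Null: {nulls}", f"Count: {sum(counts.values())}", "ITEMS:"]
    # most_common() orders by descending frequency.
    lines += [f"{value} ({n})" for value, n in counts.most_common(top)]
    return "\n".join(lines)


toy = ["White", "Black", "White", None, "Asian/Pacific Islander", "White", "Black"]
print(summarize_text_column(toy))
```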
Now we'll load some data
cr.load('arrest_demographics')
This returns a 5-row preview by default, because some datasets have several million rows. To get the full dataset:
cr.load('arrest_demographics', full=True)
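Conceptually, a preview only needs the first few rows, while a full download of a large dataset may take several requests. Socrata's SODA API supports the $limit, $offset and $order query parameters for this. The sketch below builds paged request URLs; the page size, the assumption that you already know the row count, and the idea that crime pages this way are all assumptions. It uses the crime_vs_incarceration id from the example above, because the arrest_demographics id isn't listed in this README.

```python
from urllib.parse import urlencode


def page_urls(base_url: str, dataset_id: str, total_rows: int, page_size: int = 50_000) -> list[str]:
    """SODA resource URLs that together cover `total_rows` rows."""
    endpoint = f"https://{base_url}/resource/{dataset_id}.json"
    urls = []
    for offset in range(0, total_rows, page_size):
        # Ordering by ':id' (Socrata's row identifier) keeps pages from overlapping.
        params = {"$limit": page_size, "$offset": offset, "$order": ":id"}
        urls.append(f"{endpoint}?{urlencode(params, safe='$:')}")
    return urls


preview = page_urls("data.colorado.gov", "ae3x-wvn9", total_rows=5, page_size=5)
print(preview[0])
# https://data.colorado.gov/resource/ae3x-wvn9.json?$limit=5&$offset=0&$order=:id
```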
Get more info on a source
Return a dictionary with the full metadata:
cr.metadata('dataset_name')
Return a DataFrame with metrics on each column:
cr.columns('dataset_name')
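If you'd like per-column metrics in your own table, the basic idea is to flatten a list of column dictionaries into one row per column. This sketch works on a hypothetical metadata layout (a 'columns' list whose entries carry 'name', 'fieldName', 'dataTypeName' and a nested 'stats' dict); the dictionary actually returned by cr.metadata() may be structured differently, so inspect it before adapting this.

```python
def column_rows(metadata: dict) -> list[dict]:
    """Flatten per-column metadata into one flat dict per column."""
    rows = []
    for col in metadata.get("columns", []):
        row = {
            "name": col.get("name"),
            "field": col.get("fieldName"),
            "type": col.get("dataTypeName"),
        }
        # Merge whatever summary statistics are present; missing ones stay absent.
        row.update(col.get("stats", {}))
        rows.append(row)
    return rows


meta = {
    "columns": [
        {"name": "Year", "fieldName": "year", "dataTypeName": "text",
         "stats": {"null": 0, "count": 31}},
        {"name": "Population", "fieldName": "population", "dataTypeName": "number",
         "stats": {"null": 0, "count": 31, "max": 5187582, "min": 3045000}},
    ]
}
for row in column_rows(meta):
    print(row)
```

Passing the rows to pandas.DataFrame gives a table with NaN wherever a column lacks a statistic.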
Caching
Any dataset you load fully (by passing full=True) only has to be downloaded once per notebook session, whether or not you've assigned it to a variable. After you fully load a dataset, you can leave out full=True the next time you access it, and the full DataFrame is returned instantly. Or you can use cr.df('name') to fetch it straight from the cache.
For example, if you run this at the top of your notebook...
cr.load('arrest_demographics', full=True)
Now, elsewhere in your notebook...
Any of these 3 lines will return the same thing, the full dataset:
cr.load('arrest_demographics', full=True)
cr.load('arrest_demographics')
# Shorthand to fetch straight from the cache. Returns an empty DataFrame if the dataset isn't cached
cr.df('arrest_demographics')
How to define your own set of data sources.
First, select a dataset on OpenDataNetwork and click "View API". If you're brought to an API page (not all datasets have one), locate the "Dataset Identifier" at the top right of the page and use it as id. For base_url, use the domain that comes right after /foundry/ in that page's URL (for example, data.colorado.gov).
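For example, a foundry URL such as https://dev.socrata.com/foundry/data.colorado.gov/ae3x-wvn9 splits into base_url (data.colorado.gov) and id (ae3x-wvn9). Here's a small helper of our own (hypothetical, not part of crime) that does the split and checks that the id looks like Socrata's usual "four-by-four" identifier: two groups of four lowercase letters or digits joined by a hyphen.

```python
import re
from urllib.parse import urlparse

FOUR_BY_FOUR = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")


def source_from_foundry_url(url: str) -> dict:
    """Split a foundry URL into an {'id', 'base_url'} entry for set_sources."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected path: foundry/<domain>/<dataset id>
    if len(parts) != 3 or parts[0] != "foundry":
        raise ValueError(f"not a foundry URL: {url}")
    base_url, dataset_id = parts[1], parts[2]
    if not FOUR_BY_FOUR.fullmatch(dataset_id):
        raise ValueError(f"unexpected dataset id: {dataset_id}")
    return {"id": dataset_id, "base_url": base_url}


print(source_from_foundry_url("https://dev.socrata.com/foundry/data.colorado.gov/ae3x-wvn9"))
# {'id': 'ae3x-wvn9', 'base_url': 'data.colorado.gov'}
```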
# Pass a dictionary
cr.set_sources(
{
'district_arrests': { # this is the nickname you'll refer to
"id": "2e5i-5hfy",
"base_url": "data.colorado.gov"
},
'district_crime': {
"id": "ya69-n6ta",
"base_url": "data.colorado.gov"
},
# etc...
}
)
To restore the original list of sources, use:
cr.reset_sources()
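Setting and resetting sources amounts to swapping a working copy of the source list and restoring the defaults. Here's a minimal sketch of that pattern, assuming set_sources replaces the whole list rather than merging into it; the function names mirror crime's, but the bodies are our own illustration. Keeping an untouched deep copy of the defaults means a reset works even if the working copy was modified in place.

```python
import copy

DEFAULT_SOURCES = {
    "crime_vs_incarceration": {"id": "ae3x-wvn9", "base_url": "data.colorado.gov"},
}
_sources = copy.deepcopy(DEFAULT_SOURCES)


def set_sources(new_sources: dict) -> None:
    """Replace the working source list (assumed: replace, not merge)."""
    global _sources
    _sources = copy.deepcopy(new_sources)


def reset_sources() -> None:
    """Restore the original defaults from the untouched copy."""
    global _sources
    _sources = copy.deepcopy(DEFAULT_SOURCES)


def sources() -> dict:
    return _sources


set_sources({"district_arrests": {"id": "2e5i-5hfy", "base_url": "data.colorado.gov"}})
print(sorted(sources()))  # ['district_arrests']
reset_sources()
print(sorted(sources()))  # ['crime_vs_incarceration']
```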
There's no full documentation for these functions yet; see the source code for details.
If there's a dataset not yet listed in our pre-defined sources, you can use the sodapy API wrapper to retrieve it manually.
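sodapy is a separate package (pip install sodapy). Here's a minimal sketch, assuming sodapy's Socrata client and its get method with a limit argument; the helper name fetch_with_sodapy is hypothetical. sodapy is imported inside the function so the snippet loads even where sodapy isn't installed, and calling it requires an internet connection.

```python
from typing import Optional


def fetch_with_sodapy(base_url: str, dataset_id: str, limit: int = 5,
                      app_token: Optional[str] = None) -> list:
    """Download up to `limit` rows of a Socrata dataset with sodapy."""
    from sodapy import Socrata  # imported lazily; requires `pip install sodapy`

    # app_token=None means unauthenticated access, which has lower rate limits.
    client = Socrata(base_url, app_token)
    try:
        return client.get(dataset_id, limit=limit)  # list of dicts, one per row
    finally:
        client.close()
```

For example, fetch_with_sodapy("data.colorado.gov", "ya69-n6ta") would return the first five rows of the district_crime dataset from the set_sources example, and pandas.DataFrame.from_records turns the result into a DataFrame.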