Python package for working with open road traffic casualty data from Great Britain
🚸 py-stats19
Authors:
Xiaowei Gao [📩 Email: ucesxwg@ucl.ac.uk] (SpaceTimeLab, UCL, UK)
Jinshuai Ma [📩 Email: j.ma23@lse.ac.uk] (LSE Data Science Institute, UK)
Supervisors:
Dr. James Haworth, Associate Professor in Spatio-temporal Analytics, SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, UCL
Prof. Tao Cheng, Professor in GeoInformatics, SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, UCL
🚸 py-stats19 is a Python package developed to support digital twin applications for spatio-temporal urban crash analysis. Inspired by the R stats19 package, it provides an efficient tool to download and format the Road Safety Data published by the UK Department for Transport in its official Road Safety Database, which covers the years from 1979 onwards. Additionally, py-stats19 enhances the data by incorporating extra temporal information and geometric details.
The whole data set contains three tables: casualty, collision, and vehicle. The data set is updated annually and contains detailed information about road traffic accidents in Great Britain.
🧰 The py-stats19 package is still under development and testing. It is available as a beta version for early access.
Installation
Download the latest release, e.g. pystats19-0.0.1-py3-none-any.whl.
$ pip install pystats19-0.0.1-py3-none-any.whl
list_files()
list_files() lists all available stats19 dataset files. The list can be filtered by passing the year and table arguments.
The table name can be casualty, collision, or vehicle. The three tables can be merged on the accident_index key.
from pystats19.read import list_files
# List all files containing year 2021 data
list_files(year=2021)
# ['dft-road-casualty-statistics-casualty-1979-latest-published-year.csv',
# 'dft-road-casualty-statistics-casualty-2021.csv',
# 'dft-road-casualty-statistics-casualty-last-5-years.csv',
# 'dft-road-casualty-statistics-collision-1979-latest-published-year.csv',
# 'dft-road-casualty-statistics-collision-2021.csv',
# 'dft-road-casualty-statistics-collision-last-5-years.csv',
# 'dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv',
# 'dft-road-casualty-statistics-vehicle-2021.csv',
# 'dft-road-casualty-statistics-vehicle-e-scooter-2020-Latest-Published-Year.csv',
# 'dft-road-casualty-statistics-vehicle-last-5-years.csv']
# List all files containing year 2021 data for the vehicle table
list_files(year=2021, table="vehicle")
# ['dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv',
# 'dft-road-casualty-statistics-vehicle-2021.csv',
# 'dft-road-casualty-statistics-vehicle-last-5-years.csv']
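Since the three tables share the accident_index key, a join can be sketched with plain pandas. The example below is a minimal illustration and not part of py-stats19: the two small frames are hand-made stand-ins for the collision and casualty tables, and every column other than accident_index is a hypothetical placeholder.

```python
import pandas as pd

# Hand-made stand-ins for two loaded tables (values are illustrative only).
collision = pd.DataFrame(
    {
        "accident_index": ["A1", "A2", "A3"],
        "example_collision_field": [30, 60, 30],  # hypothetical column
    }
)
casualty = pd.DataFrame(
    {
        "accident_index": ["A1", "A1", "A3"],
        "example_casualty_field": [1, 2, 1],  # hypothetical column
    }
)

# One collision can involve several casualties, so this is a one-to-many join.
# validate= makes pandas raise if the collision key is unexpectedly duplicated.
merged = collision.merge(
    casualty,
    on="accident_index",
    how="left",
    validate="one_to_many",
)

print(merged)
```

A left join keeps collisions that have no matching casualty row; switching to how="inner" would drop them instead.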
pull_file()
pull_file() downloads a data file and requires the filename parameter. The filename should be obtained using list_files().
Optionally, data_dir specifies the location where the file will be stored.
from pystats19.source import pull_file
pull_file('dft-road-casualty-statistics-vehicle-2019.csv', data_dir="./data")
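Because list_files() returns both single-year files and multi-year bundles, it can help to narrow the list before passing a name to pull_file(). The helper below is a hypothetical sketch and not part of py-stats19; it assumes the naming pattern visible in the list_files() output above and works on a plain list of names.

```python
import re


def single_year_files(filenames, year, table=None):
    """Return only the files named '<prefix>-<table>-<year>.csv'.

    Hypothetical helper: it relies on the naming pattern seen in the
    list_files() output and skips multi-year bundles such as
    '...-1979-latest-published-year.csv' and '...-last-5-years.csv'.
    """
    table_pattern = re.escape(table) if table else r"[a-z]+"
    pattern = re.compile(
        rf"^dft-road-casualty-statistics-({table_pattern})-{int(year)}\.csv$"
    )
    return [name for name in filenames if pattern.match(name)]


# Names copied from the list_files(year=2021) output.
listed = [
    "dft-road-casualty-statistics-casualty-1979-latest-published-year.csv",
    "dft-road-casualty-statistics-casualty-2021.csv",
    "dft-road-casualty-statistics-casualty-last-5-years.csv",
    "dft-road-casualty-statistics-vehicle-2021.csv",
    "dft-road-casualty-statistics-vehicle-last-5-years.csv",
]

print(single_year_files(listed, 2021, table="vehicle"))
```

Each returned name could then be passed to pull_file() in a loop.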
Data directory
The data directory can also be configured globally by setting the environment variable PYSTATS19_DOWNLOAD_DIRECTORY.
$ export PYSTATS19_DOWNLOAD_DIRECTORY=~/my_pystats19_data
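The same variable can be set from Python, for example in a notebook where exporting shell variables is inconvenient. This is a minimal sketch: configure_download_dir is a hypothetical helper, and it assumes the variable should be set before pystats19 is imported, since it is not stated here when the package reads it.

```python
import os
from pathlib import Path


def configure_download_dir(path):
    """Create the folder and point PYSTATS19_DOWNLOAD_DIRECTORY at it.

    Hypothetical helper, not part of py-stats19. It only affects the
    current Python process and any child processes it starts.
    """
    target = Path(path).expanduser().resolve()
    target.mkdir(parents=True, exist_ok=True)
    os.environ["PYSTATS19_DOWNLOAD_DIRECTORY"] = str(target)
    return target


# Example call (creates the folder if needed):
# configure_download_dir("~/my_pystats19_data")
```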
load()
load() loads the data file as a pandas.DataFrame or geopandas.GeoDataFrame. Set auto_download=True to download the file automatically if it does not already exist.
Optionally:
- set convert_code_to_label=True to convert categorical data codes to text labels.
- set add_temporal_info=True to format datetime and time and add additional time information.
- set add_geo_info=True to add geographic information. This returns a geopandas.GeoDataFrame.
from pystats19.read import load
load(
    'dft-road-casualty-statistics-collision-2021.csv',
    auto_download=True,
    convert_code_to_label=True,
    add_temporal_info=True,
    add_geo_info=True
)
# Removed 17 records due to missing Latitude or Longitude.
#
# accident_index ... geometry
# 0 2021010287148 ... POINT (521509.659 193079.41)
# 1 2021010287149 ... POINT (535380.824 180783.228)
# 2 2021010287151 ... POINT (529702.828 170398.085)
# 3 2021010287155 ... POINT (525313.658 178385.183)
# 4 2021010287157 ... POINT (512145.497 171526.072)
# ... ... ... ...
# 101082 2021991196247 ... POINT (325545.894 674547.399)
# 101083 2021991196607 ... POINT (271195.339 558271.954)
# 101084 2021991197944 ... POINT (357296.909 860766.24)
# 101085 2021991200639 ... POINT (326935.908 675924.391)
# 101086 2021991201030 ... POINT (270574.351 556367.939)
# [101070 rows x 39 columns]
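To give a feel for the kind of columns add_temporal_info=True might add, the sketch below derives a few time features with plain pandas. It is not the package's implementation: the input assumes a date column in day/month/year form and a time column in HH:MM form, and the derived column names are hypothetical.

```python
import pandas as pd

# Hand-made sample; assumes 'date' as DD/MM/YYYY and 'time' as HH:MM strings.
df = pd.DataFrame(
    {
        "accident_index": ["A1", "A2"],
        "date": ["01/01/2021", "15/06/2021"],
        "time": ["08:30", "17:45"],
    }
)

# Combine both strings into one timestamp using an explicit format,
# so day and month are never swapped by format inference.
df["datetime"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%d/%m/%Y %H:%M"
)

# Hypothetical derived columns for temporal analysis.
df["hour"] = df["datetime"].dt.hour
df["day_of_week"] = df["datetime"].dt.day_name()
df["is_weekend"] = df["datetime"].dt.dayofweek >= 5

print(df[["accident_index", "datetime", "hour", "day_of_week", "is_weekend"]])
```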