Python package for downloading and formatting the UK's Road Safety Data.

🚸 py-stats19

Authors:

Xiaowei Gao [📩 Email: ucesxwg@ucl.ac.uk] (SpacetimeLab, UCL, UK)

Jinshuai Ma [📩 Email: j.ma23@lse.ac.uk] (LSE Data Science Institute, UK)

Supervisors:

Dr. James Haworth, Associate Professor in Spatio-temporal Analytics, SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, UCL

Prof. Tao Cheng, Professor in GeoInformatics, SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, UCL

🚸 py-stats19 is a Python package developed to support digital twin applications for spatio-temporal urban crash analysis. Inspired by the R stats19 package, it provides a Python tool to download and format the Road Safety Data from the official Road Safety Database published by the UK Department for Transport, with records going back to 1979. Additionally, py-stats19 enriches the data with extra temporal information and geometric details.

The data set comprises three tables: casualty, collision, and vehicle. It is updated annually and contains detailed information about road traffic accidents in Great Britain.

🧰 py-stats19 is still under development and testing. It is available as a beta version for early access.

Installation

Install using pip:

$ pip install pystats19

Alternatively, download and install the latest release from GitHub, e.g. pystats19-0.1.0-py3-none-any.whl.

$ pip install pystats19-0.1.0-py3-none-any.whl

list_files()

list_files() lists all available stats19 dataset files. The results can be filtered by passing the year and table arguments.

The table argument accepts casualty, collision, or vehicle. The three tables can be merged on the accident_index key.

from pystats19.read import list_files

# List all files that contain 2021 data
list_files(year=2021)
# ['dft-road-casualty-statistics-casualty-1979-latest-published-year.csv',
#  'dft-road-casualty-statistics-casualty-2021.csv',
#  'dft-road-casualty-statistics-casualty-last-5-years.csv',
#  'dft-road-casualty-statistics-collision-1979-latest-published-year.csv',
#  'dft-road-casualty-statistics-collision-2021.csv',
#  'dft-road-casualty-statistics-collision-last-5-years.csv',
#  'dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv',
#  'dft-road-casualty-statistics-vehicle-2021.csv',
#  'dft-road-casualty-statistics-vehicle-e-scooter-2020-Latest-Published-Year.csv',
#  'dft-road-casualty-statistics-vehicle-last-5-years.csv']

# List all vehicle-table files that contain 2021 data
list_files(year=2021, table="vehicle")
# ['dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv',
#  'dft-road-casualty-statistics-vehicle-2021.csv',
#  'dft-road-casualty-statistics-vehicle-last-5-years.csv']
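
As a rough illustration of merging on the accident_index key mentioned above, the sketch below joins two tiny hand-made tables with pandas. The rows are invented toy data, not real STATS19 records, and the columns other than accident_index are only assumed to resemble those in the real files.

```python
import pandas as pd

# Toy stand-ins for the collision and vehicle tables (all values invented).
collisions = pd.DataFrame({
    "accident_index": ["A1", "A2", "A3"],
    "number_of_vehicles": [2, 1, 1],
})
vehicles = pd.DataFrame({
    "accident_index": ["A1", "A1", "A2"],
    "vehicle_reference": [1, 2, 1],
})

# One collision can involve several vehicles, so this is a one-to-many join.
# validate= makes pandas raise if accident_index is duplicated on the left.
merged = collisions.merge(
    vehicles,
    on="accident_index",
    how="left",
    validate="one_to_many",
)
print(merged)
```

A left join keeps collisions that have no matching vehicle row (A3 here), with the vehicle columns left empty; how="inner" would drop them instead.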

pull_file()

pull_file() requires a filename parameter and downloads the corresponding data file. The filename should be obtained from list_files().

Optionally, data_dir can specify the location where the file will be stored.

from pystats19.source import pull_file

pull_file('dft-road-casualty-statistics-vehicle-2019.csv', data_dir="./data")

Data directory

The data directory can also be configured globally by setting the environment variable PYSTATS19_DOWNLOAD_DIRECTORY:

$ export PYSTATS19_DOWNLOAD_DIRECTORY=~/my_pystats19_data
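
The same variable can also be set from inside Python, for example in a notebook where exporting from a shell is awkward. This is a minimal sketch: it is an assumption that the variable should be set before pystats19 is imported, since the point at which the package reads it is not documented here.

```python
import os
from pathlib import Path

# Python does not expand "~" on its own, so resolve it to an absolute path first.
data_dir = Path("~/my_pystats19_data").expanduser()

# Set the variable for this process only; it does not persist after Python exits.
os.environ["PYSTATS19_DOWNLOAD_DIRECTORY"] = str(data_dir)
print(os.environ["PYSTATS19_DOWNLOAD_DIRECTORY"])
```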

load()

load() loads the data file as a pandas.DataFrame or geopandas.GeoDataFrame. Set auto_download=True to download the file automatically if it does not already exist locally.

Optionally,

set convert_code_to_label=True to convert categorical data codes to text labels.

set add_temporal_info=True to format datetime and time and add additional time information.

set add_geo_info=True to add geo information. This will return a geopandas.GeoDataFrame.

from pystats19.read import load

load(
    'dft-road-casualty-statistics-collision-2021.csv',
    auto_download=True,
    convert_code_to_label=True,
    add_temporal_info=True,
    add_geo_info=True
)

# Removed 17 records due to missing Latitude or Longitude.
# 
#        accident_index  ...                       geometry
# 0       2021010287148  ...   POINT (521509.659 193079.41)
# 1       2021010287149  ...  POINT (535380.824 180783.228)
# 2       2021010287151  ...  POINT (529702.828 170398.085)
# 3       2021010287155  ...  POINT (525313.658 178385.183)
# 4       2021010287157  ...  POINT (512145.497 171526.072)
# ...               ...  ...                            ...
# 101082  2021991196247  ...  POINT (325545.894 674547.399)
# 101083  2021991196607  ...  POINT (271195.339 558271.954)
# 101084  2021991197944  ...   POINT (357296.909 860766.24)
# 101085  2021991200639  ...  POINT (326935.908 675924.391)
# 101086  2021991201030  ...  POINT (270574.351 556367.939)
# [101070 rows x 39 columns]
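
To give a feel for the kind of temporal enrichment that add_temporal_info refers to, here is a small pandas sketch. It is not the package's implementation: the derived column names (datetime, hour, day_of_week, is_weekend) are hypothetical, the rows are invented, and the dd/mm/yyyy date and HH:MM time formats are assumptions about the raw CSV files.

```python
import pandas as pd

# Toy rows; `date` is assumed to be dd/mm/yyyy and `time` HH:MM.
df = pd.DataFrame({
    "accident_index": ["A1", "A2"],
    "date": ["01/03/2021", "15/08/2021"],
    "time": ["08:30", "23:05"],
})

# Combine the two text columns into a single timestamp column.
df["datetime"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%d/%m/%Y %H:%M"
)

# Derive a few simple temporal features (hypothetical column names).
df["hour"] = df["datetime"].dt.hour
df["day_of_week"] = df["datetime"].dt.day_name()
df["is_weekend"] = df["datetime"].dt.dayofweek >= 5  # Saturday=5, Sunday=6
print(df)
```

Passing an explicit format avoids day/month ambiguity, which matters for dates such as 01/03/2021.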
