A package for loading and preprocessing the NHTSA FARS crash database
FARS Cleaner fars_cleaner
fars-cleaner is a Python library for downloading and pre-processing data from the Fatality Analysis Reporting System, collected annually by NHTSA since 1975.
Installation
The preferred installation method is through conda.
conda install -c conda-forge fars_cleaner
You can also install with pip.
pip install fars-cleaner
Usage
Downloading FARS data
The FARSFetcher class provides an interface to download and unzip selected years from the NHTSA FARS FTP server. The class uses pooch to download and unzip the selected files. By default, files are unzipped to your OS's cache directory.
from fars_cleaner import FARSFetcher
# Prepare for FARS file download, using the OS cache directory.
fetcher = FARSFetcher()
Suggested usage is to download files to a data directory in your current project directory. Passing project_dir alone will download files to project_dir/data/fars. This default can be overridden by also setting cache_path, which is then resolved relative to project_dir (for example, project_dir/fars). Setting cache_path alone provides a direct path to the directory you want to download files into.
from pathlib import Path
from fars_cleaner import FARSFetcher
SOME_PATH = Path("/YOUR/PROJECT/PATH")
# Prepare to download to /YOUR/PROJECT/PATH/data/fars
# This is the recommended usage.
fetcher = FARSFetcher(project_dir=SOME_PATH)
# Prepare to download to /YOUR/PROJECT/PATH/fars
cache_path = "fars"
fetcher = FARSFetcher(project_dir=SOME_PATH, cache_path=cache_path)
# Prepare to download directly to a specific directory.
cache_path = Path("/SOME/TARGET/DIRECTORY")
fetcher = FARSFetcher(cache_path=cache_path)
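The three configurations above follow a small path-resolution rule. The sketch below illustrates that rule using only pathlib. Note that resolve_download_dir and its os_cache_dir argument are hypothetical names, not part of fars_cleaner, and the library's actual implementation may differ.

```python
from pathlib import Path


def resolve_download_dir(project_dir=None, cache_path=None, os_cache_dir=None):
    """Mirror the documented FARSFetcher path rules (hypothetical helper).

    os_cache_dir stands in for the OS cache directory used when neither
    project_dir nor cache_path is given; it is an assumption of this sketch.
    """
    if project_dir is not None and cache_path is not None:
        # cache_path overrides the default data/fars subdirectory.
        return Path(project_dir) / cache_path
    if project_dir is not None:
        # Recommended usage: project_dir/data/fars.
        return Path(project_dir) / "data" / "fars"
    if cache_path is not None:
        # cache_path alone is used as the target directory directly.
        return Path(cache_path)
    # Neither argument given: fall back to the stand-in cache directory.
    return Path(os_cache_dir) if os_cache_dir is not None else None


project = Path("/YOUR/PROJECT/PATH")
print(resolve_download_dir(project_dir=project))
print(resolve_download_dir(project_dir=project, cache_path="fars"))
print(resolve_download_dir(cache_path=Path("/SOME/TARGET/DIRECTORY")))
```

Writing the rule as a pure function makes it easy to check where files would land before starting a long download.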
Files can be downloaded in their entirety (data from 1975-2018), as a single year, or across a specified year range. Downloading all of the data can be quite time-consuming. Each zip file is extracted as part of the download into a folder named {YEAR}.unzip, and the zip file is then deleted.
# Fetch all data
fetcher.fetch_all()
# Fetch a single year
fetcher.fetch_single(1984)
# Fetch data in a year range (inclusive).
fetcher.fetch_subset(1999, 2007)
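After a fetch, it can be useful to confirm which years are already on disk. The sketch below scans a download directory for folders that follow the {YEAR}.unzip naming described above. Here downloaded_years and missing_years are hypothetical helpers, not part of fars_cleaner, and they assume the folder names match that pattern exactly.

```python
import re
import tempfile
from pathlib import Path

# Matches folder names such as "1984.unzip" (assumed naming, per the text above).
YEAR_FOLDER = re.compile(r"^(\d{4})\.unzip$")


def downloaded_years(data_dir):
    """Return the sorted years that have a {YEAR}.unzip folder in data_dir."""
    years = []
    for entry in Path(data_dir).iterdir():
        match = YEAR_FOLDER.match(entry.name)
        if entry.is_dir() and match:
            years.append(int(match.group(1)))
    return sorted(years)


def missing_years(data_dir, start, end):
    """Years in the inclusive range start..end with no unzipped folder yet."""
    present = set(downloaded_years(data_dir))
    return [year for year in range(start, end + 1) if year not in present]


# Demonstration on a temporary directory standing in for the download folder.
with tempfile.TemporaryDirectory() as tmp:
    for year in (1999, 2000, 2002):
        (Path(tmp) / f"{year}.unzip").mkdir()
    found = downloaded_years(tmp)
    gaps = missing_years(tmp, 1999, 2002)
    print(found)
    print(gaps)
```

Any years reported as missing could then be requested individually with fetcher.fetch_single(year).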
Processing FARS data
Calling load_pipeline will allow for full loading and pre-processing of the FARS data requested by the user.
from pathlib import Path
from fars_cleaner import FARSFetcher, load_pipeline
SOME_PATH = Path("/YOUR/PROJECT/PATH")
fetcher = FARSFetcher(project_dir=SOME_PATH)
vehicles, accidents, people = load_pipeline(fetcher=fetcher,
                                            first_run=True,
                                            target_folder=SOME_PATH)
Calling load_basic allows for simple loading of the FARS data for a single year, with no preprocessing. Files must be prefetched using a FARSFetcher or similar method. A mapper dictionary must be provided to identify which columns, if any, require renaming.
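from fars_cleaner.data_loader import load_basic
# SOME_PATH and mappings must be defined beforehand:
# SOME_PATH points to the prefetched files, and mappings is your mapper dictionary.
vehicles, accidents, people = load_basic(year=1975, data_dir=SOME_PATH, mapping=mappings)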
Requirements
Downloading and processing the full FARS dataset currently runs out of memory on Windows machines with only 16 GB of RAM. At least 32 GB of RAM is recommended on Windows systems. macOS and Linux run with no issues on 16 GB systems.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. See CONTRIBUTING.md for more details.
License