Data used for safety analyses have unique characteristics that are not found in other disciplines. In this research, we examine three characteristics that can negatively influence the outcome of these safety analyses: 1) crash data with many zero observations; 2) rare occurrence of crash events (not necessarily related to many zero observations); and 3) big datasets. These characteristics can lead to biased results if inappropriate analysis tools are used. The objectives of this study are to simplify the analysis of highway safety data and to develop guidelines and analysis tools for handling these characteristics. The research provides guidelines on when to aggregate data over time and space to reduce the number of zero observations; uses heuristics for selecting statistical models; proposes a bias adjustment method for improving the estimation of risk factors; develops a decision-adjusted modeling framework for predicting risk; and shows how cluster analyses can be used to extract relevant information from big data. The guidelines and tools were developed using simulated and observed datasets. Examples are provided to illustrate the guidelines and tools.
- Provide guidance about when the data should be aggregated as a function of the coefficient of variation of the variables in the dataset.
- Provide guidance about when the negative binomial distribution should be used over the Poisson-lognormal distribution or model.
- Provide guidance about when the negative binomial distribution should be used over the negative binomial-Lindley distribution or model.
- Provide a bias-correction procedure for datasets that have a small number of crashes or are imbalanced (a small number of crashes in one category of a covariate).
- Provide guidance about when a cluster analysis can be applied to create new predictors to potentially produce insight or reduce data dimension.
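As a rough illustration of the first item above, the sketch below computes the proportion of zero observations and the coefficient of variation for a set of crash counts before and after aggregating them over time. It is not the project's procedure: the function names are hypothetical, the counts are simulated from a plain Poisson process chosen only for convenience, and no decision threshold is applied because none is stated here.

```python
import math
import random
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


def zero_fraction(values):
    """Share of observations that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def aggregate_over_time(counts_by_site, periods_per_group):
    """Sum each site's counts over consecutive blocks of periods."""
    return [
        [sum(site[i:i + periods_per_group])
         for i in range(0, len(site), periods_per_group)]
        for site in counts_by_site
    ]


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(n_sites, n_periods, mean_per_period, seed=0):
    """Hypothetical data: independent Poisson counts per site and period."""
    rng = random.Random(seed)
    return [[poisson_draw(rng, mean_per_period) for _ in range(n_periods)]
            for _ in range(n_sites)]


def flatten(nested):
    """Turn a list of per-site lists into one flat list of observations."""
    return [v for row in nested for v in row]


def demo():
    # Assumed setup: 200 sites, 12 monthly periods, 0.05 crashes per site-month.
    monthly = simulate_counts(200, 12, 0.05)
    yearly = aggregate_over_time(monthly, 12)
    for label, data in (("monthly", monthly), ("yearly", yearly)):
        flat = flatten(data)
        print(label,
              "zero fraction:", round(zero_fraction(flat), 3),
              "CV:", round(coefficient_of_variation(flat), 3))


if __name__ == "__main__":
    demo()
```

When every site has a whole number of blocks, an aggregated cell is zero only if all of its constituent cells are zero, so the zero fraction cannot increase under aggregation. Whether the accompanying loss of temporal detail is acceptable is the question the guidance addresses.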
EWD & T2 Products
Course Modules (website link): Course modules for both undergraduate and graduate courses at Texas A&M and Virginia Tech will be made available by Dr. Dominique Lord via this link soon. These modules will be incorporated into the following courses: CVEN 626 – Highway Safety (TAMU – Fall 2019), STAT 4504 – Applied Multivariate Analysis (VT – Fall 2019), STAT 5504G/STAT 5594 – Statistical Epidemiology and Observation Study (VT – Spring 2020).
Course Slides (pptx): Material developed under Safe-D Project 01-001 and included in the course module for CVEN 626 – Highway Safety at TAMU.
Webinars (links coming soon): Two UTC presentations/webinars based on the results of this research are currently in preparation. One will focus on the work conducted at VTTI and the other will describe the characteristics of the heuristics method. These presentations are expected to take place in the fall of 2019.
Student Impact Statement (pdf): Two Ph.D. students were funded under this project: Mohammadali (Ali) Shirazi from TAMU and Huiying (Maggie) Mao from VT. This file contains a statement of the impact this project had on these students’ education and workforce development.
Project Brief (pdf): This document provides a brief description of the project and summary results.
Shirazi, M., Lord, D., & Geedipally, S.R., (2020) A Simulation Analysis to Study the Temporal and Spatial Aggregations of Safety Datasets with Excess Zero Observations. Paper to be presented at the 99th Annual Meeting of the Transportation Research Board, Washington, D.C. (Accepted)
Shirazi, M., and D. Lord (2019) Characteristics Based Heuristics to Select a Logical Distribution between the Poisson-Gamma and the Poisson-Lognormal for Crash Data Modelling. Transportmetrica A: Transport Science, Vol 15, Issue 2, pp. 1791-1803. (Published)
Mao, H., X. Deng, and F. Guo (2019) Modeling Crash Risk. International Conference on Frontiers of Data Science, Hangzhou, China, May 27, 2019 (Accepted)
Mao, H., Deng, X., Lord, D., Flintsch, G., & Guo, F. (2019). Adjusting finite sample bias in traffic safety modeling. Accident Analysis & Prevention, 131, 112-121. doi: https://doi.org/10.1016/j.aap.2019.05.026 (Published)
Shirazi, M., & Lord, D. (2017, October) An Approach Towards Automation of Model Selection. Poster presented at the 2017 INFORMS Annual Meeting, Oct. 22–25, 2017, Houston, TX. [Additional information about the paper can be found here: https://ceprofs.civil.tamu.edu/dlord/#Publication] (Published)
Shirazi, M., and D. Lord (2018). Characteristics Based Heuristics to Select a Logical Distribution between the Poisson Gamma and the Poisson Lognormal for Crash Data Modeling. Paper presented at the 97th Annual Meeting of the Transportation Research Board, 2018. (Published)
Shirazi, M., S.S. Dhavala, D. Lord, and S.R. Geedipally (2017) A Methodology to Design Heuristics for Model Selection Based on Characteristics of Data: Application to Investigate when the Negative Binomial Lindley (NB-L) Is Preferred over the Negative Binomial (NB). Accident Analysis & Prevention, Vol. 107, 2017, pp. 186-194. http://dx.doi.org/10.1016/j.aap.2017.07.002 (Published)
The final datasets for this project are located in the Safe-D Collection on the VTTI Dataverse; DOI: 10.15787/VTT1/QQEZOP.
Research Investigators (PI*)
Start Date: 2017-05-01
End Date: 2018-12-31
Grant Number: 69A3551747115
Total Funding: $293,890
Source Organization: Safe-D National UTC
Project Number: 01-001
Safe-D Theme Areas
Safe-D Application Areas
Planning for Safety
Operations and Design
UTC Project Information Form
Office of the Assistant Secretary for Research and Technology
University Transportation Centers Program
Department of Transportation
Washington, DC 20590 United States
Texas A&M University
Texas A&M Transportation Institute
College Station, Texas 77843-3135
Virginia Polytechnic Institute and State University
Virginia Tech Transportation Institute
3500 Transportation Research Plaza
Blacksburg, Virginia 24061
San Diego State University
5500 Campanile Dr
San Diego, CA 92182