Time Series Forecasting Using Unemployment Data

Introduction: The COVID-19 pandemic caused a rise in unemployment that was as unforeseeable as the pandemic itself. As we prepare to enter the workforce, the resulting uncertainty in the job market made us curious about future levels of unemployment. We therefore set out to find the time series model that most accurately forecasts future unemployment rates.

Dataset: The dataset is from the US Bureau of Labor Statistics and covers May 1950 through June 2020.

Partitioning: We split the observations into two sets.

Training: 80% of observations
Validation: 20% of observations

Models were fit on the training set. The validation set supplied the actual values that we compared against the forecasted values to measure each model's error.
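As a rough illustration of the 80/20 partition, the sketch below splits a time-ordered series without shuffling. It assumes the split is chronological (the earliest 80% for training), which is the usual choice for time series but is not stated explicitly here. The function name `split_train_validation` and the toy numbers are hypothetical and are not taken from our R code or the BLS data.

```python
def split_train_validation(series, train_fraction=0.8):
    """Split a time-ordered series into (training, validation) without shuffling."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)  # index of the first validation point
    return series[:cut], series[cut:]


# Toy monthly rates (made-up values, NOT the BLS data)
rates = [3.5, 3.6, 3.8, 4.0, 4.4, 4.7, 5.0, 5.2, 5.6, 6.1]
train, validation = split_train_validation(rates)
print(len(train), len(validation))
```

Keeping the order intact matters because the validation set should contain only observations that come after everything the model was trained on.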

Models: We compared several models listed below to find the best predictor for unemployment.

Naive Models:

Naive Method
Drift Method
Seasonal Naive
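The three naive methods can be sketched in a few lines of Python. This follows the standard textbook definitions and is not a copy of our R code. The function names and the default `season_length=12` (monthly data with a yearly cycle) are assumptions made for illustration.

```python
def naive_forecast(train, horizon):
    """Naive method: every forecast equals the last observed value."""
    return [train[-1]] * horizon


def drift_forecast(train, horizon):
    """Drift method: extend the straight line through the first and last observations."""
    if len(train) < 2:
        raise ValueError("drift needs at least two observations")
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + h * slope for h in range(1, horizon + 1)]


def seasonal_naive_forecast(train, horizon, season_length=12):
    """Seasonal naive: repeat the value from the same season of the last observed cycle."""
    if len(train) < season_length:
        raise ValueError("need at least one full season of data")
    last_cycle = train[-season_length:]
    return [last_cycle[h % season_length] for h in range(horizon)]


# Toy series (made-up values) with an assumed season length of 3
history = [4.0, 4.2, 4.1, 4.5, 4.4, 4.6]
print(naive_forecast(history, 3))
print(drift_forecast(history, 3))
print(seasonal_naive_forecast(history, 3, season_length=3))
```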

Smoothing Models:

Simple Average Method
Moving Average Method
Exponential Smoothing Method
Holt-Winters
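For intuition, here is a minimal Python sketch of the first three smoothing methods, each producing a flat forecast. The window length of 12 and the choice to initialize the exponential smoothing level at the first observation are assumptions for illustration. The settings in our R code may differ, and Holt-Winters (which adds trend and seasonal components) is left out to keep the sketch short.

```python
def simple_average_forecast(train, horizon):
    """Simple average: forecast the mean of all training observations."""
    mean = sum(train) / len(train)
    return [mean] * horizon


def moving_average_forecast(train, horizon, window=12):
    """Moving average: forecast the mean of the last `window` observations."""
    recent = train[-window:]
    level = sum(recent) / len(recent)
    return [level] * horizon


def exponential_smoothing_forecast(train, horizon, alpha):
    """Simple exponential smoothing: weights decay geometrically into the past."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = train[0]  # assumed initialization: start at the first observation
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


# Toy series (made-up values)
history = [4.0, 4.2, 4.1, 4.5, 4.4, 4.6]
print(simple_average_forecast(history, 2))
print(moving_average_forecast(history, 2, window=3))
print(exponential_smoothing_forecast(history, 2, alpha=0.5))
```

A small alpha averages over a long stretch of history, while an alpha close to 1 tracks the most recent value and behaves almost like the naive method.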

ARIMA:

Initial ARIMA
Rough Search ARIMA

Errors: After running all the models, we summarized each model's errors on the validation set in the table below. The model with the lowest errors should be the best predictor of future unemployment rates.

Model	ME	RMSE	MAE	MPE	MAPE	MASE
Naive	1.70	2.81	2.01	18.03	26.26	2.50
Drift	1.82	2.87	2.04	20.23	26.20	2.54
Simple Moving Average	0.66	2.33	1.87	-0.54	28.71	2.32
Exponential (optimal alpha)	1.70	2.81	2.01	18.03	26.25	2.50
Exponential (alpha = 0.1)	1.31	2.59	1.90	11.07	25.89	2.36
Exponential (alpha = 0.3)	1.57	2.74	1.96	15.75	25.90	2.44
Exponential (alpha = 0.5)	1.64	2.77	1.99	16.89	26.04	2.47
Exponential (alpha = 0.7)	1.67	2.79	2.00	17.45	26.14	2.49
Exponential (alpha = 0.9)	1.69	2.80	2.01	17.85	26.21	2.50
Holt	6.14	6.73	6.14	103.92	103.92	7.62
ARIMA (2,1,2)	1.81	2.88	2.06	19.91	26.70	2.56

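The Simple Moving Average model had the lowest ME, RMSE, MAE, and MASE of all the models, although its MAPE (28.71) was higher than that of every model except Holt. On balance, it was the model with the lowest overall errors.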
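The error measures in the table can be computed from validation actuals and forecasts roughly as follows. This sketch uses the common definitions (error = actual minus forecast) and assumes MASE is scaled by the mean absolute one-step naive error on the training set. The exact scaling used by our R code may differ (for example, a seasonal naive scale). The function name `forecast_errors` and the toy numbers are hypothetical.

```python
import math


def forecast_errors(actual, forecast, train):
    """Return ME, RMSE, MAE, MPE, MAPE and MASE for a set of forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    pct_errors = [100 * e / a for e, a in zip(errors, actual)]
    mae = sum(abs(e) for e in errors) / n
    # Assumed MASE scale: in-sample mean absolute error of the one-step naive forecast
    naive_scale = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": mae,
        "MPE": sum(pct_errors) / n,
        "MAPE": sum(abs(p) for p in pct_errors) / n,
        "MASE": mae / naive_scale,
    }


# Toy example (made-up values, NOT the results in the table)
metrics = forecast_errors(actual=[10, 20], forecast=[8, 25], train=[1, 2, 4])
for name, value in metrics.items():
    print(name, round(value, 2))
```

ME and MPE keep the sign of the errors, so a positive value means the model forecasts too low on average. The other four measures ignore the sign and only capture the size of the errors.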

Forecasting: Using the Simple Moving Average model, we forecasted unemployment for July 2020.

Month	Point Forecast	Low 80%	High 80%	Low 95%	High 95%
July	5.77	5.58	7.97	2.43	9.13

Analysis:

We started this project before July 2020 data was available, so we forecasted July. We now have the actual unemployment rate for July 2020: 10.2%. Our simple moving average model's point forecast was 5.77%, well below the actual value, but the upper bound of its 95% prediction interval was 9.13%. That upper bound is about 10.5% below the actual rate (|10.2 − 9.13| / 10.2), although the actual rate still falls just outside the interval. Considering the uncertainty brought on by COVID-19, this percentage error suggests that the upper end of the model's interval does an adequate job of predicting unemployment.
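The comparison with the actual July value can be reproduced with a short calculation. The sketch below assumes percentage error is defined as |actual − forecast| / actual × 100. The function name `percent_error` is hypothetical, and the inputs are the figures quoted in this post.

```python
def percent_error(actual, predicted):
    """Absolute percentage error of a prediction relative to the actual value."""
    return abs(actual - predicted) / actual * 100


actual_july = 10.2      # actual July 2020 unemployment rate (%)
point_forecast = 5.77   # simple moving average point forecast (%)
low_95, high_95 = 2.43, 9.13  # bounds of the 95% interval (%)

print("Point forecast error (%):", round(percent_error(actual_july, point_forecast), 2))
print("Upper 95% bound error (%):", round(percent_error(actual_july, high_95), 2))
print("Actual inside 95% interval:", low_95 <= actual_july <= high_95)
```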

Final Thoughts: As we have seen, there are many lessons to learn from COVID-19. Time series forecasting works by using historical data to predict future events, based on that data alone. The limitation that has become obvious during COVID-19 is that there is not always a comparable history to look back on, and the numbers cannot tell the whole story. Models such as the ones we used offer only one point of view, and that view is subject to the interpretation of the data analyst. It is up to the analyst to use knowledge outside the numbers to explain results and tailor them to the circumstances. In this case, for example, it takes knowledge about COVID-19 to decide to use the higher-end predictions as the forecast, since we know the pandemic has led to higher unemployment.

References: If you would like access to the dataset we used or the R code for the models, please use the links below.

BLS data source: https://data.bls.gov/pdq/SurveyOutputServlet

(GitHub) Link to dataset: https://github.com/unravelthedata/Time_Series_Using_Unemployment_Data/blob/master/unemploymentblog.csv

(GitHub) Link to R code: https://github.com/unravelthedata/Time_Series_Using_Unemployment_Data/blob/master/Unemployment_Project_R_Code
