Forecasting Analysis
1. Introduction
In order to determine whether forecasting directives are required and, if so, which ones, it is necessary to consider the nature of the historical data. This was investigated in “A forecasting case study – part 1”.
The default forecasting method is to look for patterns in the historical data. If the forecaster identifies one or more patterns, it will use these to calculate the forecast for future dates. If no pattern is found, it will use a weighted average of recent historical data. This method is often suitable for short term forecasts (say 1 – 3 weeks into the future). However, other methods are available which may be more appropriate, and this needs to be determined. Once more the key will lie in the historical data.
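As an illustration of the fallback behaviour, the sketch below shows one way a weighted average of recent history might be calculated. It is a minimal Python example, not the forecaster's actual algorithm, and the recency weights are assumed purely for illustration.

    # Minimal sketch of a weighted average of recent history (not the
    # forecaster's internal algorithm). The recency weights are assumed
    # purely for illustration: more recent weeks count for more.
    def weighted_average_forecast(recent_weeks):
        """recent_weeks: weekly call totals, oldest first."""
        weights = range(1, len(recent_weeks) + 1)   # 1, 2, ..., n
        weighted = sum(w * v for w, v in zip(weights, recent_weeks))
        return weighted / sum(weights)

    # Example with four hypothetical weekly totals, most recent last.
    print(weighted_average_forecast([1200, 1150, 1300, 1250]))  # 1240.0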
2. Methodology
For this exercise we wished to produce forecasts for the first three weeks of March 2012 based on the historical data up to the 29th February using different forecasting methods. In order to determine the accuracy of the forecasts, these were then compared to the actual data for the same period.
Initial comparisons were also made between the actual calls received in March 2012 and previous call actuals. The purpose of this was to give some indication of how the call volumes behave month on month and year on year.
In this case, two comparisons were made: a monthly comparison and an annual comparison. Using actual data from March 2012 (up to the 20th), comparisons were made between Feb 2012 and Mar 2012, and also between Mar 2012 and Mar 2011, for a selection of queues which are shown below:
Firstly, comparing the last two months for the main queue group (minus the recent additions identified in part 1*).
* See the section on recently added queues below
Secondly, the same comparison for the largest queue in the group:
Conclusion: on most days March was less busy than February – on some days this difference was significant (>20%).
Next, comparing March 2012 with the previous year, March 2011:
(Note: this group excludes queues with no data in March 2011)
Conclusion: this again shows a significant variance (> 20%) on some days but not others.
Looking at individual queues shows a larger year on year variation in volume:
Conclusion: in most cases March 2011 had higher volumes of calls received than March 2012.
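For reference, this kind of day-by-day comparison is a simple percentage variance calculation, as sketched below; the daily totals shown are made up rather than taken from the customer's data.

    # Day-by-day percentage variance of one period against another.
    # The daily totals below are hypothetical, not the customer's data.
    def daily_variance(current, reference):
        return [100.0 * (c - r) / r for c, r in zip(current, reference)]

    march_2012 = [950, 1020, 880, 760]    # hypothetical daily call totals
    march_2011 = [1100, 1050, 1150, 900]

    for day, pct in enumerate(daily_variance(march_2012, march_2011), start=1):
        flag = "significant" if abs(pct) > 20 else "minor"
        print(f"day {day}: {pct:+.1f}% ({flag})")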
Alternative options: using forecasting directives
Using a database with data up to 29th February, forecasts were created for the first three weeks of March 2012 using different directives. It was then possible to compare the forecasts with the actual data to determine which were more accurate. The results for the main queue group are summarised below:
(Note: the highlighted cells show where the variance from the actuals was less than 10%)
Similar results were obtained for individual queues.
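The accuracy check behind this summary is straightforward: for each directive's forecast, calculate the variance from the actuals and flag anything within 10%. The sketch below uses placeholder figures only, not the results summarised above.

    # Variance of each candidate forecast from the actuals, flagging weeks
    # within 10%. All figures are placeholders, not the study's results.
    actuals = [5200, 4900, 5100]                 # weeks 1-3 of March (hypothetical)
    forecasts = {
        "default method":       [4400, 4300, 4450],
        "corresponding months": [5050, 4800, 5250],
    }

    for directive, values in forecasts.items():
        variances = [100.0 * abs(f - a) / a for f, a in zip(values, actuals)]
        within = sum(1 for v in variances if v < 10)
        print(f"{directive}: {within}/{len(actuals)} weeks within 10% of actuals")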
These results indicate that the seasonal trends identified in part 1 of this case study were being repeated in March 2012, but that some account should be taken of year on year changes in volumes. Therefore, when generating forecasts, the best results would be obtained by using directives designed to recognise this trend.
3. Understanding Forecasting Directives
The directives recommended above are described below. The first tells the forecaster how to build its forecasting data set:
Form forecasting data set by day of week in corresponding months of the year;
[Option 3] This tells the forecaster to look back at previous years’ data and base the forecast around those patterns.
The three directives below are used to adjust the forecast based on recent history, to take account of changes in call volumes compared to the previous year:
Renormalize forecasted data using centered current data points relative to the previous years;
[Option 4] This uses a variable amount of data for the comparison, depending on how far ahead the forecast extends.
Renormalize forecasted data using the actual data in the current year relative to the previous years;
[Option 5] This looks at current year’s data and compares to the same period last year.
Use the x most recent actuals data elements to renormalize the forecast;
[Option 6] This uses the last x weeks of data to carry out the comparison.
Options 4, 5 and 6 produce similar results, and the most appropriate should be chosen depending mainly on how far into the future the forecast extends.
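A simplified sketch of the idea shared by these renormalizing directives is given below: scale the pattern-based forecast by the ratio of recent actuals to the equivalent period a year earlier. This is an assumed illustration of the principle, not the forecaster's internal implementation, and all figures are hypothetical.

    # Sketch of the renormalization idea behind Options 5 and 6 (assumed
    # principle, not the forecaster's implementation): scale a forecast
    # built from last year's pattern by how recent volumes compare with
    # the same period last year.
    def renormalize(raw_forecast, recent_actuals, same_period_last_year):
        scale = sum(recent_actuals) / sum(same_period_last_year)
        return [round(v * scale) for v in raw_forecast]

    raw_forecast = [1100, 1150, 1200]           # hypothetical pattern from March 2011
    last_weeks_2012 = [4200, 4100, 4300, 4250]  # the x most recent weekly actuals
    same_weeks_2011 = [4800, 4700, 4900, 4850]

    print(renormalize(raw_forecast, last_weeks_2012, same_weeks_2011))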
4. Recently added queues
Two queues, identified in part 1 of this document, have actual data only from November 2011 onwards. The forecasting method described above is not suitable for these queues, so they should be excluded and forecast separately.
The data for these queues looks like this, firstly totaled for the week:
and secondly for a selected day (Friday):
Queue 1 has fairly low and quite steady volumes. Queue 2 has, on the face of it, an unpredictable call volume, although there are signs of more settled behaviour since the end of January. The default forecasting method is recommended, with the ‘auto detect growth trend’ option disabled, until more actual data is available. This will produce a weighted average of recent data.
5. Low Volume Queues
The actual data shows that, for this customer, all queues have data for each time step recorded, even if this is a zero. This makes things more straightforward for low volume queues (LVQs) as the forecaster will not attempt to reconstruct ‘missing’ data. The normal recommendation for this type of queue is to aggregate them with one or more larger volume queues – they will then effectively take on the properties of the larger queue(s) for forecasting purposes (see “A guide to LVQ forecasting” for details on this issue).
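Where aggregation is appropriate, the step itself is simply an element-wise addition of the queues' series before forecasting, as sketched below with hypothetical figures.

    # Aggregating a low volume queue with a larger queue before forecasting.
    # Queue names and daily figures are hypothetical.
    lvq_daily = [3, 0, 5, 2, 4]                  # low volume queue, one week of daily calls
    main_queue_daily = [640, 700, 655, 690, 610]

    # The combined series is forecast as a single queue and so takes on
    # the seasonality of the larger queue.
    combined = [a + b for a, b in zip(main_queue_daily, lvq_daily)]
    print(combined)   # [643, 700, 660, 692, 614]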
However, further analysis of these queues should be undertaken initially to see if there is any seasonality or other behaviour which might require them to be considered separately. As we saw in part 1, a couple of the LVQs did show a short term rise in call volumes which would be lost if they were aggregated. It is up to the user to determine whether this pattern is likely to be repeated; if so, those queues should be forecast separately, at least for the peak period.
6. Special Events
The analysis of historical data carried out previously suggested that there may be identifiable special events affecting the call volumes at certain times of the year.
Some of these are easily identifiable (e.g. public holidays and Christmas/New Year) but others, if they exist, would require further analysis of the historical data and some local knowledge of predictable events which may be affecting volumes.
However, it was also noted that there is a strong year on year correlation of call volumes, which will be accounted for by the aforementioned directives. It is possible that this yearly correlation may be sufficient to predict the volume changes without the need to identify each one as a special event. The user may only need to identify those events which do not recur annually.
7. Summary of Recommendations
- Consider aggregating low volume queues with higher volume queues. Whilst not strictly necessary, it may increase the accuracy of the forecast for these low volume queues.
- Remove recent queue additions from the current queue groups and forecast separately as described above.
- For all other queues use the directives outlined above. The directive ‘form forecasting data set by day of week in corresponding months of the year’ should be used in conjunction with one of the normalizing directives as appropriate.