Exponential Smoothing
Exponential smoothing is a rule of thumb technique for smoothing time series data using the exponential window function. Whereas in the simple moving average the past observations are weighted equally, exponential functions are used to assign exponentially decreasing weights over time. It is an easily learned and easily applied procedure for making some determination based on prior assumptions by the user, such as seasonality. Exponential smoothing is often used for analysis of time-series data.
Exponential smoothing is one of many window functions commonly applied to smooth data in signal processing, acting as low-pass filters to remove high-frequency noise.
The raw data sequence is often represented by [math]\{x_t\}[/math] beginning at time [math]t = 0[/math], and the output of the exponential smoothing algorithm is commonly written as [math]\{s_t\}[/math], which may be regarded as a best estimate of what the next value of [math]x[/math] will be. When the sequence of observations begins at time [math]t = 0[/math], the simplest form of exponential smoothing is given by the formulas:[1]

[math]\begin{align} s_0 &= x_0 \\ s_t &= \alpha x_t + (1-\alpha) s_{t-1},\quad t \gt 0 \end{align}[/math]
where [math]\alpha[/math] is the smoothing factor, and [math]0 \lt \alpha \lt 1[/math].
Basic (simple) exponential smoothing
The use of the exponential window function is first attributed to Poisson[2] as an extension of a numerical analysis technique from the 17th century, and later adopted by the signal processing community in the 1940s. Here, exponential smoothing is the application of the exponential, or Poisson, window function. Exponential smoothing was first suggested in the statistical literature without citation to previous work by Robert Goodell Brown in 1956,[3] and then expanded by Charles C. Holt in 1957.[4] The formulation below, which is the one commonly used, is attributed to Brown and is known as "Brown’s simple exponential smoothing".[5]
The simplest form of exponential smoothing is given by the formula:

[math]s_t = \alpha x_t + (1-\alpha) s_{t-1} = s_{t-1} + \alpha (x_t - s_{t-1})[/math]
where [math]\alpha[/math] is the smoothing factor, and [math]0 \le \alpha \le 1[/math]. In other words, the smoothed statistic [math]s_t[/math] is a simple weighted average of the current observation [math]x_t[/math] and the previous smoothed statistic [math]s_{t-1}[/math]. Simple exponential smoothing is easily applied, and it produces a smoothed statistic as soon as two observations are available. The term smoothing factor applied to [math]\alpha[/math] here is something of a misnomer, as larger values of [math]\alpha[/math] actually reduce the level of smoothing, and in the limiting case with [math]\alpha[/math] = 1 the output series is just the current observation. Values of [math]\alpha[/math] close to one have less of a smoothing effect and give greater weight to recent changes in the data, while values of [math]\alpha[/math] closer to zero have a greater smoothing effect and are less responsive to recent changes.
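The recursion above can be sketched in a few lines of Python (a minimal illustration; the function name and sample series are invented for the example):

```python
def simple_exponential_smoothing(x, alpha):
    """Return the smoothed series s, where s[0] = x[0] and
    s[t] = alpha * x[t] + (1 - alpha) * s[t-1] for t > 0."""
    s = [x[0]]
    for t in range(1, len(x)):
        s.append(alpha * x[t] + (1 - alpha) * s[-1])
    return s

series = [3.0, 5.0, 9.0, 20.0, 12.0, 17.0, 22.0, 23.0, 51.0, 41.0]
print(simple_exponential_smoothing(series, 0.5))
```

Note that with [math]\alpha = 1[/math] the output reproduces the input series exactly, consistent with the remark that larger values of [math]\alpha[/math] reduce the level of smoothing.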
There is no formally correct procedure for choosing [math]\alpha[/math]. Sometimes the statistician's judgment is used to choose an appropriate factor. Alternatively, a statistical technique may be used to optimize the value of [math]\alpha[/math]. For example, the method of least squares might be used to determine the value of [math]\alpha[/math] for which the sum of the quantities [math](s_t - x_{t+1})^2[/math] is minimized.[6]
Unlike some other smoothing methods, such as the simple moving average, this technique does not require any minimum number of observations to be made before it begins to produce results. In practice, however, a "good average" will not be achieved until several samples have been averaged together; for example, a constant signal will take approximately [math]3 / \alpha[/math] stages to reach 95% of the actual value. To accurately reconstruct the original signal without information loss all stages of the exponential moving average must also be available, because older samples decay in weight exponentially. This is in contrast to a simple moving average, in which some samples can be skipped without as much loss of information due to the constant weighting of samples within the average. If a known number of samples will be missed, one can adjust a weighted average for this as well, by giving equal weight to the new sample and all those to be skipped.
This simple form of exponential smoothing is also known as an exponentially weighted moving average (EWMA). Technically it can also be classified as an autoregressive integrated moving average (ARIMA) (0,1,1) model with no constant term.[7]
Time constant
The time constant of an exponential moving average is the amount of time for the smoothed response of a unit step function to reach [math]1-1/e \approx 63.2\,\%[/math] of the original signal. The relationship between this time constant, [math] \tau [/math], and the smoothing factor, [math] \alpha [/math], is given by the formula:

[math]\alpha = 1 - e^{-\Delta T / \tau}[/math]

thus [math]\tau = - \frac{\Delta T}{\ln(1 - \alpha)}[/math], where [math]\Delta T[/math] is the sampling time interval of the discrete-time implementation. If the sampling time is fast compared to the time constant ([math]\Delta T \ll \tau[/math]) then

[math]\alpha \approx \frac{\Delta T}{\tau}[/math]
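These relations are easy to check numerically; the sketch below (function names are illustrative) converts between the time constant [math]\tau[/math] and the smoothing factor [math]\alpha[/math] for a given sampling interval:

```python
import math

def alpha_from_tau(tau, dt):
    """Smoothing factor for time constant tau and sampling interval dt:
    alpha = 1 - exp(-dt / tau)."""
    return 1.0 - math.exp(-dt / tau)

def tau_from_alpha(alpha, dt):
    """Inverse relation: tau = -dt / ln(1 - alpha)."""
    return -dt / math.log(1.0 - alpha)
```

For fast sampling (dt much smaller than tau), `alpha_from_tau` is close to the approximation `dt / tau`.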
Choosing the initial smoothed value
Note that in the definition above, [math]s_0[/math] is being initialized to [math]x_0[/math]. Because exponential smoothing requires that at each stage we have the previous forecast, it is not obvious how to get the method started. We could assume that the initial forecast is equal to the initial value of demand; however, this approach has a serious drawback. Exponential smoothing puts substantial weight on past observations, so the initial value of demand will have an unreasonably large effect on early forecasts. This problem can be overcome by allowing the process to evolve for a reasonable number of periods (10 or more) and using the average of the demand during those periods as the initial forecast. There are many other ways of setting this initial value, but it is important to note that the smaller the value of [math]\alpha[/math], the more sensitive your forecast will be on the selection of this initial smoother value [math]s_0[/math].[8][9]
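The initialization suggested above, averaging the demand over the first several periods, can be sketched as follows (the helper name is illustrative):

```python
def initial_smoothed_value(x, n=10):
    """One common choice for s[0]: average the first n observations
    (the text suggests 10 or more) so that no single early observation
    has an unreasonably large effect on early forecasts."""
    n = min(n, len(x))
    return sum(x[:n]) / n
```

This value would then be used in place of [math]x_0[/math] when starting the smoothing recursion.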
Optimization
For every exponential smoothing method we also need to choose the value for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter (α), but for the methods that follow there is usually more than one smoothing parameter.
There are cases where the smoothing parameters may be chosen in a subjective manner – the forecaster specifies the value of the smoothing parameters based on previous experience. However, a more robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.
The unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the sum of squared errors (SSE). The errors are specified as [math]e_t=y_t-\hat{y}_{t\mid t-1}[/math] for [math] t=1, \ldots,T[/math] (the one-step-ahead within-sample forecast errors). Hence we find the values of the unknown parameters and the initial values that minimize

[math]\text{SSE} = \sum_{t=1}^T e_t^2 = \sum_{t=1}^T \left(y_t - \hat{y}_{t\mid t-1}\right)^2[/math]
Unlike the regression case (where we have formulae to directly compute the regression coefficients which minimize the SSE) this involves a non-linear minimization problem and we need to use an optimization tool to perform this.
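Since the SSE is a non-linear function of the parameters, a simple grid search is one way to approximate the minimizing value of [math]\alpha[/math]. The sketch below assumes simple exponential smoothing with [math]s_0 = x_0[/math] and one-step-ahead forecasts [math]\hat{y}_{t\mid t-1} = s_{t-1}[/math] (function names are illustrative):

```python
def sse_one_step(x, alpha):
    """Sum of squared one-step-ahead forecast errors for simple
    exponential smoothing initialized with s[0] = x[0]."""
    s = x[0]
    sse = 0.0
    for t in range(1, len(x)):
        sse += (x[t] - s) ** 2        # forecast for time t is s[t-1]
        s = alpha * x[t] + (1 - alpha) * s
    return sse

def best_alpha(x, steps=999):
    """Grid search over alpha in (0, 1) minimizing the SSE."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda a: sse_one_step(x, a))
```

In practice a numerical optimizer would replace the grid search, but the objective being minimized is the same.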
"Exponential" naming
The name 'exponential smoothing' is attributed to the use of the exponential window function during convolution, and is no longer attributed to Holt, Winters or Brown.
By direct substitution of the defining equation for simple exponential smoothing back into itself we find that

[math]\begin{align} s_t &= \alpha x_t + (1-\alpha) s_{t-1} \\ &= \alpha x_t + \alpha(1-\alpha) x_{t-1} + (1-\alpha)^2 s_{t-2} \\ &= \alpha \left[ x_t + (1-\alpha) x_{t-1} + (1-\alpha)^2 x_{t-2} + \cdots + (1-\alpha)^{t-1} x_1 \right] + (1-\alpha)^t x_0 \end{align}[/math]
In other words, as time passes the smoothed statistic [math]s_t[/math] becomes the weighted average of a greater and greater number of the past observations [math]x_t, x_{t-1}, \ldots, x_1[/math], and the weights assigned to previous observations are proportional to the terms of the geometric progression [math]1, (1-\alpha), (1-\alpha)^2, \ldots[/math]
A geometric progression is the discrete version of an exponential function, which is where the name for this smoothing method reportedly originated.
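The equivalence between the recursive definition and the expanded weighted average can be verified numerically. In this sketch (illustrative names), the explicit form places geometrically decaying weights on past observations and the remaining weight on [math]x_0[/math]:

```python
def ses_recursive(x, alpha):
    """Final smoothed value via the recursion s = alpha*x[t] + (1-alpha)*s."""
    s = x[0]
    for t in range(1, len(x)):
        s = alpha * x[t] + (1 - alpha) * s
    return s

def ses_explicit(x, alpha):
    """Expanded form: weights alpha*(1-alpha)**i on x[t-i] for i < t,
    plus the remaining weight (1-alpha)**t on the initial value x[0]."""
    t = len(x) - 1
    total = (1 - alpha) ** t * x[0]
    for i in range(t):
        total += alpha * (1 - alpha) ** i * x[t - i]
    return total
```

Both functions return the same value for any series and any [math]0 \lt \alpha \lt 1[/math], up to floating-point rounding.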
Comparison with moving average
Exponential smoothing and moving average share the defect of introducing a lag relative to the input data. While this can be corrected by shifting the result by half the window length for a symmetrical kernel, such as a moving average or Gaussian, it is unclear how appropriate this would be for exponential smoothing. They also both have roughly the same distribution of forecast error when [math]\alpha = 2/(k+1)[/math]. They differ in that exponential smoothing takes into account all past data, whereas moving average only takes into account [math]k[/math] past data points. Computationally speaking, they also differ in that moving average requires keeping the past [math]k[/math] data points, or the data point at lag [math]k+1[/math] plus the most recent forecast value, whereas exponential smoothing only needs the most recent forecast value to be kept.[11]
Double exponential smoothing
Simple exponential smoothing does not do well when there is a trend in the data.[1] In such situations, several methods were devised under the name "double exponential smoothing" or "second-order exponential smoothing", the recursive application of an exponential filter twice. The basic idea behind double exponential smoothing is to introduce a term to take into account the possibility of a series exhibiting some form of trend. This slope component is itself updated via exponential smoothing.
One method works as follows:[12]
Again, the raw data sequence of observations is represented by [math]x_t[/math], beginning at time [math]t=0[/math]. We use [math]s_t[/math] to represent the smoothed value for time [math]t[/math], and [math]b_t[/math] is our best estimate of the trend at time [math]t[/math]. The output of the algorithm is now written as [math]F_{t+m}[/math], an estimate of the value of [math]x_{t+m}[/math] at time [math]m \gt 0[/math] based on the raw data up to time [math]t[/math]. Double exponential smoothing is given by the formulas

[math]\begin{align} s_0 &= x_0 \\ b_0 &= x_1 - x_0 \end{align}[/math]
And for [math]t \gt 0[/math] by

[math]\begin{align} s_t &= \alpha x_t + (1-\alpha)(s_{t-1} + b_{t-1}) \\ b_t &= \beta (s_t - s_{t-1}) + (1-\beta) b_{t-1} \end{align}[/math]
where [math]\alpha[/math] ([math]0 \le \alpha \le 1[/math]) is the data smoothing factor, and [math]\beta[/math] ([math]0 \le \beta \le 1[/math]) is the trend smoothing factor.
The forecast beyond [math]x_t[/math] is given by the approximation:

[math]F_{t+m} = s_t + m b_t[/math]
Setting the initial value [math]b_0[/math] is a matter of preference. An option other than the one listed above is [math]\frac{x_n-x_0} n[/math] for some [math]n[/math].
Note that [math]F_0[/math] is undefined (there is no estimation for time 0), and according to the definition [math]F_1= s_0 + b_0[/math], which is well defined, thus further values can be evaluated.
A second method, referred to as either Brown's linear exponential smoothing (LES) or Brown's double exponential smoothing, works as follows:[13]

[math]\begin{align} s'_0 &= x_0 \\ s''_0 &= x_0 \\ s'_t &= \alpha x_t + (1-\alpha) s'_{t-1} \\ s''_t &= \alpha s'_t + (1-\alpha) s''_{t-1} \\ F_{t+m} &= a_t + m b_t \end{align}[/math]

where [math]a_t[/math], the estimated level at time [math]t[/math], and [math]b_t[/math], the estimated trend at time [math]t[/math], are:

[math]a_t = 2 s'_t - s''_t, \qquad b_t = \frac{\alpha}{1-\alpha} \left( s'_t - s''_t \right)[/math]
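Brown's method can likewise be sketched: the series is smoothed twice with the same [math]\alpha[/math], and the two statistics are combined into level and trend estimates (illustrative function name; both smoothed statistics are assumed to start at [math]x_0[/math]):

```python
def brown_les(x, alpha, m=1):
    """Brown's double exponential smoothing: apply the simple exponential
    filter twice, then combine the singly and doubly smoothed statistics
    into a level a and a trend b; forecast m steps ahead as a + m * b."""
    s1 = s2 = x[0]                    # singly and doubly smoothed statistics
    for t in range(1, len(x)):
        s1 = alpha * x[t] + (1 - alpha) * s1
        s2 = alpha * s1 + (1 - alpha) * s2
    a = 2 * s1 - s2                       # estimated level at time t
    b = alpha / (1 - alpha) * (s1 - s2)   # estimated trend at time t
    return a + m * b
```

On a constant series both smoothed statistics equal the constant, so the estimated trend is zero and the forecast is the constant itself.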
References
- "NIST/SEMATECH e-Handbook of Statistical Methods". NIST. Retrieved 2010-05-23.
- Oppenheim, Alan V.; Schafer, Ronald W. (1975). Digital Signal Processing. Prentice Hall. p. 5. ISBN 0-13-214635-5.
- Brown, Robert G. (1956). Exponential Smoothing for Predicting Demand. Cambridge, Massachusetts: Arthur D. Little Inc. p. 15.
- Holt, Charles C. (1957). "Forecasting Trends and Seasonal by Exponentially Weighted Averages". Office of Naval Research Memorandum 52. Reprinted in Holt, Charles C. (January–March 2004). "Forecasting Trends and Seasonal by Exponentially Weighted Averages". International Journal of Forecasting 20 (1): 5–10.
- Brown, Robert Goodell (1963). Smoothing Forecasting and Prediction of Discrete Time Series. Englewood Cliffs, NJ: Prentice-Hall.
- "NIST/SEMATECH e-Handbook of Statistical Methods, 6.4.3.1. Single Exponential Smoothing". NIST. Retrieved 2017-07-05.
- Nau, Robert. "Averaging and Exponential Smoothing Models". Retrieved 26 July 2010.
- "Production and Operations Analysis" Nahmias. 2009.
- Čisar, P., & Čisar, S. M. (2011). "Optimization methods of EWMA statistics." Acta Polytechnica Hungarica, 8(5), 73–87. Page 78.
- 7.1 Simple exponential smoothing | Forecasting: Principles and Practice.
- Nahmias, Steven (3 March 2008). Production and Operations Analysis (6th ed.). ISBN 0-07-337785-6.[page needed]
- "6.4.3.3. Double Exponential Smoothing". itl.nist.gov. Retrieved 25 September 2011.
- "Averaging and Exponential Smoothing Models". duke.edu. Retrieved 25 September 2011.