Fundamentals of Forecasting

Business Forecasting Methods and Techniques

Descriptive Statistics Analysis
Single & Multiple Moving Averages
Single & Multiple-Parameters Exponential Smoothing
Exponential Smoothing with Seasonality Methods
Decomposition Methods
Simple and Multiple Regression Analysis
Box-Jenkins (ARIMA) Models
Multivariate Data Analysis

Descriptive Statistics Analysis

 

  Univariate data
Bivariate data
Multivariate data
Mean
Median
Range
Deviation from the Mean
Mean Absolute Deviation
Sum of Squared Deviations
Mean Squared Deviation
Mean Error
Mean Absolute Error
Sum of Squared Errors
Mean Squared Error
Standard Deviation Errors
Percentage Error
Mean Percentage Error
Mean Absolute Percentage Error
Variance
Covariance
Degree of freedom
Correlation Coefficient
Root Mean Square
Standard Deviation
Standard Error of the Mean
Coefficient of Variation
Skewness
Kurtosis or Kurt(X)
Least-Squares Fitting
Autocovariance
Autocorrelation
U-Statistic

Demand Forecast Accuracy
 

 

Quantitative forecasts are based on data, or observations. A single observation or actual value can be represented by a variable, Xi, for example, actual sales dollars booked. The objective of forecasting is then to predict the future value of X. An individual forecast is denoted by the variable Fi, and the error by ei, which is the difference between the actual and forecast values for observation i: ei = Xi - Fi. In both time-series and causal forecasting, the time index t denotes the present period, t - 1 the last period, t - 2 two periods ago, and so on. Forecasts are made for periods t + 1, t + 2 ... t + m. For the rest of these sections, I will use the following notation.

 

 

                   Observed           Historical                Present   Forecasted
Values             X1, X2 ... Xn      Xt-n ... Xt-2, Xt-1       Xt        Ft+1, Ft+2 ... Ft+m
Period i           1, 2 ... n         t - n ... t - 2, t - 1    t         t + 1, t + 2 ... t + m
Estimated values   F1, F2 ... Fn      Ft-n ... Ft-2, Ft-1       Ft
Error              e1, e2 ... en      et-n ... et-2, et-1       et
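To make the notation concrete, here is a minimal Python sketch (Python stands in here, and in the sketches that follow, as an alternative to the spreadsheet formulas used on these pages) computing the errors ei = Xi - Fi for a few illustrative actual/forecast pairs:

    # Actual observations X_i, forecasts F_i, and errors e_i = X_i - F_i
    X = [24, 21, 36]                      # actual values X_1 .. X_n
    F = [28, 28, 28]                      # forecasts F_1 .. F_n
    e = [x - f for x, f in zip(X, F)]     # errors e_i = X_i - F_i
    print(e)                              # [-4, -7, 8]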

Univariate data analysis explores each variable in a data set separately. It looks at the range of values as well as the central tendency of the values, and it describes the pattern of response to the variable; each variable is described on its own. Descriptive statistics describe and summarize data: both univariate and bivariate descriptive statistics characterize individual variables in terms of central tendency, variability or dispersion, and the shape of the overall distribution.

Bivariate descriptive statistics (on which methods such as ANOVA and t-tests build) involve two quantitative variables collected and examined together. For example, data collected from customers may contain kilograms, feet, locations, dollars spent per week, and return memos per week; summarizing such data is analogous to summarizing univariate (single-variable) data, one pair of variables at a time.

Multivariate descriptive statistics refers to any statistical technique used to analyze data that arises from more than one variable at a time.  Multivariate data analysis is about separating the signal from the noise in data with many variables and presenting the results as easily interpretable plots summarizing the essential information. This essentially models reality where each situation, product, or decision involves more than a single variable. A wide range of methods is used for the analysis of multivariate data such as Regression Analysis, Correspondence Analysis, Factor Analysis, Multidimensional Scaling, and Cluster Analysis.
 

Univariate and Bivariate Data Analysis

Week   Meat sold (kg), Xi   Forecast (ft), Fi   ai = Xi - X̄    ai²           ai³                ei = Fi - F̄   ei²           ai·ei
41     3290.8               3119.2              -1019.73       1039849.27    -1060365499.05     -1173.93       1378111.64    1197091.64
42     3536.7               3407.1              -773.83        598812.87     -463379362.34      -886.03        785049.16     685636.59
43     3902.0               4073.0              -408.53        166896.76     -68182333.73       -220.13        48457.22      89929.71
44     4145.0               4378.9              -165.53        27400.18      -4535551.94        85.77          7356.49       -14197.51
45     4205.0               4498.9              -105.53        11136.58      -1175243.38        205.77         42341.29      -21714.91
46     4387.9               4199.0              77.37          5986.12       463145.86          -94.13         8860.46       -7282.84
47     4687.8               4498.9              377.27         142332.65     53697839.96        205.77         42341.29      77630.85
48     4768.8               4498.9              458.27         210011.39     96241921.02        205.77         42341.29      94298.22
49     4936.2               4798.8              625.67         391462.95     244926623.24       505.67         255702.15     316382.55
50     5245.1               5458.6              934.57         873421.08     816273143.31       1165.47        1358320.32    1089213.30
Sum    43105.3              42931.3             0              3467309.86    -386035317.05      0              3968881.32    3506987.60

(Spreadsheet layout: these columns occupy A through I, the weekly data sit in rows 2 to 11, and the sums in row 12; the Excel formulas below refer to those cells, e.g. B2:B11 for the kilogram column and D2:D11 for ai.)

Table 1.1  Statistical computations for the univariate and bivariate data


Table 1.1 above contains both a univariate data set (weekly meat sold, in kilograms) and a bivariate pairing with its forecast (recorded in feet); below are their commonly used descriptive statistics.

Mean, $\bar{X}$, of volume sold for the past 10 weeks

$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$   = 43105.3/10 = 4,310.53 kg    or, =AVERAGE(B2:B11)

 
Median (the middle value)

Position = (n + 1)/2 = 5.5, so the median is the mean of the 5th and 6th ordered values: (4205 + 4387.9)/2 = 4296.45 kg    or, =MEDIAN(B2:B11)
 

Range

Range = Max - Min   =MAX(B2:B11)-MIN(B2:B11)   = 1954.3 kg

The range is very sensitive to extreme scores since it is based on only two values. It should almost never be used as the only measure of spread, but it can be informative as a supplement to other measures of dispersion such as the standard deviation or the semi-interquartile range.
 
Deviation from the Mean

$a_i = X_i - \bar{X}$

Note: the deviations always sum to zero, which is why they are squared in the measures that follow.
Mean Absolute Deviation (MAD)

MAD = $\frac{1}{n}\sum_{i=1}^{n} \lvert X_i - \bar{X} \rvert$   =SUMPRODUCT(ABS(D2:D11))/COUNT(A2:A11)   = 494.63 kg
 
Sum of Squared Deviations (SSD)

SSD = $\sum_{i=1}^{n} (X_i - \bar{X})^2$   =SUM(E2:E11)   = 3,467,309.86 kg²
 
Mean Squared Deviation (MSD)

MSD = $\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$   = 3,467,309.86/10 = 346,730.99 kg². It is also the maximum-likelihood estimate (MLE) of the variance.
 
Variance, var(X) or S², is the sum of squared deviations divided by the degrees of freedom:

S² = $\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$   =SUM(E2:E11)/(COUNT(A2:A11)-1)   = 385,256.65 kg²
 
The degrees of freedom (df) are the number of data points minus the number of parameters estimated from the data (here 1, since only the mean is estimated).

S² is closely related to MSD, and because it has a smaller denominator, its value is always larger than MSD. S² is an unbiased estimator of the population variance, whereas MSD is a biased estimator. (If the mean value of an estimator equals the true value of the quantity it estimates, the estimator is called unbiased; an estimator is biased if its expected value does not equal the population parameter being estimated.)

The variance, standard deviation, standard error, and range all measure variability.
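As a cross-check on the spreadsheet results above, here is a short Python sketch that reproduces the central-tendency and dispersion statistics for the kilogram column of Table 1.1:

    # Descriptive statistics for the kilogram column (B2:B11) of Table 1.1
    import statistics as st

    X = [3290.8, 3536.7, 3902, 4145, 4205,
         4387.9, 4687.8, 4768.8, 4936.2, 5245.1]
    n = len(X)
    mean   = st.mean(X)                        # 4310.53 kg
    median = st.median(X)                      # 4296.45 kg
    rng    = max(X) - min(X)                   # 1954.3 kg
    dev    = [x - mean for x in X]             # deviations sum to zero
    mad    = sum(abs(d) for d in dev) / n      # 494.63 kg
    ssd    = sum(d * d for d in dev)           # 3,467,309.86 kg^2
    msd    = ssd / n                           # 346,730.99 kg^2 (biased)
    var    = ssd / (n - 1)                     # 385,256.65 kg^2 = st.variance(X)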


Covariance, cov(X, F) = $\frac{1}{n-1}\sum_{i=1}^{n} a_i e_i$   = 3506987.60/9 = 389,665.29 kg-ft

(Here $e_i = F_i - \bar{F}$ is the forecast column's deviation from its own mean, as in Table 1.1.)
 
Covariance is a measure of how much two variables change together. The covariance becomes more positive with each pair of values that differ from their means in the same direction, and more negative with each pair that differ from their means in opposite directions (in other words, when one variable tends to be above its expected value while the other is below its expected value). Variance is a special case of covariance in which the two variables are identical.

Note that the unit of covariance, kg-ft, is difficult to interpret. Hence we compute Pearson's correlation coefficient r, which takes care of this scaling problem, as described below. If the covariance is divided by the two standard deviations, the units in the numerator and denominator cancel, leaving a dimensionless number: Pearson's r, the correlation coefficient. Dividing the covariance by the two standard deviations also restricts r to the interval -1 to +1:

$r = \frac{\mathrm{cov}(X, F)}{s_X \, s_F}$
Pearson's product-moment correlation coefficient ("ρ" when measured in the population and "r" when measured in a sample) is a measure of the strength of the linear relationship between two variables. If the relationship between the variables is not linear, the correlation coefficient does not adequately represent its strength. Simply put, the correlation coefficient is the covariance of the deviations of the variables from their means, expressed in terms of their respective standard deviations.

Pearson's r can range from -1 to 1. An r of -1 indicates a perfect negative linear relationship between variables, an r of 0 indicates no linear relationship, and an r of 1 indicates a perfect positive linear relationship. With real data, you would not expect to get values of r of exactly -1, 0, or 1.

Pearson's correlation is symmetric in the sense that the correlation of X with Y is the same as the correlation of Y with X. A critical property of Pearson's r is that it is unaffected by linear transformations - which means that multiplying a variable by a constant and/or adding a constant does not change the correlation of that variable with other variables. If Y is the transformed value of X, then Y = mX + b.

[Example scatter plots: a perfect positive linear relationship (r = 1); a perfect negative linear relationship (r = -1), where Y decreases as X increases; and a cloud with no linear relationship (r = 0).]
Pearson's r formula is designed so that the correlation between the kilogram and feet columns in Table 1.1 is the same whether the forecast is measured in kilograms or in feet.

The value of Pearson's correlation coefficient computed from Table 1.1 is 0.945, a substantial correlation that indicates a strong positive association between the two sets of variables. Accordingly, the scatter plot in Figure 1.0 reveals a strong positive linear relationship between them.

Covariance and correlation are closely related statistics in bivariate and especially multivariate data sets. Bear in mind that they measure only linear association and are not appropriate for a curvilinear relationship.

 

Figure 1.0  Scatter plot of sales and forecast

 


 

Pearson's Correlation Coefficient, r

r = $\frac{\sum a_i e_i}{\sqrt{\sum a_i^2 \, \sum e_i^2}}$   =SUMPRODUCT((D2:D11),(G2:G11))/SQRT(E12*H12)   = 3506987.6/3709628.2 = 0.945
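The same covariance and Pearson's r can be reproduced in a few lines of Python, pairing the kilogram column X with the feet column F of Table 1.1:

    # Covariance and Pearson's r for the two columns of Table 1.1
    import statistics as st

    X = [3290.8, 3536.7, 3902, 4145, 4205,
         4387.9, 4687.8, 4768.8, 4936.2, 5245.1]   # kg
    F = [3119.2, 3407.1, 4073, 4378.9, 4498.9,
         4199, 4498.9, 4498.9, 4798.8, 5458.6]     # ft
    n = len(X)
    mx, mf = st.mean(X), st.mean(F)
    cov = sum((x - mx) * (f - mf)
              for x, f in zip(X, F)) / (n - 1)     # 389,665.29 kg-ft
    r = cov / (st.stdev(X) * st.stdev(F))          # 0.945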
 
Root Mean Square (RMS)

RMS = $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}$   =SQRT(SUM(E2:E11)/COUNT(A2:A11))   = 588.84 kg
 

Standard Deviation (SD), denoted by σ

SD = $\sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$   =SQRT(SUM(E2:E11)/(COUNT(A2:A11)-1))   = 620.69 kg
 
Standard deviation is a measure of the variability or dispersion of a population or a data set. A low standard deviation indicates that the data points tend to be very close to the mean, while a high standard deviation means the data are spread out over a large range of values. It is also commonly used to measure confidence in statistical conclusions.
Standard Error of the Mean (SEM)

SEM = $\frac{\mathrm{SD}}{\sqrt{n}}$   =STDEV(B2:B11)/SQRT(COUNT(B2:B11))   = 620.69/√10 = 196.28 kg
 
 
SEM is the standard deviation of the error of the sample mean relative to the true mean, since the sample mean is an unbiased estimator; it is the standard deviation of the sample-mean estimate of a population mean. Because it is computed from the sample standard deviation, the SEM is itself a slightly biased estimator of the true standard error.

Coefficient of Variation (CV)

CV_X = $\frac{s_X}{\bar{X}} \times 100$   = (620.69/4310.53) × 100 = 14.40%

CV_F = $\frac{s_F}{\bar{F}} \times 100$   = (664.07/4293.13) × 100 = 15.47%

where s_F =SQRT(SUM(H2:H11)/(COUNT(A2:A11)-1)) = 664.07 ft and $\bar{F}$ =SUM(C2:C11)/COUNT(A2:A11) = 4293.13 ft.

 
This coefficient provides a unitless measure of the variation of a distribution by translating it into a percentage of the mean value. The CV can be used when comparing two samples that have different means and standard deviations. When the mean is close to 0, however, the CV becomes of little use.
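A sketch reproducing the RMS, standard deviation, standard error of the mean, and coefficient of variation for the kilogram data:

    # RMS, SD, SEM, and CV for the kilogram column of Table 1.1
    import math
    import statistics as st

    X = [3290.8, 3536.7, 3902, 4145, 4205,
         4387.9, 4687.8, 4768.8, 4936.2, 5245.1]
    n = len(X)
    mean = st.mean(X)
    ssd = sum((x - mean) ** 2 for x in X)
    rms = math.sqrt(ssd / n)             # 588.84 kg (divides by n)
    sd  = math.sqrt(ssd / (n - 1))       # 620.69 kg, same as st.stdev(X)
    sem = sd / math.sqrt(n)              # 196.28 kg
    cv  = sd / mean * 100                # 14.40 %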

Skewness

Skew = $\frac{\sum_{i=1}^{n} (X_i - \bar{X})^3 / (n-1)}{s^3}$

=SUMPRODUCT((D2:D11)^3)/((COUNT(D2:D11)-1)*SQRT(SUM(E2:E11)/(COUNT(E2:E11)-1))^3)

= (-386035317.05/9) / (620.69)³ = -0.179

Excel, however, uses a slightly different, sample-adjusted equation for skewness. Using Excel's Data Analysis tool (or the SKEW function) on this data gives -0.224; the two results are reasonably close.

 

A positively skewed distribution curve rises rapidly, reaches its maximum, and falls slowly, with its tail (and the median, relative to the peak) on the right-hand side. A negatively skewed distribution curve rises slowly, reaches its maximum, and falls rapidly, with its tail and median on the left-hand side.
   
Standard Error of Skewness

SES = $\sqrt{6/n}$   =SQRT(6/10)   = 0.775
 
Measures of skewness and kurtosis help to assess normality. The skewness measure indicates the level of non-symmetry: if the distribution of the data is symmetric, skewness will be close to zero. How can we tell if the skewness is large enough to be a concern? Check it against the standard error of skewness: if the skewness exceeds twice this amount, in this case 1.549, the distribution of the data is non-symmetric. (Passing this check does not by itself show that the sales data are normally distributed.) If Skew > 0, the distribution has a pronounced right tail; if Skew < 0, it has a left tail. The sales data show negative skewness because the negative deviations dominate the positive deviations when cubed.




Kurtosis or Kurt(X)

Kurt(X) = $\frac{\sum_{i=1}^{n} (X_i - \bar{X})^4 / (n-1)}{s^4} - 3$

This is the basic (excess) kurtosis formula, reconstructed here as the analogue of the skewness formula above; Excel uses a different, sample-adjusted formula:

=KURT(B2:B11)   = -0.738
 
Kurtosis is a measure of the peakedness or flatness of the data distribution. For normally distributed data, the (excess) kurtosis is zero, irrespective of the mean or standard deviation; heavier-tailed distributions have larger kurtosis measures. Positive kurtosis indicates a relatively peaked distribution, and negative kurtosis a relatively flat one. As with skewness, if the value of kurtosis is too big or too small, there is concern about the normality of the distribution. A rough formula for the standard error of kurtosis is =SQRT(24/N) = SQRT(24/10) = 1.549; twice this amount is 3.098.
 
L = Leptokurtic (kurtosis > 0)
M = Mesokurtic (kurtosis = 0), a normal distribution
P = Platykurtic (kurtosis < 0)

Kurtosis has its origin in the Greek word kyrtos, meaning "bulging." Kurtosis is measured relative to the peakedness or flatness of the normal curve: it tells us the extent to which a distribution is more peaked or more flat-topped than the normal curve.
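To see both conventions side by side, here is a sketch computing skewness and kurtosis for the kilogram data with the simple formulas used above and with the standard sample-adjusted definitions (which, as I understand it, are what Excel's SKEW and KURT implement):

    # Skewness and kurtosis for the kilogram data: the simple formulas used
    # above, plus the sample-adjusted definitions matching Excel's SKEW/KURT
    import math

    X = [3290.8, 3536.7, 3902, 4145, 4205,
         4387.9, 4687.8, 4768.8, 4936.2, 5245.1]
    n = len(X)
    mean = sum(X) / n
    dev = [x - mean for x in X]
    s = math.sqrt(sum(d * d for d in dev) / (n - 1))            # 620.69

    skew_simple = (sum(d ** 3 for d in dev) / (n - 1)) / s ** 3 # -0.179
    skew_excel = (n / ((n - 1) * (n - 2))
                  * sum((d / s) ** 3 for d in dev))             # -0.224

    kurt_excel = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
                  * sum((d / s) ** 4 for d in dev)
                  - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))     # -0.738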
 



Least-Squares Fitting Technique

The least-squares method is used to find the best-fitting curve to a given set of data, x1 ... xn, by minimizing the sum of the squares of the offsets ("the residuals", also called the Sum of Squared Errors or Deviations, SSD) of the points from the curve. Because the sum of the squares of the offsets is used instead of the offset absolute values, outlying points can have a disproportionate effect on the fit, a property which may or may not be desirable depending on the problem at hand. Least-squares fitting can be linear or non-linear, and is categorized further as ordinary least squares (OLS), weighted least squares (WLS), and alternating least squares (ALS). Ordinary linear least-squares (LLS) fitting is the simplest and most commonly applied form of linear regression and provides a solution to the problem of finding the best-fitting straight line through a set of points.

Although Excel can plot a trend line instantly for your graphs, let's use the LLS method as an example to understand the logic behind calculating the coefficients of the straight line that best fits the historical data in Table 1.2. The equation for the line is y = mx + b, where the dependent y-value is a function of the independent x-values, the m-values are coefficients corresponding to each x-value, and b is a constant. Note that y, x, and m can be vectors.

Let X be the number of hours that employees clocked in for work in a typical month and Y the total units of production output recorded; sample data were collected for 11 employees.

Employee   Monthly hours worked, Xi   Monthly output, Yi   Xi - X̄    Yi - Ȳ      (Xi - X̄)(Yi - Ȳ)   (Xi - X̄)²
1          195                        21816                -55.09    -1455.55    80187.32            3035.01
2          279                        26893                28.91     3621.45     104692.96           835.74
3          224                        20528                -26.09    -2743.55    71581.60            680.74
4          237                        20665                -13.09    -2606.55    34122.05            171.37
5          219                        21302                -31.09    -1969.55    61234.96            966.64
6          252                        24420                1.91      1148.45     2192.50             3.64
7          256                        22315                5.91      -956.55     -5652.31            34.92
8          264                        19595                13.91     -3676.55    -51137.40           193.46
9          254                        25294                3.91      2022.45     7905.96             15.28
10         285                        25989                34.91     2717.45     94863.87            1218.64
11         286                        27170                35.91     3898.45     139989.96           1289.46
Total      2751                       255987               0         0           539981.45           8444.91
Mean       250.09                     23271.55             0         0           49089.22

(Spreadsheet layout: Xi occupies B2:B12 and Yi occupies C2:C12, which the formulas below reference.)

Figure 1.1  Using the linear least-squares method to find the best line of fit for this example.

Table 1.2  Ordinary least-squares fitting to find the best linear fit on the bivariate data

 

Mean of total hours worked, $\bar{X}$   =SUM(B2:B12)/COUNT($B$2:$B$12)   = 250.09
Mean of the monthly output, $\bar{Y}$   =SUM(C2:C12)/COUNT($B$2:$B$12)   = 23271.55
 

The slope:   m = $\frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$   = 539981.45/8444.91 = 63.942      or, =INDEX(LINEST(C$2:C$12,B$2:B$12),1)

The intercept:   b = $\bar{Y} - m\bar{X}$   = 23271.55 - 63.942 × 250.09 = 7280.3      or, =INDEX(LINEST(C$2:C$12,B$2:B$12),2)

The linear least-squares line:   f(x) = mx + b = 63.942x + 7280.3
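The slope and intercept of Table 1.2 can be reproduced directly from the formulas above in a short sketch:

    # Ordinary least-squares slope and intercept for the Table 1.2 data
    X = [195, 279, 224, 237, 219, 252, 256, 264, 254, 285, 286]
    Y = [21816, 26893, 20528, 20665, 21302, 24420,
         22315, 19595, 25294, 25989, 27170]
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n                   # 250.09, 23271.55
    m = (sum((x - mx) * (y - my) for x, y in zip(X, Y))
         / sum((x - mx) ** 2 for x in X))             # 63.942
    b = my - m * mx                                   # 7280.3
    print(f"f(x) = {m:.3f}x + {b:.1f}")               # f(x) = 63.942x + 7280.3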

 
       

Table 1.3  A single time series lagged on itself to determine the autocovariance and autocorrelation

Writing dt = Xt - X̄ for the deviation from the mean:

t    Xt    Xt-1  Xt-2  Xt-3    dt       d(t-1)   d(t-2)   d(t-3)   dt²       dt·d(t-1)  dt·d(t-2)  dt·d(t-3)
1    101                       34.05                               1159.40
2    40    101                 -26.95   34.05                      726.30    -917.65
3    36    40    101           -30.95   -26.95   34.05             957.90    834.10     -1053.85
4    116   36    40    101     49.05    -30.95   -26.95   34.05    2405.90   -1518.10   -1321.90   1670.15
5    54    116   36    40      -12.95   49.05    -30.95   -26.95   167.70    -635.20    400.80     349.00
6    52    54    116   36      -14.95   -12.95   49.05    -30.95   223.50    193.60     -733.30    462.70
7    93    52    54    116     26.05    -14.95   -12.95   49.05    678.60    -389.45    -337.35    1277.75
8    33    93    52    54      -33.95   26.05    -14.95   -12.95   1152.60   -884.40    507.55     439.65
9    77    33    93    52      10.05    -33.95   26.05    -14.95   101.00    -341.20    261.80     -150.25
10   64    77    33    93      -2.95    10.05    -33.95   26.05    8.70      -29.65     100.15     -76.85
11   43    64    77    33      -23.95   -2.95    10.05    -33.95   573.60    70.65      -240.70    813.10
12   59    43    64    77      -7.95    -23.95   -2.95    10.05    63.20     190.40     23.45      -79.90
13   27    59    43    64      -39.95   -7.95    -23.95   -2.95    1596.00   317.60     956.80     117.85
14   94    27    59    43      27.05    -39.95   -7.95    -23.95   731.70    -1080.65   -215.05    -647.85
15   113   94    27    59      46.05    27.05    -39.95   -7.95    2120.60   1245.65    -1839.70   -366.10
16   109   113   94    27      42.05    46.05    27.05    -39.95   1768.20   1936.40    1137.45    -1679.90
17   69    109   113   94      2.05     42.05    46.05    27.05    4.20      86.20      94.40      55.45
18   39    69    109   113     -27.95   2.05     42.05    46.05    781.20    -57.30     -1175.30   -1287.10
19   78    39    69    109     11.05    -27.95   2.05     42.05    122.10    -308.85    22.65      464.65
20   42    78    39    69      -24.95   11.05    -27.95   2.05     622.50    -275.70    697.35     -51.15

Sum  1339                                                          15964.95  -1563.50   -2714.71   1311.24
Mean 66.95

Auto-Cov (lag 1) = -86.86     Auto-r (lag 1) = -0.10
Auto-Cov (lag 2) = -159.69    Auto-r (lag 2) = -0.17
Auto-Cov (lag 3) = 81.95      Auto-r (lag 3) = 0.08

Table 1.3 above is a time series of 20 equally spaced data points, indexed by time period t; each individual data element is Xt, and the mean of the series is X̄. The data were generated randomly. (Hint: you can generate such a series with =INT((85-5+1)*RAND()+5); here I used max = 85 and min = 5.) First, I copy a duplicate of the complete time series into column C, offset by one period (time lag k = 1). I do the same in columns D and E, lagging the series by one further period each until k = 3. Column F is the deviation from the mean, and columns G, H, and I lag that deviation by one, two, and three periods. Column J is the squared deviation from the mean. Column K multiplies column F by column G (a one-period lag); column L is column F times column H (two periods); and column M is column F times column I (three periods). The autocovariance and autocorrelation at lag k, illustrated for a one-period lag, are calculated with the following equations:

Auto-Covariance (lag k):   $\text{auto-cov}_k = \frac{\sum_{t=k+1}^{n} (X_t - \bar{X})(X_{t-k} - \bar{X})}{n - k - 1}$   = -1563.50/(20 - 1 - 1) = -86.86

Auto-Correlation (lag k):   $\text{auto-r}_k = \frac{\sum_{t=k+1}^{n} (X_t - \bar{X})(X_{t-k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2}$   = -1563.50/15964.95 = -0.10
Using the same equations, you obtain the autocovariance and autocorrelation for lags 2, 3, and beyond. The three autocorrelation values (-0.10, -0.17, 0.08) are all close to zero, so there is no clear pattern of linear relationship between the series and its lags.

Given a real stochastic process, the autocovariance is the covariance of the signal against a time-shifted version of itself. For example, suppose I want to know to what extent the next measurement depends on the data point I have just examined, or on the one just before it: autocovariance is a measure of this dependence, as shown in Table 1.3 with its time-period offsets. Autocorrelation is simply a normalized form of the autocovariance: the correlation of a data set with itself, offset by k values. For example, autocorrelation with an offset of 5 correlates {X1, X2 ... Xn-5} with {X6, X7 ... Xn}. The autocorrelation function is the set of autocorrelations with offsets 1, 2, 3, 4 ... limit, where limit ≤ n/2.
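A sketch of the lag-k autocovariance and autocorrelation over the Table 1.3 series, using the same n - k - 1 denominator as above:

    # Lag-k autocovariance and autocorrelation, matching Table 1.3
    def autocov(x, k):
        n, m = len(x), sum(x) / len(x)
        return sum((x[t] - m) * (x[t - k] - m)
                   for t in range(k, n)) / (n - k - 1)

    def autocorr(x, k):
        n, m = len(x), sum(x) / len(x)
        num = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n))
        den = sum((xi - m) ** 2 for xi in x)
        return num / den

    series = [101, 40, 36, 116, 54, 52, 93, 33, 77, 64,
              43, 59, 27, 94, 113, 109, 69, 39, 78, 42]
    for k in (1, 2, 3):
        print(k, round(autocov(series, k), 2), round(autocorr(series, k), 2))
    # lag 1: -86.86, -0.10   lag 2: -159.69, -0.17   lag 3: 81.95, 0.08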


Table 1.4  Computations of standard and relative statistics on forecast errors

Period t   Actual Xt   Forecast Ft   Error et   |et|   et²   PE       APE     U numerator   U denominator
1          24          28            -4         4      16    -16.67   16.67   0.085         0.016
2          21          28            -7         7      49    -33.33   33.33   0.145         0.510
3          36          28            8          8      64    22.22    22.22   0.007         0.007
4          33          30            3          3      9     9.09     9.09    0.001         0.008
5          36          35            1          1      1     2.78     2.78    0.003         0.012
6          40          38            2          2      4     5.00     5.00    0.001         0.010
7          44          45            -1         1      1     -2.27    2.27    0.002         0.008
8          48          50            -2         2      4     -4.17    4.17    0.002         0.016
9          54          52            2          2      4     3.70     3.70    0.005         0.001
10         56          52            4          4      16    7.14     7.14
Sum                                  6          34     168   -6.50    106.38  0.251         0.589

ME 0.60     MAE 3.40    SSE 168    MSE 16.8    SDE 4.32
SPE -6.50   MPE -0.65   MAPE 10.64
Theil's U 0.65

(The last two columns hold the terms ((Ft+1 - Xt+1)/Xt)² and ((Xt+1 - Xt)/Xt)² of Theil's U-statistic, defined below.)
Standard Statistical Measures



Mean Error, with n error terms

ME = $\frac{1}{n}\sum_{i=1}^{n} e_i$   = 6/10 = 0.6


A low value of the ME may conceal forecasting inaccuracy, because large positive and negative forecast errors offset each other. Here 0.6 is a fair value, given that there are no unusually large forecast deviations in the series; any inaccuracy will become more apparent from the subsequent forecast-evaluation statistics.

Mean Absolute Error

MAE = $\frac{1}{n}\sum_{i=1}^{n} \lvert e_i \rvert$   = 34/10 = 3.4


An MAE of 3.4 is a small marginal forecast error, so the forecast has done a good job. Although the MSE and MAE overcome the cancellation of positive and negative errors that limits the ME, in some cases they fail to provide information on forecasting accuracy relative to the scale of the series being examined, for example when you have multiple forecast series F1, F2, F3, and more.
Sum of Squared Errors   SSE = $\sum_{i=1}^{n} e_i^2$   = 168

Mean Squared Error

MSE = $\frac{1}{n}\sum_{i=1}^{n} e_i^2$   = 168/10 = 16.8


Because MSE squares the errors, it gives more weight to large errors than the MAE does. This is especially true of the large squared errors from periods 2 and 3 (49 and 64), compared with the much smaller squared errors from periods 5 through 9 (totaling 14). An MSE of 16.8 is considered relatively large here, and the U-statistic below can be used to make a comparison.
Standard Deviation of Errors   SDE = $\sqrt{\frac{1}{n-1}\sum_{i=1}^{n} e_i^2}$   =SQRT(168/9)   = 4.32
Relative Statistical Measures


Percentage Error   $\mathrm{PE}_i = \frac{X_i - F_i}{X_i} \times 100$,   and so SPE, the sum of the PEs, = -6.50

Mean Percentage Error   MPE = $\frac{1}{n}\sum_{i=1}^{n} \mathrm{PE}_i$   = -6.5/10 = -0.65

Mean Absolute Percentage Error   MAPE = $\frac{1}{n}\sum_{i=1}^{n} \lvert \mathrm{PE}_i \rvert$   = 106.38/10 = 10.64
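All of the standard and relative measures above can be reproduced from the Table 1.4 series in a few lines:

    # Standard and relative error measures for the Table 1.4 series
    X = [24, 21, 36, 33, 36, 40, 44, 48, 54, 56]   # actual
    F = [28, 28, 28, 30, 35, 38, 45, 50, 52, 52]   # forecast
    n = len(X)
    e = [x - f for x, f in zip(X, F)]

    me   = sum(e) / n                                # ME    0.60
    mae  = sum(abs(v) for v in e) / n                # MAE   3.40
    sse  = sum(v * v for v in e)                     # SSE   168
    mse  = sse / n                                   # MSE   16.8
    sde  = (sse / (n - 1)) ** 0.5                    # SDE   4.32
    pe   = [100 * (x - f) / x for x, f in zip(X, F)]
    mpe  = sum(pe) / n                               # MPE  -0.65
    mape = sum(abs(p) for p in pe) / n               # MAPE 10.64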
The objective of using standard and relative statistical measures is often to find an optimization model that minimizes the sum or mean of squared errors (SSE, MSE). This may not be a good criterion, for several reasons. First, the relative measures give equal weight to all the errors in the series, whereas MSE squares the errors and thereby emphasizes only the large ones; even so, driving MSE to 0 in the fitting phase can always be achieved with a polynomial of sufficiently high order or an appropriate Fourier transformation, and over-fitting a model to a data series amounts to intentionally building randomness into the generating process, or failing to identify the nonrandom pattern in the data. Second, different forecasting methods use different procedures in the fitting phase: smoothing methods are highly dependent on the initial forecasting estimates; decomposition methods include the trend-cycle in the fitting phase; regression methods minimize the MSE by giving equal weight to all the data points; and Box-Jenkins methods minimize the MSE through a non-linear optimization procedure. So using MSE or SSE as a single criterion alone is rarely adequate. Finally, because MSE is an absolute measure, it does not facilitate comparison across multiple time series or different time intervals.
Theil's U-Statistic  


U-Statistic:

$U = \sqrt{\dfrac{\sum_{t=1}^{n-1} \left( \frac{F_{t+1} - X_{t+1}}{X_t} \right)^2}{\sum_{t=1}^{n-1} \left( \frac{X_{t+1} - X_t}{X_t} \right)^2}}$   =SQRT(0.251/0.589)   = 0.65

Since U is below 1, the formal forecasting method outperforms the naive method here (see the intervals below).

The U-statistic, although built on a comparison with a 'naive' approach to forecasting, demonstrates in the evaluation process the good characteristic of giving more weight to large errors than to small ones (since, like the MSE, it squares the errors). Theil's U measures how well the forecasting model predicts against a naive model, in which the most recent value of the variable is simply repeated as the next forecasted value. The intervals of the U-statistic can be read as follows:

U = 1: The naive estimation approach is as good as the forecasting technique.

U near 0: The more accurate the forecasts, the lower the value of the U statistic. (The related U1 statistic is bounded between 0 and 1, with values closer to 0 indicating greater forecasting accuracy; U1 = 0 means the forecasting model is a perfect fit.)

U < 1: The forecasting technique demonstrates greater accuracy than the naive method.

U > 1: There is no point in using the forecasting method, since the naive method produces better results.

In this way, the U-statistic provides a good comparison between formal forecasting methods and the naive forecasting method.
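A sketch of Theil's U over the Table 1.4 series, comparing the model's one-step relative errors against the naive no-change forecast:

    # Theil's U: model's one-step relative errors vs. the naive forecast
    def theil_u(X, F):
        num = sum(((F[t + 1] - X[t + 1]) / X[t]) ** 2
                  for t in range(len(X) - 1))
        den = sum(((X[t + 1] - X[t]) / X[t]) ** 2
                  for t in range(len(X) - 1))
        return (num / den) ** 0.5

    X = [24, 21, 36, 33, 36, 40, 44, 48, 54, 56]
    F = [28, 28, 28, 30, 35, 38, 45, 50, 52, 52]
    print(theil_u(X, F))   # ~0.65: the model beats the naive forecast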

Demand Forecast Accuracy


Demand forecast error is the deviation of the actual realized demand quantity from the forecasted quantity. The forecast error can be bigger than the actual or the forecast, but not both. An error above 100% gets 0% forecast accuracy, a very inaccurate forecast.

Error % = |Actual Demand - Forecast| / Actual Demand

Forecast Accuracy % = 1 - Error %
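A sketch of this accuracy rule; the max() clamp implements the "0% accuracy above 100% error" convention stated above:

    # Forecast accuracy with the error capped at 100%
    def forecast_accuracy(actual, forecast):
        error = abs(actual - forecast) / actual
        return max(0.0, 1.0 - error)     # error above 100% -> 0% accuracy

    print(forecast_accuracy(54, 52))     # 0.963, about 96% accurate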

This site was created in February 2007.
Contact: William Tan, email: vbautomation@yahoo.com

