Descriptive Statistics Analysis
Quantitative forecasts are based on data, or observations. A single observation or actual value can be represented by a variable Xi, for example, actual sales dollars booked. The objective of forecasting is then to predict the future value of X. An individual forecast is denoted by Fi, and the error by ei, which is the difference between the actual and forecast value for observation i: ei = Xi - Fi. In both time-series and causal forecasting, the time index t denotes the present period, t - 1 the last period, t - 2 two periods ago, and so on. Forecasts are made for periods t + 1, t + 2 ... t + m. For the rest of these sections, I will use the following notation.
| | Observed | Historical | Present | Forecasted |
| Values | X1, X2 ... Xn | Xt-n ... Xt-2, Xt-1 | Xt | Ft+1, Ft+2 ... Ft+m |
| Period i | 1, 2 ... n | t-m ... t-2, t-1 | t | t+1, t+2 ... t+m |
| Estimated values | F1, F2 ... Fn | Ft-n ... Ft-2, Ft-1 | Ft | |
| Error | e1, e2 ... en | et-n ... et-2, et-1 | et | |
Univariate data analysis explores each variable in a data set separately. It looks at the range of values as well as the central tendency of the values, describes the pattern of response to the variable, and describes each variable on its own. Descriptive statistics describe and summarize data. Both univariate and bivariate descriptive statistics describe individual variables in terms of their central tendency, variability or dispersion, and the shape of the overall distribution.
Bivariate descriptive statistics (e.g., ANOVA, t-tests) often involve more than one quantitative variable being collected and examined. For example, data collected from customers may contain kilograms, feet, locations, dollars spent per week, and return memos per week. Bivariate analysis summarizes such data in a way that is analogous to summarizing univariate (single-variable) data.
Multivariate
descriptive statistics refers to any statistical technique used to
analyze data that arises from more than one variable at a time.
Multivariate data analysis is about separating the signal from the noise
in data with many variables and presenting the results as easily
interpretable plots summarizing the essential information. This
essentially models reality where each situation, product, or decision
involves more than a single variable. A wide range of methods is used
for the analysis of multivariate data, such as Regression Analysis,
Correspondence Analysis, Factor Analysis, Multidimensional Scaling, and
Cluster Analysis.
Table 1.1 above contains both univariate and bivariate data sets with values in kilograms; below are their commonly used descriptive statistics.
Mean of volume sold for the past 10 weeks:
x̄ = (Σ Xi) / n = 43,105.3 / 10 = 4,310.53 kg
Median (the middle value):
The median position is (n + 1) / 2 = 5.5, so the median is the value at the 5.5th position = (4,205 + 4,387.9) / 2 = 4,296.45 kg, or =MEDIAN(B2:B11)
Range:
Max - Min = MAX(B2:B11) - MIN(B2:B11) = 1,954.3 kg
The range is very sensitive to extreme scores since it is based on only two values. It should almost never be used as the only measure of spread or dispersion, but it can be informative as a supplement to other measures of spread such as the standard deviation or semi-interquartile range.
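To make the arithmetic above concrete, here is a minimal Python sketch of the same three statistics. The raw Table 1.1 kilogram values are not reproduced on this page, so the `volumes` list below is hypothetical stand-in data; with the real ten weekly values it would return the mean of 4,310.53 kg, the median of 4,296.45 kg, and the range of 1,954.3 kg.

```python
# Minimal sketch of the mean, median, and range calculations.
# The 'volumes' values are hypothetical stand-ins for Table 1.1.
volumes = [3500.0, 4205.0, 4387.9, 4820.5, 3950.2,
           4660.0, 4102.3, 4573.1, 4391.8, 4514.5]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    # For an even n, average the two middle values (the "5.5th position" above).
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def value_range(xs):
    return max(xs) - min(xs)

print(mean(volumes), median(volumes), value_range(volumes))
```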
Deviation from the mean:
Xi - x̄
Note: the sum of the deviations always equals zero, so they should be squared.
Mean Absolute Deviation (MAD):
MAD = (Σ |Xi - x̄|) / n = SUMPRODUCT(ABS(D2:D11))/COUNT(A2:A11) = 494.63 kg
Sum of Squared Deviations (SSD):
SSD = Σ (Xi - x̄)² = 3,467,309.86 kg²
Mean Squared Deviation (MSD):
MSD = SSD / n = 346,730.98 kg². It is also the maximum likelihood estimate (MLE) of the variance for normally distributed data.
Variance, var(x) or S², is the sum of squared deviations divided by the degrees of freedom:
S² = Σ (Xi - x̄)² / (n - 1) = SUM(E2:E11)/(COUNT(A2:A11)-1) = 385,256.65 kg²
The degrees of freedom (df) can be defined as the number of data points minus the number of parameters estimated (which is 1 for the kilogram data, since only the mean is estimated). S² is closely related to MSD, and because it has a smaller denominator, its value is always larger than MSD. The variance S² is an unbiased estimator of the population variance, whereas MSD is a biased estimator. (If the mean value of an estimator equals the true value of the quantity it estimates, the estimator is called an unbiased estimator. An estimator is a biased estimator if its expected value is not equal to the value of the population parameter being estimated.)
The variance, standard deviation, and standard error, as well as the range, all measure variability.
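The deviation-based measures above differ only in how the deviations from the mean are aggregated. The sketch below, reusing the hypothetical `volumes` list from the previous example (the real Table 1.1 values are not listed here), shows that MAD averages the absolute deviations, MSD divides the squared deviations by n, and the sample variance S² divides them by the degrees of freedom n - 1.

```python
# MAD, SSD, MSD (divide by n) and sample variance S^2 (divide by n - 1).
volumes = [3500.0, 4205.0, 4387.9, 4820.5, 3950.2,
           4660.0, 4102.3, 4573.1, 4391.8, 4514.5]   # hypothetical data

n = len(volumes)
x_bar = sum(volumes) / n
deviations = [x - x_bar for x in volumes]            # these sum to ~0

mad = sum(abs(d) for d in deviations) / n            # Mean Absolute Deviation
ssd = sum(d ** 2 for d in deviations)                # Sum of Squared Deviations
msd = ssd / n                                        # Mean Squared Deviation (biased)
variance = ssd / (n - 1)                             # S^2, unbiased estimator

print(mad, ssd, msd, variance)
```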
Covariance, Covxy:
cov(x, F) = Σ (Xi - x̄)(Fi - F̄) / (n - 1) = 3,506,987.60 / 9 = 389,665.29 kg-feet
Covariance is a measure of how much two variables change together. The covariance becomes more positive for each pair of values that differ from their means in the same direction, and more negative for each pair of values that differ from their means in opposite directions (in other words, when one of them tends to be above its expected value while the other is below its expected value). Variance is a special case of the covariance, where the two variables are identical.
Note that the unit of covariance, kg-feet, is difficult to interpret. Hence we compute the correlation coefficient, Pearson's r, which takes care of this scaling problem as described below. If the covariance is divided by the two standard deviations, the units in the numerator and denominator cancel, leaving a dimensionless number, which is Pearson's r, the correlation coefficient:
r = cov(X, F) / (Sx · SF)
Dividing the covariance by the two standard deviations restricts r to the interval -1 to +1.
Pearson's product-moment correlation coefficient ("ρ" when it is measured in the population and "r" when it is measured in a sample) is a measure of the strength of the linear relationship between two variables. If the relationship between the variables is not linear, then the correlation coefficient does not adequately represent the strength of the relationship between the variables. Simply put, the correlation coefficient is the covariance of the deviations of the variables from their means, expressed in terms of their respective standard deviations.
Pearson's r can range from -1 to 1. An r of -1 indicates a perfect negative linear relationship between variables, an r of 0 indicates no linear relationship between variables, and an r of 1 indicates a perfect positive relationship between variables. With real data, you would not expect to get values of r of exactly -1, 0, or 1.
Pearson's correlation is symmetric in the sense that the correlation of X with Y is the same as the correlation of Y with X. A useful property of Pearson's r is that it is unaffected by linear transformations: multiplying a variable by a positive constant and/or adding a constant does not change the correlation of that variable with other variables (multiplying by a negative constant only flips the sign of r). If Y is the transformed value of X, then Y = mX + b.
[Scatter plot examples: a perfect positive linear relationship, Pearson's r = 1; a perfect negative linear relationship, Pearson's r = -1 (as X increases, Y decreases); and a scatter plot with no linear relationship between the variables, Pearson's r = 0.]
Pearson's r formula is designed so that the correlation between kilograms and feet (in Table 1.1) is the same whether the 'forecast' is measured in kilograms or in feet. The value of Pearson's correlation coefficient computed from Table 1.1 is 0.945. A correlation of 0.945 between kilograms and feet means there is a strong positive association between the two variables, and the scatter plot (Figure 1.0) accordingly reveals a strong positive linear relationship between them.
Covariance and correlation are closely related statistics in bivariate and especially multivariate data sets. Care must be taken that they are only measures of linear association and are not appropriate for a curvilinear relationship.
[Figure 1.0 Scatter Plot of Sales and Forecast]
Pearson's correlation coefficient, r:
r = Σ (Xi - x̄)(Fi - F̄) / √( Σ (Xi - x̄)² · Σ (Fi - F̄)² )
=SUMPRODUCT((D2:D11),(G2:G11))/SQRT(E12*H12) = 3,506,987.6 / 3,709,628.2 = 0.945
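A short sketch of the covariance and Pearson's r calculations follows. The paired kilogram and feet columns of Table 1.1 are not reproduced here, so `x` and `f` below are hypothetical vectors; the formulas are the same ones used above (sample covariance divides by n - 1, and r divides the cross-deviation sum by the product of the two root sums of squares).

```python
# Sample covariance and Pearson's correlation coefficient r.
# x (kg) and f (feet) are hypothetical stand-ins for the Table 1.1 columns.
x = [4205.0, 4387.9, 3950.2, 4820.5, 4102.3, 4660.0, 4573.1, 4391.8, 4514.5, 3500.0]
f = [13.1, 13.9, 12.4, 15.2, 12.9, 14.7, 14.5, 13.8, 14.2, 11.0]

n = len(x)
x_bar = sum(x) / n
f_bar = sum(f) / n
dx = [xi - x_bar for xi in x]
df = [fi - f_bar for fi in f]

cross = sum(a * b for a, b in zip(dx, df))
covariance = cross / (n - 1)                         # units: kg-feet
r = cross / ((sum(a * a for a in dx) * sum(b * b for b in df)) ** 0.5)

print(covariance, r)
```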
Root Mean Square (RMS) of the deviations:
RMS = √( Σ (Xi - x̄)² / n ) = SQRT(SUM(E2:E11)/COUNT(A2:A11)) = 588.84 kg
Standard Deviation (SD), denoted by σ (population) or S (sample):
SD = √( Σ (Xi - x̄)² / (n - 1) ) = SQRT(SUM(E2:E11)/(COUNT(A2:A11)-1)) = 620.69 kg
Standard deviation is a measure of the variability or dispersion of a population or a data set. A low standard deviation indicates that the data points tend to be very close to the mean, while a high standard deviation means the data are spread out over a large range of values. It is also commonly used to measure confidence in statistical conclusions.
Standard Error of the Mean (SEM):
SEM = SD / √n = STDEV(B2:B11)/SQRT(COUNT(B2:B11)), or STDEV(B2:B11)/(10^(1/2)) = 196.28 kg
The SEM is the standard deviation of the error in the sample mean relative to the true mean, since the sample mean is an unbiased estimator. It is also the standard deviation of the sample-mean estimate of a population mean. The sample-based standard error of the mean is a biased estimator of the population standard error.
Coefficient of Variation (CV):
CVX = (SX / x̄) × 100 = (620.69 / 4,310.53) × 100 = 14.399
CVF = (SF / F̄) × 100 = (664.07 / 4,293.13) × 100 = 15.468
where SF = SQRT(SUM(H2:H11)/(COUNT(A2:A11)-1)) = 664.07 kg and F̄ = SUM(C2:C11)/COUNT(A2:A11) = 4,293.13 kg.
This coefficient provides a unitless measure of the variation of the distribution by translating it into a percentage of the mean value. The CV can be used when comparing two samples that have different means and standard deviations. When the mean is close to 0, the CV becomes of little use.
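These three dispersion measures chain together: the standard deviation is the square root of the variance, the SEM divides it by √n, and the CV expresses it as a percentage of the mean. A minimal sketch, again on hypothetical data since the raw Table 1.1 values are not listed on this page:

```python
# Sample standard deviation, standard error of the mean, coefficient of variation.
data = [3500.0, 4205.0, 4387.9, 4820.5, 3950.2,
        4660.0, 4102.3, 4573.1, 4391.8, 4514.5]      # hypothetical kg values

n = len(data)
mean = sum(data) / n
ssd = sum((x - mean) ** 2 for x in data)

sd = (ssd / (n - 1)) ** 0.5                          # sample standard deviation
sem = sd / n ** 0.5                                  # standard error of the mean
cv = sd / mean * 100                                 # coefficient of variation, %

print(sd, sem, cv)
```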
Skewness:
Skew = Σ (Xi - x̄)³ / ((n - 1) · S³)
=SUMPRODUCT((D2:D11)^3)/((COUNT(D2:D11)-1)*(SQRT(SUM(E2:E11)/(COUNT(E2:E11)-1))^3))
= (-386,035,317.05) / (9 × 620.69³) = -0.179
Excel, however, uses a slightly different equation for skewness: Skew = [n / ((n - 1)(n - 2))] · Σ ((Xi - x̄) / S)³. Using Excel's Data Analysis Tool, I obtain the value -0.224. The two results are pretty close.
A positively skewed distribution curve rises rapidly, reaches its maximum and falls slowly, with the tail as well as the median on the right-hand side. A negatively skewed distribution curve rises slowly, reaches its maximum and falls rapidly, with the tail as well as the median on the left-hand side.
Standard Error of Skewness:
SES ≈ √(6 / n) = SQRT(6/10) = 0.775
Measures using skewness and kurtosis help to decide normality. The skewness measure indicates the level of non-symmetry. If the distribution of the data is symmetric, then skewness will be close to zero. How can we tell whether the skewness is large enough to be a concern? This can be checked using the standard error of skewness: if the skewness is more than twice this amount, in this case 1.549, then the distribution of the data is non-symmetric. However, passing this check does not by itself indicate that the sales data are normally distributed.
If Skew > 0, the distribution has a pronounced right tail; whereas if Skew < 0, the distribution has a left tail. The sales data distribution shows a negative skewness because the negative deviations dominate the positive deviations when cubed.
Kurtosis, or Kurt(X):
Kurt(X) = Σ (Xi - x̄)⁴ / ((n - 1) · S⁴) - 3
This is the (excess) kurtosis formula, but Excel uses a different, adjusted formula:
KURT = [n(n + 1) / ((n - 1)(n - 2)(n - 3))] · Σ ((Xi - x̄) / S)⁴ - 3(n - 1)² / ((n - 2)(n - 3))
=KURT(B2:B11) = -0.738
Kurtosis is a measure of the peakedness or flatness of the data distribution. Again, for normally distributed data the kurtosis is zero. Heavier-tailed distributions have larger kurtosis measures. The normal distribution has a kurtosis of 0 irrespective of its mean or standard deviation. Positive kurtosis indicates a relatively peaked distribution; negative kurtosis indicates a relatively flat distribution. As with skewness, if the value of kurtosis is too big or too small, there is concern about the normality of the distribution. In this case, a rough formula for the standard error of kurtosis is =SQRT(24/N) = 1.549, and twice this amount is 3.099.
[Figure: leptokurtic, mesokurtic, and platykurtic distribution curves]
L = Leptokurtic (kurtosis > 0)
M = Mesokurtic (kurtosis = 0), i.e. a normal distribution
P = Platykurtic (kurtosis < 0)
Kurtosis has its origin in the Greek word for "bulginess." Kurtosis is measured relative to the peakedness or flatness of the normal curve. It tells us the extent to which a distribution is more peaked or flat-topped than the normal curve.
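Both versions of the skewness and kurtosis formulas discussed above are easy to compare in code. The sketch below computes the simple moment-based estimates (dividing by n - 1, as in the hand calculation) and the adjusted estimates used by Excel's SKEW and KURT functions; the data list is hypothetical.

```python
# Skewness and excess kurtosis: simple (n-1) moment estimates vs. Excel-style adjusted ones.
data = [3500.0, 4205.0, 4387.9, 4820.5, 3950.2,
        4660.0, 4102.3, 4573.1, 4391.8, 4514.5]      # hypothetical kg values

n = len(data)
mean = sum(data) / n
dev = [x - mean for x in data]
s = (sum(d * d for d in dev) / (n - 1)) ** 0.5       # sample standard deviation

skew_simple = sum(d ** 3 for d in dev) / ((n - 1) * s ** 3)
kurt_simple = sum(d ** 4 for d in dev) / ((n - 1) * s ** 4) - 3

# Excel-style adjusted estimators (SKEW and KURT).
skew_excel = n / ((n - 1) * (n - 2)) * sum((d / s) ** 3 for d in dev)
kurt_excel = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum((d / s) ** 4 for d in dev)
              - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(skew_simple, skew_excel, kurt_simple, kurt_excel)
```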
Least-Squares Fitting Technique
The Least-Squares Method is used to find the best-fitting curve to a given set of data, x1 ... xn, by minimizing the sum of the squares of the offsets ("the residuals", also called the Sum of Squared Errors or Deviations, SSD) of the points from the curve. Because the sum of the squares of the offsets is used instead of the absolute values of the offsets, outlying points can have a disproportionate effect on the fit, a property which may or may not be desirable depending on the problem at hand. Least-squares methods can be linear or non-linear, and are categorized further as ordinary least squares (OLS), weighted least squares (WLS), and alternating least squares (ALS). The ordinary Linear Least Squares (LLS) fitting technique is the simplest and most commonly applied form of linear regression and provides a solution to the problem of finding the best-fitting straight line through a set of points. Although Excel can plot a trend line instantly for your graphs, let's use the LLS method as an example to understand the logic and calculation of the coefficients of the straight line that best fits the historical data in Table 1.2. The equation for the line is: y = mx + b, where the dependent y-value is a function of the independent x-values. The m-values are coefficients corresponding to each x-value, and b is a constant value. Note that y, x, and m can be vectors.
Let X be the number of hours that employees clocked in for work in a typical month, and Y be the total units of production output recorded; sample data were collected from 11 employees for analysis.
Table 1.2 Ordinary Least-Squares fitting method to find the best linear fit to the bivariate data

Mean of total hours worked: x̄ = SUM(B2:B12)/COUNT($B$2:$B$12) = 250.09
Mean of the monthly output: ȳ = SUM(C2:C12)/COUNT($B$2:$B$12) = 23,271.55

The slope:
m = Σ (xi - x̄)(yi - ȳ) / Σ (xi - x̄)² = 539,981.45 / 8,444.91 = 63.942, or use =INDEX(LINEST(C$2:C$12,B$2:B$12),1)
The intercept:
b = ȳ - m·x̄ = 23,271.55 - 63.942 × 250.09 = 7,280.3, or use =INDEX(LINEST(C$2:C$12,B$2:B$12),2)
Linear least-squares line:
f(x) = mx + b = 63.942x + 7,280.3
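The slope and intercept calculation from Table 1.2 can be reproduced in a few lines of code. The eleven hours/output pairs are not listed on this page, so the `hours` and `output` vectors below are hypothetical; on the real Table 1.2 data the same formulas give m ≈ 63.942 and b ≈ 7,280.3.

```python
# Ordinary least-squares fit of y = m*x + b, following the Table 1.2 hand calculation.
hours  = [210.0, 225.0, 230.0, 240.0, 245.0, 250.0,
          255.0, 260.0, 270.0, 280.0, 286.0]             # hypothetical X (hours worked)
output = [20500.0, 21400.0, 21900.0, 22600.0, 22900.0, 23300.0,
          23600.0, 24000.0, 24700.0, 25300.0, 25700.0]   # hypothetical Y (units produced)

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(output) / n

m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, output))
     / sum((x - x_bar) ** 2 for x in hours))             # slope
b = y_bar - m * x_bar                                    # intercept

print(f"y = {m:.3f}x + {b:.1f}")
```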
Table 1.3 A single time series lagged on itself to determine the auto-covariance and auto-correlation

Table 1.3 above is a time series of 20 equally spaced data points, indexed by time period t. Each individual data element is referred to as Xt, and the mean of the series is x̄. The data are generated randomly (hint: you can use this formula to generate a series of random numbers, =INT((85-5+1)*RAND()+5); in this case I used max = 85 and min = 5). First, I copy a duplicate of the complete time series to column C, but offset by one period (time lag k = 1). Next, I do the same in columns D and E, lagging the time series by one more period each time until k = 3. Column F is the deviation from the mean. Columns G, H, and I lag the deviations from the mean in column F by one, two, and three periods respectively. Column J is the squared deviation from the mean. Column K multiplies column F by column G (a one-period lag), column L multiplies column F by column H (a two-period lag), and column M multiplies column F by column I (a three-period lag). The autocovariance and autocorrelation with a one-period lag are calculated with the following equations:
Auto-covariance (lag k):
auto-covk = Σ (Xt - x̄)(Xt-k - x̄) / (n - k - 1) = -1,563.50 / (20 - 1 - 1) = -86.86
Auto-correlation (lag k):
auto-rk = Σ (Xt - x̄)(Xt-k - x̄) / Σ (Xt - x̄)² = -1,563.50 / 15,964.95 = -0.10
Using the same equations, you can obtain the autocovariance and autocorrelation for lags of 2, 3, and more periods. The three autocorrelation values (-0.10, -0.17, 0.08) are all close to zero, so we know that there is no clear pattern of linear relationship.
Given a real stochastic process, the autocovariance is the covariance of the "signal" against a time-shifted version of itself. For example, I may want to know to what extent the next measurement will depend on the data point that I have just examined, or the one just before it. Auto-covariance is a measure of this dependence, as shown in Table 1.3, with time-period offsets. Auto-correlation is simply a normalized form of the auto-covariance: it is the correlation of a data set with itself, offset by k values. For example, autocorrelation with an offset of 5 would correlate the data set {X1, X2 ... Xn-5} with {X6, X7 ... Xn}. The autocorrelation function is the set of autocorrelations with offsets 1, 2, 3, 4 ... limit, where the limit is <= n/2.
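The lag-k autocovariance and autocorrelation described for Table 1.3 can be computed directly, without building the lagged helper columns. The sketch below follows the same equations (the autocovariance divides by n - k - 1, matching the 20 - 1 - 1 denominator above); the series is generated randomly, just as in the table.

```python
import random

# Lag-k autocovariance and autocorrelation of a single time series (cf. Table 1.3).
random.seed(7)
series = [random.randint(5, 85) for _ in range(20)]  # random data, as in Table 1.3

def autocovariance(xs, k):
    n = len(xs)
    mean = sum(xs) / n
    cross = sum((xs[t] - mean) * (xs[t - k] - mean) for t in range(k, n))
    return cross / (n - k - 1)                       # same denominator as the hand calculation

def autocorrelation(xs, k):
    n = len(xs)
    mean = sum(xs) / n
    cross = sum((xs[t] - mean) * (xs[t - k] - mean) for t in range(k, n))
    return cross / sum((x - mean) ** 2 for x in xs)

for k in (1, 2, 3):
    print(k, autocovariance(series, k), autocorrelation(series, k))
```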
Table 1.4 Computations of Standard and Relative Statistics on Forecast Errors

| Period t | Actual Xt | Forecast Ft | Forecast error et | Absolute error | Squared error | PE | APE | U's numerator | U's denominator |
| 1 | 24 | 28 | -4 | 4 | 16 | -16.67 | 16.67 | 0.085 | 0.016 |
| 2 | 21 | 28 | -7 | 7 | 49 | -33.33 | 33.33 | 0.145 | 0.510 |
| 3 | 36 | 28 | 8 | 8 | 64 | 22.22 | 22.22 | 0.007 | 0.007 |
| 4 | 33 | 30 | 3 | 3 | 9 | 9.09 | 9.09 | 0.001 | 0.008 |
| 5 | 36 | 35 | 1 | 1 | 1 | 2.78 | 2.78 | 0.003 | 0.012 |
| 6 | 40 | 38 | 2 | 2 | 4 | 5.00 | 5.00 | 0.001 | 0.010 |
| 7 | 44 | 45 | -1 | 1 | 1 | -2.27 | 2.27 | 0.002 | 0.008 |
| 8 | 48 | 50 | -2 | 2 | 4 | -4.17 | 4.17 | 0.002 | 0.016 |
| 9 | 54 | 52 | 2 | 2 | 4 | 3.70 | 3.70 | 0.005 | 0.001 |
| 10 | 56 | 52 | 4 | 4 | 16 | 7.14 | 7.14 | | |
| Sum | | | 6 | 34 | 168 | -6.50 | 106.38 | 0.251 | 0.589 |

Summary statistics: ME = 0.60, MAE = 3.40, SSE = 168, MSE = 16.8, SDE = 4.32, SPE = -6.50, MPE = -0.65, MAPE = 10.64, Theil's U = 0.65
Mean Error, with n error terms:
ME = (Σ ei) / n = 6/10 = 0.6
A low value of the ME may conceal forecasting inaccuracy due to the offsetting effect of large positive and negative forecast errors. Here 0.6 is a fair value, given that there are no unusually large forecast deviations in the series. However, any inaccuracy will become more apparent from inspection of the subsequent forecast evaluation statistics.
Mean Absolute Error:
MAE = (Σ |ei|) / n = 34/10 = 3.4
3.4 is a small marginal forecast error, so the forecast has done well. Even though the MSE and MAE may overcome the 'cancellation of positive and negative errors' limitation of the ME, in some cases they fail to provide information on forecasting accuracy relative to the scale of the series being examined, for example when you have several forecast series, F1, F2, F3, and more.
Sum of Squared Errors:
SSE = Σ ei² = 168
Mean Squared Error:
MSE = (Σ ei²) / n = 168/10 = 16.8
Because MSE squares the errors, it gives larger errors more weight than the MAE does. This is especially true for the large squared errors (49 and 64) from periods 2 and 3, compared with the sum of the squared errors from periods 5 through 9 (a total of 14), which is much smaller. An MSE of 16.8 is considered relatively large, and the U-statistic can be used to make a comparison.
Standard Deviation of Errors:
SDE = √( Σ ei² / (n - 1) ) = SQRT(168/9) = 4.32
Percentage Error:
PEi = ((Xi - Fi) / Xi) × 100, and SPE, the sum of the PEs, = -6.5
Mean Percentage Error:
MPE = (Σ PEi) / n = -6.5/10 = -0.65
Mean Absolute Percentage Error:
MAPE = (Σ |PEi|) / n = 106.38/10 = 10.64
The objective of using standard statistics and relative statistical measures is often to look for an optimization model that minimizes the sum or mean of squared errors (SSE, MSE). This may not be a good measure for several reasons. First, the relative measures give equal weight to all the time-series errors, as opposed to MSE, which squares the errors and thereby emphasizes only the large ones. Even then, the goal of driving MSE to 0 in the fitting phase can always be achieved by using a polynomial of sufficiently high order or an appropriate Fourier transformation; over-fitting a model to a data series amounts to fitting the randomness in the generating process, or failing to identify the nonrandom data pattern. Second, different forecasting methods use different procedures in the fitting phase: Smoothing methods are highly dependent on the initial forecasting estimates; Decomposition methods include the trend-cycle in the fitting phase; Regression methods minimize the MSE by giving equal weight to all the collected data points; Box-Jenkins methods minimize the MSE through a non-linear optimization procedure. So, using MSE or SSE as a single criterion alone is rarely adequate. Also, in the forecasting process, using MSE as a measure of statistical accuracy can create problems: because MSE is an absolute measure, it does not facilitate comparison across multiple time series and different time intervals.
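The standard error statistics above can all be reproduced from the actual and forecast columns of Table 1.4. The short sketch below uses those ten values and returns ME = 0.6, MAE = 3.4, SSE = 168, MSE = 16.8, SDE ≈ 4.32, MPE ≈ -0.65 and MAPE ≈ 10.64.

```python
# Forecast error statistics for the Table 1.4 data.
actual   = [24, 21, 36, 33, 36, 40, 44, 48, 54, 56]
forecast = [28, 28, 28, 30, 35, 38, 45, 50, 52, 52]

n = len(actual)
errors = [x - f for x, f in zip(actual, forecast)]
pe = [100 * e / x for e, x in zip(errors, actual)]   # percentage errors

me   = sum(errors) / n                               # Mean Error
mae  = sum(abs(e) for e in errors) / n               # Mean Absolute Error
sse  = sum(e * e for e in errors)                    # Sum of Squared Errors
mse  = sse / n                                       # Mean Squared Error
sde  = (sse / (n - 1)) ** 0.5                        # Standard Deviation of Errors
mpe  = sum(pe) / n                                   # Mean Percentage Error
mape = sum(abs(p) for p in pe) / n                   # Mean Absolute Percentage Error

print(me, mae, sse, mse, sde, mpe, mape)
```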
Theil's U-Statistic
U-Statistic:
U = √( Σ ((Ft+1 - Xt+1) / Xt)² / Σ ((Xt+1 - Xt) / Xt)² ) = SQRT(0.251/0.589) = 0.65
which indicates that the forecasting method is more accurate than the naive method (U < 1), though not dramatically so.
The U-statistic, although it compares against a 'naive' approach to formal forecasting, demonstrates in the evaluation process the good characteristic of giving more weight to large errors than to small ones (since, as with MSE, the errors are squared, which enlarges the large errors disproportionately). Theil's U measures how well the forecasting model predicts against a 'naive' model. A forecast in a naive model is made by repeating the most recent value of the variable as the next forecast value. The intervals of the U-statistic can be interpreted as follows:
U = 1: the naive estimation approach is as good as the forecasting technique.
U near 0: the more accurate the forecasts, the lower the value of the U-statistic. If it equals 0, the forecasting model is a perfect fit.
U < 1: the forecasting technique used demonstrates greater accuracy than the naive method.
U > 1: there is no point in using the forecasting method, since the naive method produces better results.
In this way, the U-statistic provides a good comparison between formal forecasting methods and the naive forecasting method.
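Theil's U compares the one-step-ahead forecasts with the naive forecast that simply repeats the last observed value. The sketch below implements the same ratio used in Table 1.4 and returns approximately 0.65 for its actual/forecast columns.

```python
# Theil's U statistic: formal forecast vs. the naive "repeat last value" forecast.
actual   = [24, 21, 36, 33, 36, 40, 44, 48, 54, 56]
forecast = [28, 28, 28, 30, 35, 38, 45, 50, 52, 52]

def theils_u(actual, forecast):
    num = sum(((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2
              for t in range(len(actual) - 1))
    den = sum(((actual[t + 1] - actual[t]) / actual[t]) ** 2
              for t in range(len(actual) - 1))
    return (num / den) ** 0.5

print(theils_u(actual, forecast))    # ~0.65 -> more accurate than the naive method (U < 1)
```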
Demand Forecast Accuracy
Demand Forecast Error is the deviation of the actual realized demand quantity from the forecast quantity. The forecast error can be bigger than the actual or the forecast, but not both. An error above 100% gets 0% forecast accuracy, i.e. a very inaccurate forecast.
Error % = | Actual Demand - Forecast | / Actual Demand
Forecast Accuracy % = 1 - Error %
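A minimal sketch of the error and accuracy percentages, capping the error at 100% so that a very inaccurate forecast gets 0% accuracy as described above; the demand and forecast values in the example call are hypothetical.

```python
# Demand forecast error % and accuracy %, capped so accuracy never falls below 0%.
def forecast_accuracy(actual_demand, forecast):
    error_pct = abs(actual_demand - forecast) / actual_demand
    error_pct = min(error_pct, 1.0)                  # errors above 100% count as 100%
    return 1.0 - error_pct

print(forecast_accuracy(120.0, 90.0))                # 0.75 -> 75% accurate
```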
This site was created in February 2007. Contact: William Tan, email: vbautomation@yahoo.com