The last time Hackerfall tried to access this page, it returned a not found error. A cached version of the page is below, or click here to continue anyway

Statistics With Excel?

Is Microsoft Excel an Adequate Statistics Package? Changes in Excel 2010 and following have improved its use for statistical analysis. The primary deficiency is still that Excel does not contain many of the most useful and modern methods for environmental statistics. So unfortunately, the answer is generally ‘No’. Excel is available to many people as part of Microsoft Office. People often use Excel as their everyday statistics software because they have already purchased it. Excel’s limitations make this a very questionable practice for scientific applications. For business applications where questions might be simpler and precision not as necessary, Excel may be just fine. For comments on versions of Excel prior to 2010, see our Pre-2010 Excel page.What has improved from our earlier review?

Several Excel functions have been changed to be more correct, and/or useful:

  1. The PERCENTILE.EXC function was added, which uses the Weibull formula [ i/(n+1) ] as the plotting position for the observation with rank i. This is appropriate when describing a subset (sample) of all possible conditions that are out in the real world, leaving a finite probability that the current maximum of your data might later be exceeded. The previous percentile function used the plotting position [i/n], placing the largest observation at the 100th percentile, so without possibility of being exceeded. This is only appropriate when the entire population (real world) has been observed — a census. Using the PERCENTILE.EXC function, computed values will now agree with standard statistics software. The less-applicable "census of all data" formula continues as the PERCENTILE.INC function, which should be avoided for scientific applications.
  2. The RANK.AVG function was added, which assigns the average rank to all tied values. The previous RANK function assigned the lowest rank to all tied values. Using RANK.AVG, the user can build their own nonparametric procedures such as Wilcoxon and Kruskal-Wallis tests by programming, even though these tests are not available as built-in commands within Excel.
  3. Two-tailed probability functions such as T.DIST.2T were added to clearly provide a two-tailed probability of exceedance for hypothesis tests. Confusion often occurred with earlier t-test functions in Excel. People were commonly computing one-sided test results but believing they were 2-sided tests, or vice-versa. The old functions were inconsistent and unclear in whether they returned one- or two-tailed probabilities. The newer function names make it clearer as to what is being computed.
  4. Boxplots (called Box and Whisker charts) have been added to the Chart menu in Excel 2016. These are the simplest form, drawing whiskers out to the max and min of the data. But much better than nothing.
  5. The CONFIDENCE function previously computed confidence intervals using a normal distribution (z-scores), only applicable when the true variance is known, which in practice is never. The t-interval should have been used instead. Excel 2016 now has the CONFIDENCE.T function for computing confidence intervals correctly. The old function continues as CONFIDENCE.NORM, which should be avoided for scientific applications.
Limitations of Excel1. Many useful functions or tests are not available in Excel and its Analysis Tools, including:

Some of these can be programmed by combining several steps and functions within Excel (see ref #1 for many formulas).

2. Advanced but regularly-used methods in environmental statistics are not available in Excel, including:

Excel's lack of functionality limits its use to simple statistical tasks. For alternatives, see the Possible Solutions presented below.

3. Several Excel procedures are misleading/wrong4. Previous problems not yet evaluated5. Regression routines are incorrect for multicollinear data.

This affects multiple regression. A good statistics package will report errors due to correlations among the X variables. The Variance Inflation Factor (VIF) is one measure of collinearity. Excel does not compute collinearity measures, does not warn the user when collinearity is present, and reports parameter estimates that may be nonsensical. Are multicollinear data of concern in ‘practical’ problems? I think so -- I find many examples of collinearity in environmental data sets.

Possible SolutionsSolution #1: Write your own proceduresIn "Statistical Analysis with Excel for Dummies", Schmuller provides the code to put into an Excel cell to compute Spearman's rho, Scheffe's multiple comparison test, the rank-sum and Kruskal-Wallis tests. The Levene's test could be programmed easily using the one-factor ANOVA tool. Writing your own procedures will help for a few but not most of the above problems.Solution #2: Excel add-ins

A number of limitations can be overcome by using add-in packages that avoid using Excel’s computation procedures. Several of these were evaluated in

our September 2013 newsletter reviewing statistical software for under $600

. These programs use Excel for data handling, but their own routines for data analysis. They vary widely in which routines are included - again, see our newsletter. NOTE: some of these add-ins alter your copy of Excel. Perform a complete backup prior to installing them, in case you don't find them helpful and want to go back to the pristine version. Uninstalling add-ins is not always simple.

Solution #3: Purchase statistics software, or use R (industry standard and free)


November 2007 newsletter

reviewed the capabilities of commercial statistics software. Our

Summer 2006 and January 2010 newsletters

described the free R statistical software environment.

All of our courses

are taught with either commercial software, or with R. We couldn't get through the routines presented in the first day of our introductory Applied Environmental Statistics course if using Excel.

Note: All opinions other than those cited as coming from others are my own.References and LinksArticles:

(1) Statistical Analysis with Excel for Dummies. J. Schmuller, 2016. Wiley, 527 p.


On the numerical accuracy of spreadsheets

by Almiron and others, 2010, in the Journal of Statistical Software (American Statistical Association).

Continue reading on