statistical • statistical analysis • statistician
In these roles, it is a key tool, and perhaps the only reliable tool.
• Abundance estimation • Data science • Glossary of probability and statistics • List of academic statistical associations • List of important publications in statistics • List of national and international statistical services • List of statistical packages (software) • List of statistics articles • List of university statistical consulting centers • Notation in probability and statistics
Foundations and major areas of statistics: • Foundations of statistics • List of statisticians • Official statistics • Multivariate analysis of variance

List of statistics articles

List of statistical topics • List of statistics topics • Index of statistics articles
List of statistical packages.

Time series

time series analysis • time-series • time-series analysis
Simple or fully formed statistical models to describe the likely outcome of the time series in the immediate future, given knowledge of the most recent outcomes (forecasting). Forecasting on time series is usually done using automated statistical software packages and programming languages, such as Apache Spark, Julia, Python, R, SAS, SPSS and many others. Forecasting on large-scale data can be done using Spark, which has spark-ts as a third-party package. Stationary process. Ergodic process. Consideration of the autocorrelation function and the spectral density function (also cross-correlation functions and cross-spectral density functions).
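
The autocorrelation function mentioned in this entry can be estimated directly from a sample. The following is a minimal sketch in plain Python, not tied to any of the packages listed; the function name `autocorrelation` and the toy series are illustrative assumptions.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centered = [value - mean for value in series]
    # Lag-0 sum of squares normalizes every lag, so acf[0] is exactly 1.
    denom = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        num = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        acf.append(num / denom)
    return acf


# Hypothetical toy series with a slow up-and-down pattern.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
acf = autocorrelation(series, 3)
print(acf)
```

A slowly decaying sequence of positive values at low lags suggests persistence in the series, which is one informal check made before choosing a forecasting model.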

Vector autoregression

VAR • vector autoregressive model • structural VAR estimation
An estimated VAR model can be used for forecasting, and the quality of the forecasts can be judged, in ways that are completely analogous to the methods used in univariate autoregressive modelling. Christopher Sims has advocated VAR models, criticizing the claims and performance of earlier modeling in macroeconomic econometrics. He recommended VAR models, which had previously appeared in time series statistics and in system identification, a statistical specialty in control theory. Sims advocated VAR models as providing a theory-free method to estimate economic relationships, thus being an alternative to the "incredible identification restrictions" in structural models.
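
As a rough illustration of estimating a VAR and forecasting with it, the sketch below fits a two-variable VAR(1) without intercept by least squares and iterates the fitted equation forward. It is written in plain Python under simplifying assumptions: the names `fit_var1`, `step` and `forecast` are hypothetical, and the toy data are generated without noise so that the estimate can be checked against the known coefficient matrix.

```python
def fit_var1(data):
    """Least-squares estimate of A in y_t = A y_{t-1} + e_t (2 variables, no intercept)."""
    sxx = [[0.0, 0.0], [0.0, 0.0]]  # sum of y_{t-1} y_{t-1}'
    syx = [[0.0, 0.0], [0.0, 0.0]]  # sum of y_t y_{t-1}'
    for prev, curr in zip(data[:-1], data[1:]):
        for i in range(2):
            for j in range(2):
                sxx[i][j] += prev[i] * prev[j]
                syx[i][j] += curr[i] * prev[j]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    # A_hat = S_yx * inverse(S_xx)
    return [[sum(syx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def step(a, y):
    """One application of the VAR(1) recursion without noise."""
    return [a[0][0] * y[0] + a[0][1] * y[1],
            a[1][0] * y[0] + a[1][1] * y[1]]


def forecast(a, last, steps):
    """Iterate the fitted recursion forward to get point forecasts."""
    path, y = [], list(last)
    for _ in range(steps):
        y = step(a, y)
        path.append(y)
    return path


# Hypothetical coefficient matrix and noise-free toy data.
true_a = [[0.5, 0.1], [0.2, 0.3]]
data = [[1.0, 2.0]]
for _ in range(11):
    data.append(step(true_a, data[-1]))

a_hat = fit_var1(data)
path = forecast(a_hat, data[-1], 3)
print(a_hat)
print(path)
```

With real data the residuals are not zero, and forecast quality would be judged on held-out observations, as in the univariate autoregressive case.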

Structural break

Sup-LR test • Structural break test • Sup-LM test
In econometrics and statistics, a structural break is an unexpected change over time in the parameters of regression models, which can lead to huge forecasting errors and unreliability of the model in general. This issue was popularised by David Hendry, who argued that lack of stability of coefficients frequently caused forecast failure, and that structural stability should therefore be tested routinely. Structural stability (i.e., the time-invariance of regression coefficients) is a central issue in all applications of linear regression models. For linear regression models, the Chow test is often used to test for a single break in mean at a known time period K for K ∈ [1,T].
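
The Chow test compares the residual sum of squares of one pooled regression with the combined residual sum of squares of separate regressions before and after the known break. A minimal sketch for a simple regression with intercept and slope follows; the helper names `rss_simple` and `chow_statistic` and the deterministic toy data are assumptions made for illustration.

```python
def rss_simple(x, y):
    """Residual sum of squares of the least-squares line y = a + b x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def chow_statistic(x, y, k, n_params=2):
    """Chow F statistic for a break after the first k observations."""
    rss_pooled = rss_simple(x, y)
    rss_split = rss_simple(x[:k], y[:k]) + rss_simple(x[k:], y[k:])
    n = len(x)
    return ((rss_pooled - rss_split) / n_params) / (rss_split / (n - 2 * n_params))


# Hypothetical data: a line with small alternating noise, with and without
# a level shift of 5 starting at observation 10.
x = [float(i) for i in range(20)]
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(20)]
y_stable = [1.0 + 0.5 * xi + e for xi, e in zip(x, noise)]
y_break = [yi + (5.0 if i >= 10 else 0.0) for i, yi in enumerate(y_stable)]

chow_stable = chow_statistic(x, y_stable, 10)
chow_break = chow_statistic(x, y_break, 10)
print(chow_stable, chow_break)
```

Under the null hypothesis of no break the statistic is compared with an F distribution with `n_params` and `n - 2 * n_params` degrees of freedom; a large value, as in the shifted series, points to a break at the chosen period.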

Regression analysis

regression • multiple regression • regression model
With relatively large samples, however, a central limit theorem can be invoked such that hypothesis testing may proceed using asymptotic approximations. Limited dependent variables, which are response variables that are categorical variables or are variables constrained to fall only in a certain range, often arise in econometrics. The response variable may be non-continuous ("limited" to lie on some subset of the real line). For binary (zero or one) variables, if analysis proceeds with least-squares linear regression, the model is called the linear probability model. Nonlinear models for binary dependent variables include the probit and logit models.
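
One reason nonlinear models are used for binary outcomes is that a linear probability model can produce fitted values outside the unit interval. The sketch below fits a linear probability model by least squares on a small made-up sample and, for contrast, evaluates the logistic function used by the logit model; it does not fit a logit model. The names `fit_line` and `logistic` and the data are illustrative assumptions.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def logistic(z):
    """Logistic function: maps any real index into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical binary outcomes that become more likely as x grows.
x = [float(i) for i in range(10)]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = fit_line(x, y)
lpm_fitted = [intercept + slope * xi for xi in x]
bounded = [logistic(z) for z in (-5.0, 0.0, 5.0)]

print(lpm_fitted[0], lpm_fitted[-1])  # fitted "probabilities" at the extremes
print(bounded)
```

In this sample the linear fit dips below zero at the smallest x and rises above one at the largest, whereas the logistic transformation stays strictly between zero and one for any index.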


Comparison of statistical packages. MATLAB. Scilab. Gretl wiki. GretlWeb. Gretl User's Guide. Lee Adkins's Using gretl for Principles of Econometrics. Gretl Command Reference. Gretl Conference. Berlin, 2015.


micro Tsp
Multiple unit root tests are available in the research software, including Dickey–Fuller and Phillips–Perron. IHS EViews Home Page. Comparison of statistical packages. Gretl. GNU Octave. MATLAB. R. Scilab. SPSS. Stata.

JMP (statistical software)

JMP • JMP Pro 11 • JMP statistical software
The FACS system is used to study HIV, cancer, stem-cells and oceanography. Comparison of statistical packages. Data mining. Data processing. Online analytical processing (OLAP). SAS (software). SQL. JMP website. JMP Blog. customer Wiki community. JMP MediaWiki.

RATS (software)

Regression Analysis of Time Series • RATS • RATS 4
Similarly, SAS has an entire routine for estimating and forecasting with Unobserved Components Models. In RATS, estimation of this type would require extensive programming. Nevertheless, in general, the capabilities of RATS are comparable to SAS/ETS and SAS/STAT, but at a much lower price. Comparison of statistical packages (includes information on RATS features). Linear regression, including stepwise. Regressions with heteroscedasticity and serial-correlation correction. Non-linear least squares. Two-stage least squares, three-stage least squares, and seemingly unrelated regressions. Non-linear systems estimation. Generalized Method of Moments. Maximum likelihood estimation.

Granger causality

degree of causality • Granger • granger causality analysis
The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of "true causality" is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only "predictive causality".
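
The idea of "predictive causality" can be sketched as a comparison of two nested regressions: one predicting a series from its own lag, and one that also includes the lag of the other series. The code below is a minimal one-lag illustration in plain Python, not a full implementation of the test; the helper names `ols_rss` and `granger_f`, the seed, and the simulated data (where x leads y by one period) are all assumptions.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares from least squares, via Gauss-Jordan on the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, targets))]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(cause, effect):
    """F statistic for adding one lag of `cause` to an AR(1) regression of `effect`."""
    targets = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    n = len(targets)
    # One restriction, three parameters in the unrestricted model.
    return (rss_r - rss_u) / (rss_u / (n - 3))


# Simulated data (assumption): y depends on the previous value of x.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 200)]

f_xy = granger_f(x, y)  # does lagged x help predict y?
f_yx = granger_f(y, x)  # does lagged y help predict x?
print(f_xy, f_yx)
```

A large statistic in one direction and a small one in the other indicates that x is useful for forecasting y but not the reverse, which is all the test claims; it says nothing about causation in the philosophical sense.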


Demonstration of NLOGIT by KV Krishna Rao. List of statistical packages. Comparison of statistical packages. Chang, Jae Bong and Lusk, Jayson (2011). "Mixed Logit Models: Accuracy and Software Choice". Journal of Applied Econometrics 26: 167-172. Greene, William and Hensher, David (2010). Modeling Ordered Choices. Cambridge University Press. LIMDEP.


List of statistical packages. Comparison of statistical packages. Chang, Jae Bong and Lusk, Jayson (2011). "Mixed Logit Models: Accuracy and Software Choice". Journal of Applied Econometrics 26: 167-172. Greene, William and Hensher, David (2010). Modeling Ordered Choices. Cambridge University Press. NLOGIT.


Stata Journal • Stata Journal, The • The Stata Journal
List of statistical packages. Comparison of statistical packages. Data analysis. Stata Journal. Stata Press.

Data mining

data-mining • datamining • knowledge discovery in databases
Often this results from investigating too many hypotheses and not performing proper statistical hypothesis testing. A simple version of this problem in machine learning is known as overfitting, but the same problem can arise at different phases of the process, and thus a train/test split, when applicable at all, may not be sufficient to prevent this from happening. The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set. Not all patterns found by data mining algorithms are necessarily valid.

Analysis of variance

ANOVA • analysis of variance (ANOVA) • corrected the means
In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. While the analysis of variance reached fruition in the 20th century, antecedents extend centuries into the past according to Stigler. These include hypothesis testing, the partitioning of sums of squares, experimental techniques and the additive model. Laplace was performing hypothesis testing in the 1770s. The development of least-squares methods by Laplace and Gauss circa 1800 provided an improved method of combining observations (over the existing practices then used in astronomy and geodesy).
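
The partitioning of sums of squares behind the one-way ANOVA test can be written out in a few lines. This is a minimal sketch in plain Python; the function name `one_way_anova_f` and the three small groups are made up for illustration.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical samples with group means 5, 7 and 10.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
f_stat = one_way_anova_f(groups)
print(f_stat)  # between SS = 38 on 2 df, within SS = 6 on 6 df, so F is about 19
```

With exactly two groups this statistic equals the square of the pooled two-sample t statistic, which is the sense in which ANOVA generalizes the t-test.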

Autoregressive integrated moving average

ARIMA • Autoregressive integrated moving average model • Autoregressive integrated moving average (ARIMA)
The first is non-stationary, while the second is wide-sense stationary. Now forecasts can be made for the process Y_t, using a generalization of the method of autoregressive forecasting. The forecast intervals (confidence intervals for forecasts) for ARIMA models are based on assumptions that the residuals are uncorrelated and normally distributed. If either of these assumptions does not hold, then the forecast intervals may be incorrect. For this reason, researchers plot the ACF and histogram of the residuals to check the assumptions before producing forecast intervals. 95% forecast interval: \hat{y}_{T+h|T} \pm 1.96 \sqrt{v_{T+h|T}}, where v_{T+h|T} is the variance of y_{T+h} \mid y_1, \dots, y_T.
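
For a concrete case, a 95% forecast interval of the form "point forecast plus or minus 1.96 times the square root of the forecast variance" can be computed in closed form for a zero-mean AR(1) process, the simplest special case, ARIMA(1,0,0). The sketch below assumes known parameters and normally distributed, uncorrelated residuals; the function name `ar1_forecast_interval` and the parameter values are hypothetical.

```python
import math


def ar1_forecast_interval(last_value, phi, sigma2, horizon, z=1.96):
    """Point forecast and interval for a zero-mean AR(1): y_t = phi * y_{t-1} + e_t."""
    point = (phi ** horizon) * last_value
    # h-step forecast variance: sigma2 * (1 + phi^2 + ... + phi^(2(h-1))).
    variance = sigma2 * sum(phi ** (2 * j) for j in range(horizon))
    half_width = z * math.sqrt(variance)
    return point - half_width, point, point + half_width


# Hypothetical values: last observation 2.0, phi = 0.5, residual variance 1.0.
one_step = ar1_forecast_interval(2.0, 0.5, 1.0, horizon=1)
two_step = ar1_forecast_interval(2.0, 0.5, 1.0, horizon=2)
print(one_step)
print(two_step)
```

The interval widens with the horizon because the forecast variance accumulates one additional term per step ahead.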


Genstat (General Statistics) is a statistical software package with data analysis capabilities, particularly in the field of agriculture. It has been developed since 1968 by scientific experts at Rothamsted Research, and has a user-friendly interface, a modular design, linear mixed models and graphics functions. VSN International (VSNi), which is owned by The Numerical Algorithms Group and Rothamsted Research, leads Genstat's continued development and distribution. Genstat is used in a number of research areas, including plant science, forestry, animal science, and medicine, and is recognized by several world-class universities and enterprises.

Ordinary least squares

OLS • least squares • Ordinary least squares regression
In practice s^2 is used more often, since it is more convenient for hypothesis testing. The square root of s^2 is called the regression standard error, standard error of the regression, or standard error of the equation. It is common to assess the goodness-of-fit of the OLS regression by comparing how much the initial variation in the sample can be reduced by regressing onto X.
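
These quantities are easy to compute by hand for a simple regression. The sketch below takes s^2 to be the residual sum of squares divided by n minus the number of estimated parameters, reports its square root as the regression standard error, and measures goodness-of-fit by the share of the initial variation removed by the regression (R^2). The function name `ols_summary` and the five data points are assumptions for illustration.

```python
import math


def ols_summary(x, y, n_params=2):
    """Fit y = a + b x by least squares and return s^2, its square root, and R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)  # initial variation in the sample
    s2 = rss / (n - n_params)
    return {"s2": s2, "std_error": math.sqrt(s2), "r2": 1.0 - rss / tss}


# Hypothetical data lying close to a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
summary = ols_summary(x, y)
print(summary)
```

An R^2 close to one, as here, means that regressing on x removes almost all of the initial variation in y.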

Maple (software)

Maple • Maple computer algebra system • Maple V
Comparison of statistical packages. List of computer algebra systems. List of computer simulation software. List of graphing software. List of numerical analysis software. Mathematical software. SageMath (an open source algebra program). Maplesoft, division of Waterloo Maple, Inc. – official website. Maple Online Help – online documentation. MaplePrimes – a community website for Maple users. MapleCloud – an online Maple application viewer.


Apart from PcGive, which covers dynamic econometric models (ARDL, VAR, GARCH, Switching, Autometrics), panel data models (DPD) and limited dependent models, the main modules are STAMP for structural time series modelling, "SsfPack" for state space methods and "G@RCH" for financial volatility modelling. Many empirical examples in PcGive for OxMetrics are presented in an econometrics textbook, and modern examples are given in a time series analysis textbook. Ox mailing list. Econometric software. Comparison of statistical packages. OxMetrics Homepage. PcGive. STAMP software. G@RCH software. Comparison of mathematical programs for data analysis ScientificWeb.

Microsoft Excel

Excel • XLS • MS Excel
Excel forecasting functions. Support for multi-selection of Slicer items using touch. Time grouping and Pivot Chart Drill Down. Excel data cards. 1985 Excel 1.0. 1988 Excel 1.5. 1989 Excel 2.2. 1990 Excel 3.0. 1992 Excel 4.0. 1993 Excel 5.0 (part of Office 4.x—Final Motorola 680x0 version and first PowerPC version). 1998 Excel 8.0 (part of Office 98). 2000 Excel 9.0 (part of Office 2001). 2001 Excel 10.0 (part of Office v.

GAUSS (software)

Its primary purpose is the solution of numerical problems in statistics, econometrics, time-series, optimization and 2D- and 3D-visualization. It was first published in 1984 for MS-DOS and is currently also available for Linux, macOS and Windows. A range of toolboxes are available for GAUSS at additional cost. GAUSS has several Application Modules as well as functions in its Run-Time Library (i.e., functions that come with GAUSS without extra cost). Qprog – Quadratic programming. SqpSolvemt – Sequential quadratic programming. QNewton – Quasi-Newton unconstrained optimization.


MaxStat Lite • MaxStat Pro
The Lite version features descriptive statistics, hypothesis testing (t-tests, chi-square, 1-way ANOVA with post-hocs, non-parametric tests), distribution testing, linear regression, correlation and a selection of basic graphing functions. It is limited to 254 rows and 12 columns of data, and does not offer import or export functions.


Statsmodels is a Python package that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator. It complements SciPy's stats module. Statsmodels is part of the Python scientific stack that is oriented towards data analysis, data science and statistics. Statsmodels is built on top of the numerical libraries NumPy and SciPy, integrates with Pandas for data handling, and uses Patsy for an R-like formula interface. Graphical functions are based on the Matplotlib library.