History of statistics

foundational advances; historian of statistics; stat- etymology
Despite growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics. Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning. Thomas Bayes. George E. P. Box. Pafnuty Chebyshev. David R. Cox. Gertrude Cox. Harald Cramér. Francis Ysidro Edgeworth. Bradley Efron. Bruno de Finetti. Ronald A. Fisher. Francis Galton. Carl Friedrich Gauss. William Sealy Gosset ("Student"). Al-Kindi. Andrey Kolmogorov. Pierre-Simon Laplace. Erich L. Lehmann. Aleksandr Lyapunov. Anil Kumar Gain. Prasanta Chandra Mahalanobis. Abraham de Moivre. Jerzy Neyman. Florence Nightingale. Blaise Pascal. Karl Pearson. Charles S.

Survey sampling

surveys; sample; Sample Survey
This selection bias would be corrected by applying a survey weight equal to 1/(number of phone numbers) to each household. Self-selection bias: A type of bias in which individuals voluntarily select themselves into a group, thereby potentially biasing the response of that group. Participation bias: Bias that arises due to the characteristics of those who choose to participate in a survey or poll. Coverage bias: Coverage bias can occur when population members do not appear in the sample frame (undercoverage). It arises when the observed value deviates from the population parameter due to differences between covered and non-covered units.
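The inverse-phone-count weighting described above can be sketched as follows. This is a minimal illustration only: the household data and the name `weighted_mean` are hypothetical, and real survey weighting usually involves further adjustments.

```python
def weighted_mean(responses):
    """Weighted mean of survey answers.

    responses: list of (value, n_phone_numbers) pairs. A household reachable
    on n numbers is n times as likely to be dialed, so it gets weight 1/n.
    """
    weights = [1.0 / n for _, n in responses]
    total = sum(w * value for w, (value, _) in zip(weights, responses))
    return total / sum(weights)

# Hypothetical data: (answer coded 0/1, number of phone numbers in household)
households = [(1, 1), (0, 1), (1, 2), (1, 2)]

unweighted = sum(value for value, _ in households) / len(households)
weighted = weighted_mean(households)

print(f"unweighted estimate: {unweighted:.3f}")
print(f"weighted estimate:   {weighted:.3f}")
```

In this made-up sample the two-number households all answered 1, so down-weighting them lowers the estimate.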

Regularization (mathematics)

Bias–variance tradeoff. Matrix regularization. Regularization by spectral filtering. Regularized least squares.


International Business Machines; IBM Corporation; International Business Machines Corporation
IT outsourcing also represents a major service provided by IBM, with more than 40 data centers worldwide. alphaWorks is IBM's source for emerging software technologies, and SPSS is a software package used for statistical analysis. IBM's Kenexa suite provides employment and retention solutions, and includes the BrassRing, an applicant tracking system used by thousands of companies for recruiting. IBM also owns The Weather Company, which provides weather forecasting and includes weather.com and Weather Underground.

Applied mathematics

applied mathematician; applied; applications of mathematics
The advent of the computer has enabled new applications: studying and using the new computer technology itself (computer science) to study problems arising in other areas of science (computational science) as well as the mathematics of computation (for example, theoretical computer science, computer algebra, numerical analysis). Statistics is probably the most widespread mathematical science used in the social sciences, but other areas of mathematics, most notably economics, are proving increasingly useful in these disciplines. Academic institutions are not consistent in the way they group and label courses, programs, and degrees in applied mathematics.

Probabilistic classification

Class membership probabilities; probabilistic classifier; probabilistic
In the case of decision trees, where Pr(y|x) is the proportion of training samples with label y in the leaf where x ends up, these distortions come about because learning algorithms such as C4.5 or CART explicitly aim to produce homogeneous leaves (giving probabilities close to zero or one, and thus high bias) while using few samples to estimate the relevant proportion (high variance). Calibration can be assessed using a calibration plot (also called a reliability diagram).
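The points of a calibration plot can be computed by binning the predicted probabilities and comparing each bin's mean prediction with the observed fraction of positives. The sketch below assumes binary labels and equal-width bins; the function name `calibration_points` and the data are hypothetical.

```python
def calibration_points(probs, labels, n_bins=5):
    """Return (mean predicted probability, observed positive fraction, count)
    for each non-empty equal-width bin over [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    points = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        points.append((mean_pred, frac_pos, len(members)))
    return points

# Hypothetical predicted probabilities and true binary labels
probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 1, 1, 1]

points = calibration_points(probs, labels, n_bins=2)
for mean_pred, frac_pos, count in points:
    print(f"predicted {mean_pred:.2f}  observed {frac_pos:.2f}  n={count}")
```

A well-calibrated classifier yields points near the diagonal, where the mean prediction equals the observed fraction. With very few samples per bin, as here, the observed fractions are themselves noisy.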

Exploratory data analysis

explorative data analysis; exploratory; data analysis
Orange, an open-source data mining and machine learning software suite. Python, an open-source programming language widely used in data mining and machine learning. R, an open-source programming language for statistical computing and graphics; together with Python, one of the most popular languages for data science. TinkerPlots, an EDA software for upper elementary and middle school students. Weka, an open-source data mining package that includes visualization and EDA tools such as targeted projection pursuit. Anscombe's quartet, on importance of exploration. Data dredging. Predictive analytics. Structured data analysis (statistics). Configural frequency analysis. Descriptive statistics.

Big data

big data analytics; big data analysis; big-data
Big data often poses the same challenges as small data; adding more data does not solve problems of bias, but may emphasize other problems. In particular, data sources such as Twitter are not representative of the overall population, and results drawn from such sources may then lead to wrong conclusions. Google Translate, which is based on big data statistical analysis of text, does a good job at translating web pages. However, results from specialized domains may be dramatically skewed.

Computational anatomy

Anatomy; computer models; diffeomorphometry
It involves the development and application of mathematical, statistical and data-analytical methods for modelling and simulation of biological structures. The field is broadly defined and includes foundations in anatomy, applied mathematics and pure mathematics, machine learning, computational mechanics, computational science, biological imaging, neuroscience, physics, probability, and statistics; it also has strong connections with fluid mechanics and geometric mechanics.


Multivariate statistics. Naturalistic observation. Observational techniques. Opinion polling. Organizational learning. Outcome mapping. Outcomes theory. Participant observation. Participatory impact pathways analysis. Policy analysis. Post occupancy evaluation. Process improvement. Project management. Qualitative research. Quality audit. Quality circle. Quality control. Quality management. Quality management system. Quantitative research. Questionnaire. Questionnaire construction. Root cause analysis. Rubrics. Sampling. Self-assessment. Six Sigma. Standardized testing. Statistical process control. Statistical survey. Statistics. Strategic planning. Structured interviewing. Systems theory.

Leo Breiman

Breiman, Leo; Breiman
A video recording of a Leo Breiman lecture about one of his machine learning techniques. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author).

Graphical model

graphical models; probabilistic graphical model; probabilistic graphical models
They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning. Generally, probabilistic graphical models use a graph-based representation as the foundation for encoding a distribution over a multi-dimensional space and a graph that is a compact or factorized representation of a set of independences that hold in the specific distribution. Two branches of graphical representations of distributions are commonly used, namely, Bayesian networks and Markov random fields.
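To illustrate the factorized representation in the Bayesian network case, the sketch below encodes a three-variable network (rain, sprinkler, wet grass) and recovers the joint distribution as a product of local conditional tables. The variables and all probability values are hypothetical and chosen only for illustration.

```python
from itertools import product

# Network structure: Rain -> Sprinkler, and (Rain, Sprinkler) -> Wet.
# All numbers below are made up for the example.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler_given_rain = {
    True: {True: 0.01, False: 0.99},
    False: {True: 0.4, False: 0.6},
}
p_wet_given = {  # P(Wet=True | Rain, Sprinkler)
    (True, True): 0.99,
    (True, False): 0.8,
    (False, True): 0.9,
    (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    """P(R, S, W) = P(R) * P(S | R) * P(W | R, S)."""
    p_w = p_wet_given[(rain, sprinkler)]
    return (p_rain[rain]
            * p_sprinkler_given_rain[rain][sprinkler]
            * (p_w if wet else 1.0 - p_w))

states = [True, False]
total = sum(joint(r, s, w) for r, s, w in product(states, repeat=3))
p_wet_true = sum(joint(r, s, True) for r, s in product(states, repeat=2))

print(f"sum over all states: {total:.6f}")
print(f"P(Wet=True): {p_wet_true:.5f}")
```

The three local tables hold 1 + 2 + 4 = 7 free parameters, the same as the full joint table over three binary variables; the saving appears when the graph omits edges, which is what the encoded independences buy in larger models.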

Logistic regression

logit model; logistic; logistic model
Python: in the Statsmodels module; in the Scikit-learn module; in the TensorFlow module; full example of logistic regression in the Theano tutorial; Bayesian Logistic Regression with ARD prior (code, tutorial); Variational Bayes Logistic Regression with ARD prior (code, tutorial); Bayesian Logistic Regression (code, tutorial). NCSS: Logistic Regression in NCSS. Matlab: in the Statistics and Machine Learning Toolbox (with "incorrect" coded as 2 instead of 0); several functions can all do logistic regression. Java (JVM): LibLinear; Apache Flink; Apache Spark (SparkML supports Logistic Regression). FPGA: in HLS for FPGA. Logistic function. Discrete choice. Jarrow–Turnbull model. Limited dependent variable.

Algorithmic bias

bias; algorithmic transparency; reflect the biases
It recommended that researchers "design these systems so that their actions and decision-making are transparent and easily interpretable by humans, and thus can be examined for any bias they may contain, rather than just learning and repeating these biases". Intended only as guidance, the report did not create any legal precedent. In 2017, New York City passed the first algorithmic accountability bill in the United States.

Sampling (statistics)

sampling; random sample; sample
A theoretical formulation for sampling Twitter data has been developed. In manufacturing, different types of sensory data such as acoustics, vibration, pressure, current, voltage and controller data are available at short time intervals. To predict down-time, it may not be necessary to look at all the data; a sample may be sufficient. Survey results are typically subject to some error. Total errors can be classified into sampling errors and non-sampling errors. The term "error" here includes systematic biases as well as random errors. Sampling errors and biases are induced by the sample design.
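As a rough sketch of working from a sample rather than the full data, the example below draws a simple random sample from simulated sensor readings and estimates the random sampling error of the mean. The readings, the sample size, and the fixed seed are all assumptions made for illustration; the standard error captures only random sampling error, not systematic bias from a poor sample design.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simulated "population" of sensor readings (hypothetical values)
readings = [20.0 + rng.gauss(0.0, 1.5) for _ in range(10_000)]

# Simple random sample without replacement
sample = rng.sample(readings, 200)

sample_mean = statistics.mean(sample)
# Standard error of the mean: sample standard deviation / sqrt(n)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
population_mean = statistics.mean(readings)

print(f"population mean: {population_mean:.3f}")
print(f"sample mean:     {sample_mean:.3f} +/- {standard_error:.3f} (1 s.e.)")
```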

Adaptive website

Adaptive websites
Collaborative filtering such as recommender systems, generate and test methods such as A/B testing, and machine learning techniques such as clustering and classification that are used on a website do not make it an adaptive website. They are all tools and techniques that may be used toward engineering an adaptive website. The collaborative filtering method: Collected user data may be assessed in aggregate (across multiple users) using machine learning techniques to cluster interaction patterns to user models and classify specific user patterns to such models. The website may then be adapted to target clusters of users.

List of statistical software

statistical software; List of statistical packages; Statistical package
Statistical Lab – R-based and focusing on educational purposes. Torch (machine learning) – a deep learning software library written in Lua (programming language). Weka (machine learning) – a suite of machine learning software written at the University of Waikato. CSPro. Epi Info. X-12-ARIMA. BV4.1. GeoDA. MaxStat Lite – general statistical software. MINUIT. WinBUGS – Bayesian analysis using Markov chain Monte Carlo methods. Winpepi – package of statistical programs for epidemiologists. Alteryx – analytics platform with drag and drop statistical models; R and Python integration. Analytica – visual analytics and statistics package.

Conference on Neural Information Processing Systems

NIPS; NeurIPS; Neural Information Processing Systems
Video Journal of Machine Learning Abstracts – Volume 3.

Formal science

formal sciences; formal; formal disciplines
Formal science is a branch of science studying formal language disciplines concerned with formal systems, such as logic, mathematics, statistics, theoretical computer science, artificial intelligence, information theory, game theory, systems theory, decision theory, and theoretical linguistics. Whereas the natural sciences and social sciences seek to characterize physical systems and social systems, respectively, using empirical methods, the formal sciences are language tools concerned with characterizing abstract structures described by symbolic systems.

Computational science

scientific computing; scientific computation; computational
Computational statistics. Computational sustainability. Computer algebra. Computer simulation. Financial modeling. Geographic information system (GIS). High-performance computing. Machine learning. Network analysis. Neuroinformatics. Numerical linear algebra. Numerical weather prediction. Pattern recognition. Scientific visualization. Simulation. Computer simulations in science. Computational science and engineering. Comparison of computer algebra systems. List of molecular modeling software. List of numerical analysis software. List of statistical packages. Timeline of scientific computing. Simulated reality. Extensions for Scientific Computation (XSC). E. Gallopoulos and A.

Bayesian inference

Bayesian; Bayesian analysis; Bayesian method
Despite growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics. Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning. For a full report on the history of Bayesian statistics and the debates with frequentist approaches, read The following books are listed in ascending order of probabilistic sophistication: Francisco J. Samaniego (2010), "A Comparison of the Bayesian and Frequentist Approaches to Estimation", Springer, New York, ISBN 978-1-4419-5940-9. Data, Uncertainty and Inference: an introduction to Bayesian inference and MCMC with a lot of examples fully explained.


Comparison of statistical packages. StatSoft. Hill, T., and Lewicki, P. (2007). STATISTICS Methods and Applications. Tulsa, OK: StatSoft. WEB: http://www.statsoft.com/textbook/. Nisbet, R., Elder, J., and Miner, G. (2009). Handbook of Statistical Analysis and Data Mining Applications. Burlington, MA: Academic Press (Elsevier). http://software.dell.com/products/statistica/. StatSoft Homepage. Electronic Statistics Textbook (online textbook).


scientific; sciences; scientific knowledge
"The use of p values for nearly a century [since 1925] to determine statistical significance of experimental results has contributed to an illusion of certainty and [to] reproducibility crises in many scientific fields. There is growing determination to reform statistical analysis... Some [researchers] suggest changing statistical methods, whereas others would do away with a threshold for defining "significant" results." (p. 63.). Feyerabend, Paul (2005). Science, history of the philosophy, as cited in. Feynman, Richard "Cargo Cult Science". Gopnik, Alison, "Finding Our Inner Scientist", Daedalus, Winter 2004.

Heuristic (computer science)

heuristic; heuristics; heuristic algorithm
Metaheuristic: Methods for controlling and tuning basic heuristic algorithms, usually with usage of memory and learning. Matheuristics: Optimization algorithms made by the interoperation of metaheuristics and mathematical programming (MP) techniques. Reactive search optimization: Methods using online machine learning principles for self-tuning of heuristics.

Jerome H. Friedman

Jerome Friedman; Friedman, J.H.; Friedman, Jerome
In 1982 he was appointed Professor of Statistics at Stanford University. In 1984 he was elected as a Fellow of the American Statistical Association. In 2002 he was awarded the SIGKDD Innovation Award by the ACM. In 2010 he was elected as a member of the National Academy of Sciences (Applied mathematical sciences). Friedman has authored and co-authored many publications in the field of data-mining including "nearest neighbor classification, logistical regressions, and high dimensional data analysis. His primary research interest is in the area of machine learning."