Computational statistics

statistical computing; scientific computing and statistical practice; computational methods in statistics
Algorithms for statistical classification • Data science • Statistical methods in artificial intelligence • Free statistical software • List of statistical algorithms • List of statistical packages • Machine learning • International Association for Statistical Computing • Statistical Computing section of the American Statistical Association • Computational Statistics & Data Analysis • Journal of Computational & Graphical Statistics • Statistics and Computing • Communications in Statistics – Simulation and Computation • Journal of Statistical Computation and Simulation

Algorithmic bias

algorithmic transparency; reflect the biases
It recommended that researchers "design these systems so that their actions and decision-making are transparent and easily interpretable by humans, and thus can be examined for any bias they may contain, rather than just learning and repeating these biases". Intended only as guidance, the report did not create any legal precedent. In 2017, New York City passed the first algorithmic accountability bill in the United States.

Wolfram Mathematica

Mathematica; Wolfram; Mathematica 8
Multivariate statistics libraries including fitting, hypothesis testing, and probability and expectation calculations on over 160 distributions. Support for censored data, temporal data, time series, and unit-based data. Calculations and simulations on random processes and queues. Supervised and unsupervised machine learning tools for data, images, and sounds, including artificial neural networks. Tools for text mining, including regular expressions and semantic analysis. Data mining tools such as cluster analysis, sequence alignment, and pattern matching. Computational geometry in 2D, 3D, and higher dimensions.

Sample (statistics)

sample; samples; statistical sample
An unbiased (representative) sample is a set of objects chosen from a complete sample using a selection process that does not depend on the properties of the objects. For example, an unbiased sample of Australian men taller than 2m might consist of a randomly sampled subset of 1% of Australian males taller than 2m. But one chosen from the electoral register might not be unbiased since, for example, males aged under 18 will not be on the electoral register. In an astronomical context, an unbiased sample might consist of that fraction of a complete sample for which data are available, provided the data availability is not biased by individual source properties.
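As a minimal illustration (the population, heights, and 1% rate below are simulated assumptions), a selection process that ignores object properties yields an unbiased subsample, while conditioning on a property such as register membership does not:

```python
import random

random.seed(0)

# Hypothetical population: (person_id, height_m, on_electoral_register)
population = [(i, random.gauss(1.78, 0.10), random.random() < 0.9)
              for i in range(100_000)]

# Complete sample: everyone taller than 2 m (simulated here).
tall = [p for p in population if p[1] > 2.0]

# Unbiased sample: a random 1% subset; selection ignores object properties.
unbiased = random.sample(tall, max(1, len(tall) // 100))

# Biased sample: drawing from the electoral register conditions on a
# property (registration), so e.g. under-18s are systematically excluded.
biased = [p for p in tall if p[2]][:len(unbiased)]
print(len(tall), len(unbiased), len(biased))
```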

Bias (statistics)

bias; biased; statistical bias
Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated. A statistic is biased if it is calculated in such a way that it is systematically different from the population parameter being estimated. The following lists some types of biases, which can overlap. Selection bias involves individuals being more likely to be selected for study than others, biasing the sample. This can also be termed Berksonian bias. Spectrum bias arises from evaluating diagnostic tests on biased patient samples, leading to an overestimate of the sensitivity and specificity of the test.
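A small simulation can make the definition concrete; the population parameters below are assumptions chosen for illustration. The sample variance divided by n is systematically below the true variance, while the n − 1 version is not:

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0  # variance of the assumed population (sigma = 2)

# Average each estimator over many small samples from the population.
biased, unbiased = [], []
for _ in range(20_000):
    x = rng.normal(loc=10.0, scale=2.0, size=5)
    biased.append(np.var(x))            # divides by n: expected value < true_var
    unbiased.append(np.var(x, ddof=1))  # divides by n-1: expected value = true_var

print(f"true variance:      {true_var}")
print(f"biased estimator:   {np.mean(biased):.3f}")    # ~3.2 on average
print(f"unbiased estimator: {np.mean(unbiased):.3f}")  # ~4.0 on average
```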

Regression analysis

regression; multiple regression; regression model
When the number of measurements, N, is larger than the number of unknown parameters, k, and the measurement errors \epsilon_i are normally distributed, the excess of information contained in (N - k) measurements is used to make statistical predictions about the unknown parameters. This excess of information is referred to as the degrees of freedom of the regression. Classical assumptions for regression analysis include: the sample is representative of the population; the predictor variables are measured without error; the errors have expectation zero conditional on the predictors; the variance of the errors is constant across observations (homoscedasticity); the errors are uncorrelated with one another; and no predictor is a perfect linear combination of the others. These are sufficient conditions for the least-squares estimator to possess desirable properties; in particular, these assumptions imply that the parameter estimates will be unbiased, consistent, and efficient in the class of linear unbiased estimators.
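A minimal sketch of the idea, with simulated data (the coefficients and noise level are assumptions): the N − k degrees of freedom are what allow an estimate of the error variance.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 100, 2                         # N measurements, k unknown parameters
x = rng.uniform(0, 10, size=N)
X = np.column_stack([np.ones(N), x])  # design matrix: intercept + slope
y = 3.0 + 0.5 * x + rng.normal(0, 1.0, size=N)  # errors ~ N(0, 1)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimates
residuals = y - X @ beta
dof = N - k                               # degrees of freedom of the regression
sigma2_hat = residuals @ residuals / dof  # estimate of the error variance
print(beta, sigma2_hat)                   # ~[3.0, 0.5] and ~1.0
```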

Survey methodology

survey; surveys; statistical survey
One common error that results is selection bias. Selection bias results when the procedures used to select a sample result in over-representation or under-representation of some significant aspect of the population. For instance, if the population of interest consists of 75% females and 25% males, and the sample consists of 40% females and 60% males, then females are under-represented while males are over-represented. In order to minimize selection biases, stratified random sampling is often used: the population is divided into sub-populations called strata, and random samples are drawn from each of the strata, or elements are drawn for the sample on a proportional basis.
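A minimal sketch of proportional stratified sampling (the 75/25 split and sample size are assumptions taken from the example above):

```python
import random

random.seed(1)

# Hypothetical population with a 75% / 25% split by stratum.
population = [("F", i) for i in range(7500)] + [("M", i) for i in range(2500)]
sample_size = 200

# Group the population into strata.
strata = {}
for unit in population:
    strata.setdefault(unit[0], []).append(unit)

# Draw from each stratum in proportion to its share of the population,
# so neither group is over- or under-represented in the sample.
sample = []
for group, members in strata.items():
    n = round(sample_size * len(members) / len(population))
    sample.extend(random.sample(members, n))

print(sum(1 for g, _ in sample if g == "F") / len(sample))  # ~0.75
```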

Probability theory

theory of probability; probability; probability theorist
Since it links theoretically derived probabilities to their actual frequency of occurrence in the real world, the law of large numbers is considered a pillar in the history of statistical theory and has had widespread influence. The law of large numbers (LLN) states that the sample average of a sequence of independent and identically distributed random variables X_k converges towards their common expectation \mu, provided that the expectation of |X_k| is finite.
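A quick simulation illustrates the LLN; the die-rolling setup is an assumption chosen for familiarity:

```python
import random

random.seed(0)
mu = 3.5  # expectation of a fair six-sided die

# The sample average of i.i.d. rolls converges towards mu as n grows.
total = 0
for n in range(1, 100_001):
    total += random.randint(1, 6)
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n={n:>6}  sample mean={total / n:.4f}")  # approaches 3.5
```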

Pattern recognition

pattern analysis; pattern detection; patterns
In pattern recognition, there may be greater interest in formalizing, explaining, and visualizing the pattern, while machine learning traditionally focuses on maximizing recognition rates. Yet all of these domains have evolved substantially from their roots in artificial intelligence, engineering, and statistics, and they have become increasingly similar by integrating developments and ideas from each other. In machine learning, pattern recognition is the assignment of a label to a given input value. In statistics, discriminant analysis was introduced for this same purpose in 1936.
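As a minimal sketch of label assignment via discriminant analysis (scikit-learn and the synthetic two-class data are assumptions, not anything prescribed by the text):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
# Two hypothetical classes of 2-D feature vectors with different means.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Fit a discriminant rule, then assign labels to new input values.
clf = LinearDiscriminantAnalysis().fit(X, y)
print(clf.predict([[0.2, -0.1], [2.8, 3.1]]))  # -> [0 1]
```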

Estimation theory

parameter estimation; estimation; estimated
Minimum variance unbiased estimator (MVUE) • Nonlinear system identification • Best linear unbiased estimator (BLUE) • Unbiased estimators — see estimator bias • Particle filter • Markov chain Monte Carlo (MCMC) • Kalman filter, and its various derivatives • Wiener filter • Interpretation of scientific experiments • Signal processing • Clinical trials • Opinion polls • Quality control • Telecommunications • Project management • Software engineering • Control theory (in particular Adaptive control) • Network intrusion detection system • Orbit determination • Chebyshev center • Completeness (statistics) • Cramér–Rao bound • Detection theory • Efficiency (statistics)

Observational error

systematic error; measurement error; systematic bias
When either randomness or uncertainty modeled by probability theory is attributed to such errors, they are "errors" in the sense in which that term is used in statistics; see errors and residuals in statistics. Every time we repeat a measurement with a sensitive instrument, we obtain slightly different results. The common statistical model used is that the error has two additive parts: a systematic error, which always occurs with the same value when the instrument is used in the same way, and a random error, which may vary from observation to observation. Systematic error is sometimes called statistical bias. It may often be reduced with standardized procedures. Part of the learning process in the various sciences is learning how to use standard instruments and protocols so as to minimize systematic error.
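A short simulation of the two-part model (the true value, bias, and noise level are assumptions): averaging repeated readings shrinks the random part but not the systematic part.

```python
import numpy as np

rng = np.random.default_rng(3)
true_value = 9.81          # quantity being measured (hypothetical)
systematic_error = 0.05    # same offset every time, e.g. a miscalibrated scale
random_error_sd = 0.02     # varies from reading to reading

# Each repeated measurement = true value + systematic part + random part.
readings = true_value + systematic_error + rng.normal(0, random_error_sd, size=1000)

# Averaging shrinks the random part but leaves the systematic bias intact:
print(readings.mean() - true_value)  # ~0.05, the systematic error survives
```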

Errors and residuals

residuals; error term; error
Since this is a biased estimate of the variance of the unobserved errors, the bias is removed by dividing the sum of the squared residuals by df = n − p − 1, instead of n, where df is the number of degrees of freedom (n minus the number of parameters p being estimated, minus 1 for the intercept). This forms an unbiased estimate of the variance of the unobserved errors, and is called the mean squared error.
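A minimal sketch of the df correction with simulated data (the design and noise level are assumptions); here p counts the regressors besides the intercept:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 2                          # n observations, p regressors (plus intercept)
X = np.column_stack([np.ones(n), rng.uniform(0, 5, (n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 1.5, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta              # residuals, not the unobserved errors

df = n - p - 1                        # degrees of freedom
mse = residuals @ residuals / df      # unbiased estimate of the error variance
print(mse)                            # ~1.5**2 = 2.25
```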

Blinded experiment

double-blind; double blind; blinded
A triple-blind study has the theoretical advantage of allowing the monitoring committee to evaluate the response variable results more objectively. This assumes that appraisal of efficacy and harm, as well as requests for special analyses, may be biased if group identity is known. However, in a trial where the monitoring committee has an ethical responsibility to ensure participant safety, such a design may be counterproductive since in this case monitoring is often guided by the constellation of trends and their directions. In addition, by the time many monitoring committees receive data, often any emergency situation has long passed.

Connectionism

connectionist; parallel distributed processing; connectionist models
Hebb contributed greatly to speculations about neural functioning, and proposed a learning principle, Hebbian learning, that is still used today. Lashley argued for distributed representations as a result of his failure to find anything like a localized engram in years of lesion experiments. Though PDP is the dominant form of connectionism, other theoretical work should also be classified as connectionist. Many connectionist principles can be traced to early work in psychology, such as that of William James. Psychological theories based on knowledge about the human brain were fashionable in the late 19th century.
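As a minimal sketch of the Hebbian principle that connections between co-active units are strengthened (the patterns and learning rate are assumptions):

```python
import numpy as np

eta = 0.1                         # learning rate (an assumed value)

# Two hypothetical patterns: pre-synaptic input x and post-synaptic activity y.
patterns = [(np.array([1.0, 0.0, 1.0, 0.0]), 1.0),
            (np.array([0.0, 1.0, 0.0, 1.0]), 0.0)]

w = np.zeros(4)
for _ in range(50):
    for x, y in patterns:
        w += eta * y * x          # Hebb: co-active pre/post units strengthen

# Connections whose inputs were active together with the output grew.
print(w)                          # -> [5. 0. 5. 0.]
```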

Natural language processing

NLP; natural language; natural-language processing
Since the so-called "statistical revolution" in the late 1980s and mid 1990s, much natural language processing research has relied heavily on machine learning. The machine-learning paradigm calls instead for using statistical inference to automatically learn such rules through the analysis of large corpora of typical real-world examples (a corpus (plural, "corpora") is a set of documents, possibly with human or computer annotations). Many different classes of machine-learning algorithms have been applied to natural-language-processing tasks. These algorithms take as input a large set of "features" that are generated from the input data.
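A minimal sketch of the paradigm (scikit-learn, the toy corpus, and the labels are all assumptions): features are generated from annotated examples, and a labeling rule is learned statistically rather than hand-written.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A tiny annotated corpus (hypothetical documents and labels).
corpus = ["great plot and acting", "wonderful film", "dull and boring", "awful plot"]
labels = ["pos", "pos", "neg", "neg"]

features = CountVectorizer().fit(corpus)  # documents -> bag-of-words features
model = MultinomialNB().fit(features.transform(corpus), labels)
print(model.predict(features.transform(["boring film"])))  # -> ['neg']
```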

Time complexity

polynomial time; linear time; exponential time
Although quasi-polynomially solvable, it has been conjectured that the planted clique problem has no polynomial time solution; this planted clique conjecture has been used as a computational hardness assumption to prove the difficulty of several other problems in computational game theory, property testing, and machine learning. The complexity class QP consists of all problems that have quasi-polynomial time algorithms. It can be defined in terms of DTIME as QP = \bigcup_{c \in \mathbb{N}} DTIME(2^{(\log n)^c}). In complexity theory, the unsolved P versus NP problem asks if all problems in NP have polynomial-time algorithms. All the best-known algorithms for NP-complete problems such as 3SAT take exponential time.
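A small numeric comparison (illustrative only, with c = 2 assumed) shows why quasi-polynomial growth sits strictly between polynomial and exponential:

```python
# Compare exponents: polynomial n^3, quasi-polynomial 2^((log2 n)^2),
# and exponential 2^n, for n written as a power of two.
for exp in (10, 20, 30):
    n = 2 ** exp
    print(f"n=2^{exp}:  n^3 = 2^{3 * exp},  "
          f"2^((log n)^2) = 2^{exp ** 2},  2^n = 2^{n}")
```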

Experiment

experimental; experimentation; experiments
Observational studies are limited because they lack the statistical properties of randomized experiments. In a randomized experiment, the method of randomization specified in the experimental protocol guides the statistical analysis, which is usually specified also by the experimental protocol. Without a statistical model that reflects an objective randomization, the statistical analysis relies on a subjective model. Inferences from subjective models are unreliable in theory and practice.

Design of experiments

experimental design; design; designed experiment
False positive conclusions, often resulting from the pressure to publish or the author's own confirmation bias, are an inherent hazard in many fields. A good way to prevent biases potentially leading to false positives in the data collection phase is to use a double-blind design. When a double-blind design is used, participants are randomly assigned to experimental groups, and the researcher is unaware of which participants belong to which group. Therefore, the researcher cannot affect the participants' response to the intervention. Experimental designs with undisclosed degrees of freedom are a problem.
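A minimal sketch of blinded random assignment (the participant names and coding scheme are hypothetical): the researcher works only with codes that carry no group information.

```python
import random

random.seed(42)

# Hypothetical participants, randomly assigned to two groups.
participants = [f"participant_{i:02d}" for i in range(20)]
random.shuffle(participants)
half = len(participants) // 2
group_key = {p: ("treatment" if i < half else "control")
             for i, p in enumerate(participants)}

# The researcher sees only anonymous codes; the key linking codes to
# groups is held by a third party until the analysis is complete.
codes = {p: f"subject-{random.getrandbits(24):06x}" for p in participants}
researcher_view = sorted(codes.values())
print(researcher_view[:3])  # codes reveal nothing about group membership
```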

Generalized linear model

generalized linear models; link function; GLM
In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. Generalized linear models were formulated by John Nelder and Robert Wedderburn as a way of unifying various other statistical models, including linear regression, logistic regression and Poisson regression.
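As a minimal sketch (statsmodels and the simulated data are assumed choices, not part of the text): a Poisson GLM links a count response to a linear predictor through the log link, with the variance of each observation tied to its predicted mean.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(0, 2, size=200)
X = sm.add_constant(x)                   # intercept + one regressor
y = rng.poisson(np.exp(0.5 + 1.2 * x))   # counts; mean = exp(linear predictor)

# Poisson family uses the log link by default; variance equals the mean.
model = sm.GLM(y, X, family=sm.families.Poisson())
result = model.fit()
print(result.params)   # estimates close to the assumed [0.5, 1.2]
```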

Random variable

random variables; random variation; random
This more general concept of a random element is particularly useful in disciplines such as graph theory, machine learning, natural language processing, and other fields in discrete mathematics and computer science, where one is often interested in modeling the random variation of non-numerical data structures. In some cases, it is nonetheless convenient to represent each element of E using one or more real numbers. In this case, a random element may optionally be represented as a vector of real-valued random variables (all defined on the same underlying probability space \Omega, which allows the different random variables to covary).
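A minimal sketch of the idea (the set E and the one-hot encoding are illustrative assumptions): a non-numerical random element can optionally be represented as a vector of real-valued random variables defined on the same underlying draw.

```python
import random

random.seed(7)

# A random element of a non-numerical set E (here, words).
E = ["cat", "dog", "fish"]
element = random.choice(E)

# Optional numeric representation: a vector of real-valued random
# variables (a one-hot encoding) defined on the same underlying draw,
# so the coordinates covary with the element that was chosen.
vector = [1.0 if w == element else 0.0 for w in E]
print(element, vector)   # e.g. "fish" -> [0.0, 0.0, 1.0]
```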

Variable (mathematics)

variables; variable; unknown
By extension, they are used to name the corresponding axes. z typically denotes a complex number, or, in statistics, a normal random variable. α, β, γ, θ and φ commonly denote angle measures. ε usually represents an arbitrarily small positive number; ε and δ commonly denote a pair of small positive numbers. λ is used for eigenvalues. Σ often denotes a sum, while in statistics σ denotes the standard deviation.
Coefficient • Constant of integration • Constant term of a polynomial • Free variables and bound variables (bound variables are also known as dummy variables) • Indeterminate (variable) • Lambda calculus • Mathematical expression • Observable variable • Physical constant • Variable (computer science)

Inductive reasoning

induction; inductive; inductive logic
As with deductive arguments, biases can distort the proper application of inductive arguments, preventing the reasoner from forming the most logical conclusion based on the clues. Examples of these biases include the availability heuristic, confirmation bias, and the predictable-world bias. The availability heuristic causes the reasoner to depend primarily upon information that is readily available: people tend to rely on information that is easily accessible in the world around them.

Social research

sociological research; sociological analysis; research
Lazarsfeld made great strides in statistical survey analysis, panel methods, latent structure analysis, and contextual analysis. Many of his ideas have been so influential as to now be considered self-evident.
• Analytic frame • Behavioural science • Cognitive science • Criminology • Engaged theory • History of social science • History of sociology • Scale (social sciences) • Social psychology • Unobtrusive measures
Quantitative designs approach social phenomena through quantifiable evidence, and often rely on statistical analysis of many cases (or across intentionally designed treatments in an experiment) to create valid and reliable general claims; as the name suggests, such designs are concerned with quantity.