# Statistics

**statistical, statistical analysis, statistician, applied statistics, statistical methods, statistically, statistical method, statistical data, stats, quantitative analysis**

Statistics is the discipline that concerns the collection, organization, display, analysis, interpretation, and presentation of data.


### Probability theory

**theory of probability, probability, probability theorist**

Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.

As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of data.

### Method of moments (statistics)

**method of moments, method of matching moments, method of moment matching**

Pearson developed the Pearson product-moment correlation coefficient (defined as a product-moment), the method of moments for fitting distributions to samples, and the Pearson distribution, among many other contributions.

In statistics, the method of moments is a method of estimation of population parameters.
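As a minimal sketch of the idea, the first two sample moments can be matched to the corresponding population moments to estimate a distribution's mean and variance; the data below are made up for illustration.

```python
# Method-of-moments sketch: estimate mean and variance by matching
# the first two raw sample moments to the population moments.

def method_of_moments_normal(data):
    n = len(data)
    m1 = sum(data) / n                 # first raw moment -> mean
    m2 = sum(x * x for x in data) / n  # second raw moment
    mu_hat = m1
    var_hat = m2 - m1 * m1             # Var[X] = E[X^2] - E[X]^2
    return mu_hat, var_hat

data = [2, 4, 4, 4, 5, 5, 7, 9]
mu_hat, var_hat = method_of_moments_normal(data)
print(mu_hat, var_hat)   # 5.0 4.0
```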

### Linear discriminant analysis

**discriminant analysis, Discriminant function analysis, Fisher's linear discriminant**

He originated the concepts of sufficiency, ancillary statistics, Fisher's linear discriminant, and Fisher information.

Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events.
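A minimal sketch of Fisher's linear discriminant for two classes: the projection direction is w = S_w⁻¹(m₁ − m₀), where S_w is the within-class scatter matrix and m₀, m₁ are the class means. The toy data here are made up.

```python
import numpy as np

def fisher_discriminant(X0, X1):
    # Class means and within-class scatter matrices
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S0 = (X0 - m0).T @ (X0 - m0)
    S1 = (X1 - m1).T @ (X1 - m1)
    Sw = S0 + S1
    # Projection direction w = Sw^{-1} (m1 - m0)
    return np.linalg.solve(Sw, m1 - m0)

X0 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]])
X1 = np.array([[6.0, 5.0], [7.0, 8.0], [8.0, 8.0]])
w = fisher_discriminant(X0, X1)
# Projecting each class onto w separates them cleanly:
print(X0 @ w, X1 @ w)
```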

### Sufficient statistic

**sufficient statistics, sufficient, sufficiency**

He originated the concepts of sufficiency, ancillary statistics, Fisher's linear discriminant, and Fisher information.

In statistics, a statistic is sufficient with respect to a statistical model and its associated unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter".
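One small numerical illustration of this definition: for a Bernoulli(p) sample, the sum of the observations is sufficient, because the conditional probability of any particular sequence given its sum is 1/C(n, s) regardless of p. The sketch below verifies this for two values of p.

```python
from itertools import product
from fractions import Fraction

def conditional_prob(seq, p):
    # P(sequence | sum of sequence) under i.i.d. Bernoulli(p)
    n, s = len(seq), sum(seq)
    prob_seq = p**s * (1 - p)**(n - s)
    prob_sum = sum(p**sum(t) * (1 - p)**(n - sum(t))
                   for t in product([0, 1], repeat=n) if sum(t) == s)
    return prob_seq / prob_sum

seq = (1, 0, 1, 0)
# Both equal 1/6 = 1/C(4, 2): the conditional law does not depend on p.
print(conditional_prob(seq, Fraction(1, 3)), conditional_prob(seq, Fraction(3, 4)))
```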

### Statistical Methods for Research Workers

**intraclass correlation, methods**

Fisher's most important publications were his seminal 1918 paper The Correlation between Relatives on the Supposition of Mendelian Inheritance, which was the first to use the statistical term variance; his classic 1925 work Statistical Methods for Research Workers; and his 1935 book The Design of Experiments, in which he developed rigorous models for the design of experiments.

Statistical Methods for Research Workers is a classic book on statistics, written by the statistician R. A. Fisher.

### Lady tasting tea

Ronald Fisher coined the term null hypothesis during the lady tasting tea experiment; a null hypothesis "is never proved or established, but is possibly disproved, in the course of experimentation".

In the design of experiments in statistics, the lady tasting tea is a randomized experiment devised by Ronald Fisher and reported in his book The Design of Experiments (1935).
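The combinatorics behind the experiment are simple to sketch: with 8 cups, 4 of which had milk poured first, a subject guessing purely at random identifies all 4 correctly with probability 1/C(8, 4).

```python
from math import comb

# Probability of correctly picking all 4 milk-first cups out of 8 by chance
p_all_correct = 1 / comb(8, 4)
print(comb(8, 4), p_all_correct)   # 70, 1/70 ≈ 0.0143
```

This small chance of success under pure guessing is what lets the experiment serve as a significance test of the null hypothesis.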

### Cryptanalysis

**cryptanalyst, codebreaking, codebreaker**

This text laid the foundations for statistics and cryptanalysis.

Frequency analysis relies on a cipher failing to hide these statistics.

### Survey sampling

**surveys, sample, Sample Survey**

When full census data cannot be collected, statisticians collect sample data by developing specific experiment designs and survey samples.

In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey.

### Descriptive statistics

**descriptive, descriptive statistic, statistics**

Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).

The use of descriptive and summary statistics has an extensive history and, indeed, the simple tabulation of populations and of economic data was the first way the topic of statistics appeared.
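A minimal sketch of such summary indexes using Python's standard library, on a made-up sample:

```python
import statistics

# Descriptive summary of a small illustrative sample
data = [12, 15, 15, 18, 20, 22, 30]
print(statistics.mean(data))    # arithmetic mean
print(statistics.median(data))  # middle value
print(statistics.stdev(data))   # sample standard deviation
```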

### Sampling distribution

**finite sample distribution, distribution, sampling**

Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures.

In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given random-sample-based statistic.
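The sampling distribution of the sample mean is easy to approximate by simulation: for i.i.d. draws with standard deviation σ, the standard deviation of the mean of n draws (the standard error) is σ/√n. The sketch below uses synthetic standard-normal data.

```python
import random
import statistics

random.seed(0)
n, reps = 25, 2000
# Repeatedly draw a sample of size n and record its mean
means = [statistics.mean(random.gauss(0, 1) for _ in range(n))
         for _ in range(reps)]
# Spread of the sample means: close to sigma / sqrt(n) = 1 / 5 = 0.2
print(statistics.stdev(means))
```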

### Type I and type II errors

**Type I error, false-positive, false positive**

Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected, giving a "false positive") and Type II errors (the null hypothesis fails to be rejected and an actual relationship between populations is missed, giving a "false negative").
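The Type I error rate can be checked by simulation: sampling under the null hypothesis and applying a two-sided z-test at the 5% level should reject in roughly 5% of trials. The setup below (mean 0, known σ = 1) is a made-up illustration.

```python
import random
import statistics
from math import sqrt

random.seed(1)
n, trials = 30, 4000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]  # data generated under H0
    z = statistics.mean(sample) * sqrt(n)            # z = mean / (sigma/sqrt(n))
    if abs(z) > 1.96:                                # two-sided test, alpha = 0.05
        rejections += 1
print(rejections / trials)   # false-positive rate, close to 0.05
```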

### Biometrika

**Biometrika Trust**

Galton and Pearson founded Biometrika as the first journal of mathematical statistics and biostatistics (then called biometry), and the latter founded the world's first university statistics department at University College London.

The principal focus of this journal is theoretical statistics.

### The Design of Experiments

**book**

Fisher's most important publications were his 1918 seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance, which was the first to use the statistical term, variance, his classic 1925 work Statistical Methods for Research Workers and his 1935 The Design of Experiments, where he developed rigorous design of experiments models.

The Design of Experiments is a 1935 book by the English statistician Ronald Fisher about the design of experiments and is considered a foundational work in experimental design.

### Statistical theory

**statistical, statistical theories, mathematical statistics**

Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures.

The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics.

### Instrumental variables estimation

**instrumental variable, instrumental variables, two-stage least squares**

While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference in differences estimation and instrumental variables, among many others) that produces consistent estimators.

In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment.
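A small simulated sketch of the idea: when the regressor x is correlated with the unobserved error u, ordinary least squares is biased, while the simple IV estimator cov(z, y)/cov(z, x) recovers the true coefficient. All numbers below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0
z = rng.standard_normal(n)   # instrument: affects x but not y directly
u = rng.standard_normal(n)   # unobserved confounder
x = z + u                    # endogenous regressor (correlated with u)
y = beta * x + u

# OLS slope is biased upward because x and u are correlated
ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
# Simple IV estimator using the instrument z
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(ols, iv)   # OLS near 2.5 (biased); IV near the true 2.0
```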

### Difference in differences

**difference-in-differences, difference-in-difference, Assumptions**

While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference in differences estimation and instrumental variables, among many others) that produces consistent estimators.

Difference in differences (DID or DD) is a statistical technique used in econometrics and quantitative research in the social sciences that attempts to mimic an experimental research design using observational study data, by studying the differential effect of a treatment on a 'treatment group' versus a 'control group' in a natural experiment.
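In its simplest form the estimator is just an arithmetic double difference over group means; the numbers below are made up for illustration.

```python
# Difference-in-differences: (treated after - treated before)
# minus (control after - control before)

treated_before, treated_after = 10.0, 18.0
control_before, control_after = 9.0, 12.0

did = (treated_after - treated_before) - (control_after - control_before)
print(did)   # 8 - 3 = 5.0, the estimated treatment effect
```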

### Decision theory

**decision science, statistical decision theory, decision sciences**

Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures.

Empirical applications of this rich theory are usually done with the help of statistical and econometric methods.

### Consistent estimator

**consistent, consistency, inconsistent**

While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference in differences estimation and instrumental variables, among many others) that produces consistent estimators.

In statistics, a consistent estimator or asymptotically consistent estimator is an estimator (a rule for computing estimates of a parameter θ₀) with the property that, as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to θ₀.
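The sample mean is the standard example of a consistent estimator; a quick simulation (with synthetic normal data) shows the estimate closing in on the true mean as n grows.

```python
import random
import statistics

random.seed(2)
mu = 3.0
# Estimate the mean from samples of increasing size
ests = {n: statistics.mean(random.gauss(mu, 1) for _ in range(n))
        for n in (100, 10_000, 100_000)}
for n, est in ests.items():
    print(n, est)   # the error shrinks roughly like 1 / sqrt(n)
```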

### Twelvefold way

**permutations and combinations, combination, combinations**

Al-Khalil (717–786) wrote the Book of Cryptographic Messages, which contains the first use of permutations and combinations, to list all possible Arabic words with and without vowels.

Another way to think of some of the cases is in terms of sampling, in statistics.
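Four of the twelvefold-way counts correspond directly to the basic sampling schemes, sketched here for drawing k = 2 items from n = 5:

```python
from math import comb, perm

n, k = 5, 2
print(n ** k)              # ordered, with replacement: 25
print(perm(n, k))          # ordered, without replacement: 20
print(comb(n, k))          # unordered, without replacement: 10
print(comb(n + k - 1, k))  # unordered, with replacement: 15
```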

### Big data

**big data analytics, big data analysis, big-data**

Statistics continues to be an area of active research, for example on the problem of how to analyze big data.

Relational database management systems, desktop statistics and software packages used to visualize data often have difficulty handling big data.

### Blocking (statistics)

**Randomized block design, blocking, blocks**

In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups (blocks) that are similar to one another.

### Experiment

**experimental, experimentation, experiments**

An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements.

This equivalency is determined by statistical methods that take into account the amount of variation between individuals and the number of individuals in each group.

### Categorical variable

**categorical, categorical data, dichotomous**

Numerical descriptors include mean and standard deviation for continuous data types (like income), while frequency and percentage are more useful in terms of describing categorical data (like education).

In statistics, a categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
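Summarizing a categorical variable with frequencies and percentages takes only the standard library; the education-level data below are made up.

```python
from collections import Counter

# Frequency and percentage summary of a categorical variable
education = ["HS", "BA", "BA", "MS", "HS", "BA", "PhD", "HS"]
counts = Counter(education)
for level, freq in counts.most_common():
    print(level, freq, f"{100 * freq / len(education):.1f}%")
```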

### Calculus

**infinitesimal calculus, differential and integral calculus, classical calculus**

In the 18th century, statistics also started to draw heavily from calculus.

Calculus is used in every branch of the physical sciences, actuarial science, computer science, statistics, engineering, economics, business, medicine, demography, and in other fields wherever a problem can be mathematically modeled and an optimal solution is desired.

### The Correlation between Relatives on the Supposition of Mendelian Inheritance

Fisher's most important publications were his seminal 1918 paper The Correlation between Relatives on the Supposition of Mendelian Inheritance, which was the first to use the statistical term variance; his classic 1925 work Statistical Methods for Research Workers; and his 1935 book The Design of Experiments, in which he developed rigorous models for the design of experiments.

The paper also contains the first use of the statistical term variance.