# Statistical hypothesis testing

**hypothesis testing · statistical test · statistical tests · significance test · statistical hypothesis test · test · hypothesis tests · critical region · statistical significance test · testing**

A statistical hypothesis is a hypothesis that is testable on the basis of observing a process that is modeled via a set of random variables; testing such a hypothesis is sometimes called confirmatory data analysis.


### Alternative hypothesis

**alternative hypotheses · alternative · alternatives**

A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis that proposes no relationship between two data sets.

In statistical hypothesis testing, the alternative hypothesis is the position that something is happening: that a new theory is true instead of the old one (the null hypothesis).

### Statistical model

**model · probabilistic model · statistical modeling**

A statistical hypothesis is a hypothesis that is testable on the basis of observing a process that is modeled via a set of random variables; testing such a hypothesis is sometimes called confirmatory data analysis.

All statistical hypothesis tests and all statistical estimators are derived via statistical models.

### Null hypothesis

**null · null hypotheses · hypothesis**

The comparison is deemed statistically significant if the relationship between the data sets would be an unlikely realization of the null hypothesis according to a threshold probability—the significance level. Modern significance testing is largely the product of Karl Pearson (p-value, Pearson's chi-squared test), William Sealy Gosset (Student's t-distribution), and Ronald Fisher ("null hypothesis", analysis of variance, "significance test"), while hypothesis testing was developed by Jerzy Neyman and Egon Pearson (son of Karl).

In the hypothesis testing approach of Jerzy Neyman and Egon Pearson, a null hypothesis is contrasted with an alternative hypothesis and the two hypotheses are distinguished on the basis of data, with certain error rates.

### P-value

**p-value · p · p-values**

Modern significance testing is largely the product of Karl Pearson (p-value, Pearson's chi-squared test), William Sealy Gosset (Student's t-distribution), and Ronald Fisher ("null hypothesis", analysis of variance, "significance test"), while hypothesis testing was developed by Jerzy Neyman and Egon Pearson (son of Karl).

In statistical hypothesis testing, the p-value or probability value is the probability of obtaining test results at least as extreme as the results actually observed during the test, assuming that the null hypothesis is correct.
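As a minimal sketch of this definition, the p-value for a one-sided coin-flip experiment can be computed directly from the binomial distribution. The numbers below (60 heads in 100 flips) are purely illustrative:

```python
from math import comb

def binomial_p_value(successes, n, p0=0.5):
    """One-sided p-value: the probability of observing at least
    `successes` successes in n trials if the true success rate
    is p0, i.e. if the null hypothesis is correct."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(successes, n + 1))

# How extreme is 60 heads in 100 fair-coin flips under H0: p = 0.5?
p = binomial_p_value(60, 100)  # roughly 0.028, below the usual 0.05 level
```

Because the p-value here is below 0.05, the result would be called statistically significant at the conventional 5% level.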

### Test statistic

**common test statistics · t-test · test statistics**

Sometime around 1940, in an apparent effort to provide researchers with a "non-controversial" way to have their cake and eat it too, the authors of statistical text books began anonymously combining these two strategies by using the p-value in place of the test statistic (or data) to test against the Neyman–Pearson "significance level".

A test statistic is a statistic (a quantity derived from the sample) used in statistical hypothesis testing.
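A simple example of such a quantity is the z-statistic for a sample mean when the population standard deviation is known; the inputs below are hypothetical:

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """z-test statistic: how many standard errors the sample mean
    lies from the null-hypothesis mean mu0 (sigma assumed known)."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Illustrative values: sample mean 52 from n = 25 observations,
# testing H0: mu = 50 with known sigma = 10.
z = z_statistic(52.0, 50.0, 10.0, 25)  # standard error is 2, so z = 1.0
```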

### Confidence interval

**confidence intervals · confidence level · confidence**

Hypothesis tests based on statistical significance are another way of expressing confidence intervals (more precisely, confidence sets).

Confidence intervals are closely related to statistical significance testing.
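The duality between the two can be sketched with a 95% z-interval for a mean with known standard deviation; the data values are hypothetical:

```python
from math import sqrt

def z_confidence_interval(sample_mean, sigma, n, z_crit=1.96):
    """95% confidence interval for a mean with known sigma
    (z_crit = 1.96 is the two-sided 5% critical value)."""
    half_width = z_crit * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

lo, hi = z_confidence_interval(52.0, 10.0, 25)  # (48.08, 55.92)
# Duality with testing: mu0 = 50 lies inside the interval, so a
# two-sided z-test at the 5% level would not reject H0: mu = 50.
```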

### Power (statistics)

**statistical power · power · powerful**

Unless a test with particularly high power is used, the idea of "accepting" the null hypothesis may be dangerous.

Statistical tests use data from samples to assess, or make inferences about, a statistical population.
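Power, the probability of rejecting a false null hypothesis, can be computed in closed form for a one-sided z-test with known standard deviation. All parameter values below are illustrative:

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def z_test_power(mu0, mu_true, sigma, n, z_crit=1.645):
    """Power of a one-sided z-test of H0: mu = mu0 against mu > mu0
    when the true mean is mu_true (z_crit = 1.645 gives alpha = 5%)."""
    se = sigma / sqrt(n)
    critical_mean = mu0 + z_crit * se   # reject when xbar exceeds this
    return 1.0 - normal_cdf((critical_mean - mu_true) / se)

power = z_test_power(mu0=50, mu_true=54, sigma=10, n=25)  # about 0.64
```

Increasing the sample size raises the power, which is why "accepting" the null hypothesis after a low-power test is risky.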

### Effect size

**Cohen's d · effect sizes · magnitude**

Other fields have favored the estimation of parameters (e.g. effect size).

Effect sizes complement statistical hypothesis testing, and play an important role in power analyses, sample size planning, and in meta-analyses.
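One common effect-size measure is Cohen's d, the difference between two group means in units of the pooled standard deviation. The group summaries below are made up for illustration:

```python
from math import sqrt

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d: standardized difference between two group means,
    scaled by the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var)

# Hypothetical groups: means 105 vs 100, both with sd 10 and n = 30.
d = cohens_d(mean1=105, mean2=100, sd1=10, sd2=10, n1=30, n2=30)  # 0.5
```

A d of 0.5 is conventionally described as a "medium" effect, regardless of whether the test is statistically significant.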

### Akaike information criterion

**AIC · AIC-based · AICc**

The most common selection techniques are based on either Akaike information criterion or Bayes factor.

Every statistical hypothesis test can be formulated as a comparison of statistical models.

### Normal distribution

**normally distributed · Gaussian distribution · normal**

These values are used in hypothesis testing, construction of confidence intervals and Q-Q plots.

### Scientific method

**scientific research · scientific · method**

Significance testing is used as a substitute for the traditional comparison of predicted value and experimental result at the core of the scientific method.

A statistical hypothesis is a conjecture about a given statistical population.

### Type I and type II errors

**Type I error · false-positive · false positive**

(The two types are known as type 1 and type 2 errors.)

In statistical hypothesis testing a type I error is the rejection of a true null hypothesis (also known as a "false positive" finding or conclusion), while a type II error is the non-rejection of a false null hypothesis (also known as a "false negative" finding or conclusion).
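By construction, a test conducted at significance level 5% commits a type I error about 5% of the time when the null hypothesis is true. A small simulation sketch (with arbitrary seed and sample sizes) makes this concrete:

```python
import random
from math import sqrt

def type_one_error_rate(n_trials=20000, n=30, z_crit=1.96, seed=0):
    """Simulate the type I error rate of a two-sided z-test when
    H0 is actually true (data from a normal with mean 0, sd 1):
    the rejection rate should be close to the nominal 5%."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / sqrt(n))  # sd known to be 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

rate = type_one_error_rate()  # should land near 0.05
```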

### Statistical significance

**statistically significant · significant · significance level**

The comparison is deemed statistically significant if the relationship between the data sets would be an unlikely realization of the null hypothesis according to a threshold probability—the significance level. Modern significance testing is largely the product of Karl Pearson (p-value, Pearson's chi-squared test), William Sealy Gosset (Student's t-distribution), and Ronald Fisher ("null hypothesis", analysis of variance, "significance test"), while hypothesis testing was developed by Jerzy Neyman and Egon Pearson (son of Karl).

In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred given the null hypothesis.

### False discovery rate

**false discovery rate (FDR) · positive false discovery rate · Benjamini-Hochberg procedure**

These are often dealt with by using multiplicity correction procedures that control the family wise error rate (FWER) or the false discovery rate (FDR).

This, coupled with the growth in computing power, made it possible to seamlessly perform hundreds or thousands of statistical tests on a given data set.
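The best-known FDR-controlling method is the Benjamini-Hochberg step-up procedure; a minimal sketch, with made-up p-values:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: returns the indices of
    the hypotheses rejected while controlling the false discovery
    rate at level q. Sort the p-values, find the largest rank k with
    p_(k) <= (k/m) * q, and reject the k smallest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])

# Five hypothetical tests: the first four survive FDR control at q = 0.05.
rejected = benjamini_hochberg([0.01, 0.04, 0.03, 0.02, 0.20])  # [0, 1, 2, 3]
```

Note that a plain per-test 0.05 threshold would reject the same four here; the procedures differ once many near-threshold p-values accumulate.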

### Data mining

**data-mining · datamining · knowledge discovery in databases**

A related problem is that of multiple testing (sometimes linked to data mining), in which a variety of tests for a variety of possible effects are applied to a single data set and only those yielding a significant result are reported.

Often this results from investigating too many hypotheses and not performing proper statistical hypothesis testing.

### Hypothesis

**hypotheses · hypothetical · hypothesized**

A statistical hypothesis is a hypothesis that is testable on the basis of observing a process that is modeled via a set of random variables; testing such a hypothesis is sometimes called confirmatory data analysis.

Instead, statistical tests are used to determine how likely it is that the overall effect would be observed if the hypothesized relation does not exist.

### Uniformly most powerful test

**uniformly more powerful · Karlin–Rubin theorem · most powerful test**

In statistical hypothesis testing, a uniformly most powerful (UMP) test is a hypothesis test which has the greatest power 1 − β among all possible tests of a given size α.

### Resampling (statistics)

**resampling · statistical support · permutation test**

A permutation test (also called a randomization test, re-randomization test, or an exact test) is a type of statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points.
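For small samples the full enumeration described above is feasible. A sketch with two tiny illustrative groups and the difference of means as the test statistic:

```python
from itertools import combinations

def exact_permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference of means:
    enumerate every relabelling of the pooled data and count how
    often the statistic is at least as extreme as the observed one."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        stat = abs(sum(a) / len(a) - sum(b) / len(b))
        total += 1
        if stat >= observed - 1e-12:   # tolerance for float comparison
            extreme += 1
    return extreme / total

# Only 2 of the 20 possible relabellings are as extreme as the
# observed split, giving an exact p-value of 0.1.
p = exact_permutation_test([1, 2, 3], [4, 5, 6])
```

For larger samples one typically samples random permutations instead of enumerating all of them.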

### Frequentist inference

**frequentist · frequentist statistics · classical**

Statistical hypothesis testing is a key technique of both frequentist inference and Bayesian inference, although the two types of inference have notable differences.

This is the inference framework in which the well-established methodologies of statistical hypothesis testing and confidence intervals are based.

### False positives and false negatives

**false positive · false negative · false positives**

That is, one decides how often one accepts an error of the first kind – a false positive, or Type I error.

In statistical hypothesis testing the analogous concepts are known as type I and type II errors, where a positive result corresponds to rejecting the null hypothesis, and a negative result corresponds to not rejecting the null hypothesis.

### Exact test

**exact inference · exactness**

Using an exact test provides a significance test that keeps the type I error rate of the test (α) at the desired significance level.
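The classic example is Fisher's exact test for a 2×2 table, whose p-value is a hypergeometric tail probability. The sketch below reproduces Fisher's lady-tasting-tea design (8 cups, 4 of each kind, all 4 identified correctly):

```python
from math import comb

def fisher_exact_one_sided(k, K, n, N):
    """One-sided hypergeometric tail: the probability of drawing k or
    more successes in a sample of n from a population of N items that
    contains K successes, computed exactly."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# All 4 "milk first" cups identified out of 4 chosen from 8:
# exactly 1 of the C(8, 4) = 70 equally likely choices, so p = 1/70.
p = fisher_exact_one_sided(k=4, K=4, n=4, N=8)
```

Because the distribution is computed exactly rather than approximated, the stated significance level holds even for such tiny samples.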

### Analysis of variance

**ANOVA · analysis of variance (ANOVA) · corrected the means**

Modern significance testing is largely the product of Karl Pearson (p-value, Pearson's chi-squared test), William Sealy Gosset (Student's t-distribution), and Ronald Fisher ("null hypothesis", analysis of variance, "significance test"), while hypothesis testing was developed by Jerzy Neyman and Egon Pearson (son of Karl).

In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.
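The ANOVA F statistic is the ratio of between-group to within-group mean squares; a self-contained sketch on three small made-up groups:

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: between-group mean square
    divided by within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Illustrative groups with means 2, 3 and 7: the third group's mean
# is far from the others, so F is large (21 on 2 and 6 df here).
f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
```

A large F relative to the F distribution on (2, 6) degrees of freedom would lead to rejecting the hypothesis that all three population means are equal.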

### John Arbuthnot

**Arbuthnot · Dr John Arbuthnot · Dr. Arbuthnot**

The earliest use of statistical hypothesis testing is generally credited to the question of whether male and female births are equally likely (null hypothesis), which was addressed in the 1700s by John Arbuthnot (1710), and later by Pierre-Simon Laplace (1770s).

This paper was a landmark in the history of statistics; in modern terms he performed statistical hypothesis testing, computing the p-value (via a sign test), interpreted it as statistical significance, and rejected the null hypothesis.

### Size (statistics)

**Size**

In statistics, the size of a test is the probability of falsely rejecting the null hypothesis.

### Jerzy Neyman

**Neyman · Jerzy Spława-Neyman · Neyman, Jerzy**

Modern significance testing is largely the product of Karl Pearson (p-value, Pearson's chi-squared test), William Sealy Gosset (Student's t-distribution), and Ronald Fisher ("null hypothesis", analysis of variance, "significance test"), while hypothesis testing was developed by Jerzy Neyman and Egon Pearson (son of Karl).

Neyman first introduced the modern concept of a confidence interval into statistical hypothesis testing and co-revised Ronald Fisher's null hypothesis testing (in collaboration with Egon Pearson).