# Testing hypotheses suggested by the data

**post hoc, Hypotheses suggested by the data, post-hoc, Post hoc theorizing, testing effects suggested by the data**

In statistics, hypotheses suggested by a given dataset, when tested with the same dataset that suggested them, are likely to be accepted even when they are not true.
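The effect can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a method from the source: every variable is pure Gaussian noise with known unit variance, the "suggested" hypothesis is simply whichever variable looks most extreme, and the names `z_stat` and `simulate` are hypothetical.

```python
import random
import statistics


def z_stat(sample):
    """z statistic for H0: mean = 0, assuming a known variance of 1."""
    return statistics.fmean(sample) * len(sample) ** 0.5


def simulate(n_trials=1000, n_vars=20, n_obs=30, crit=1.96, seed=0):
    """Return (rejection rate on the same data, rejection rate on fresh data)."""
    rng = random.Random(seed)
    same_hits = 0
    fresh_hits = 0
    for _ in range(n_trials):
        # Every variable is noise, so every null hypothesis is true.
        data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
        # The data "suggest" a hypothesis: the most extreme-looking variable.
        best = max(range(n_vars), key=lambda j: abs(z_stat(data[j])))
        # Test it on the same data that suggested it.
        if abs(z_stat(data[best])) > crit:
            same_hits += 1
        # Test the same variable on newly drawn data.
        fresh = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(z_stat(fresh)) > crit:
            fresh_hits += 1
    return same_hits / n_trials, fresh_hits / n_trials


same_rate, fresh_rate = simulate()
print(f"rejected on the same data: {same_rate:.2f}")
print(f"rejected on fresh data:    {fresh_rate:.2f}")
```

With 20 independent noise variables and a two-sided 5% test, the same-data rejection rate should land near 1 − 0.95^20 ≈ 0.64, while the fresh-data rate should stay close to the nominal 0.05.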


### Data dredging

**p-hacking, data snooping**

See testing hypotheses suggested by the data.

### Exploratory data analysis

**explorative data analysis, exploratory, data analysis**

In particular, he held that confusing the two types of analyses and employing them on the same set of data can lead to systematic bias owing to the issues inherent in testing hypotheses suggested by the data.

### Texas sharpshooter fallacy

**sharpshooter fallacy**

(See hypothesis testing.) What one cannot do is use the same information to construct and test the same hypothesis (see hypotheses suggested by the data)—to do so would be to commit the Texas sharpshooter fallacy.

### Multiple comparisons problem

**multiple comparisons, multiple testing, multiple comparison**

Henry Scheffé's simultaneous test of all contrasts in multiple comparison problems is the best-known remedy in the case of analysis of variance.

### Uncomfortable science

This leads to the danger of systematic bias through testing hypotheses suggested by the data.

### Post hoc analysis

**Post-hoc analysis, post-hoc, post hoc**

Generating hypotheses based on data already observed, in the absence of testing them on new data, is referred to as post hoc theorizing (from Latin post hoc, "after this").
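One common way to obtain "new data" when no further collection is possible is to set part of the dataset aside before looking at it. The sketch below only illustrates that idea; the function name `split_explore_confirm` and the 50/50 default are assumptions, not something the source prescribes.

```python
import random


def split_explore_confirm(rows, confirm_fraction=0.5, seed=0):
    """Randomly split rows into an exploration set and a confirmation set.

    Hypotheses are generated on the first part only; the second part is
    left untouched until a hypothesis has been fixed, then used to test it.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - confirm_fraction))
    return shuffled[:cut], shuffled[cut:]


explore, confirm = split_explore_confirm(range(100))
print(len(explore), len(confirm))
```

The split has to happen before any exploratory look at the data; splitting after a pattern has already been spotted does not remove the bias.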


### Type I and type II errors

**Type I error, false-positive, false positive**

Testing a hypothesis suggested by the data can very easily result in false positives (type I errors). A large set of tests as described above greatly inflates the probability of type I error, as all but the data most favorable to the hypothesis are discarded.
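The size of this inflation is easy to compute in the simplest case. The sketch below assumes the tests are independent and all null hypotheses are true, which real analyses rarely satisfy exactly; `familywise_error_rate` and `bonferroni_alpha` are illustrative names.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / m


for m in (1, 5, 20, 100):
    print(m, round(familywise_error_rate(0.05, m), 3))
```

At a per-test level of 0.05, twenty tests already give about a 0.64 chance of at least one false positive. Running each test at `bonferroni_alpha(0.05, 20)` brings the familywise rate back below 0.05.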

### Latin

**Latin language, Lat., la**

Generating hypotheses based on data already observed, in the absence of testing them on new data, is referred to as post hoc theorizing (from Latin post hoc, "after this").

### Placebo

**placebo effect, placebos, placebo studies**

The vast majority of them find no significant differences between measurements done on patients who have taken Vitamin X and those who have taken a placebo.

### Fraction of variance unexplained

**statistical noise, noise, noisy**

However, due to statistical noise, one study finds a significant correlation between taking Vitamin X and being cured from cancer.

### Scientific evidence

**evidence, scientific proof, proof**

Yet, these positive data do not by themselves constitute evidence that the hypothesis is correct.

### Probability

**probabilistic, probabilities, chance**

A large set of tests as described above greatly inflates the probability of type I error, as all but the data most favorable to the hypothesis are discarded.

### Hypothesis

**hypotheses, hypothetical, hypothesized**

A large set of tests as described above greatly inflates the probability of type I error, as all but the data most favorable to the hypothesis are discarded.

### Statistical hypothesis testing

**hypothesis testing, statistical test, statistical tests**

This is a risk not only in hypothesis testing but in all statistical inference, as it is often difficult to describe accurately the process that has been followed in searching and discarding data.

### Statistical inference

**inferential statistics, inference, inferences**

This is a risk not only in hypothesis testing but in all statistical inference, as it is often difficult to describe accurately the process that has been followed in searching and discarding data.

### Data

**statistical data, scientific data, datum**

This is a risk not only in hypothesis testing but in all statistical inference, as it is often difficult to describe accurately the process that has been followed in searching and discarding data.

### Statistical model

**model, probabilistic model, statistical modeling**

It is a particular problem in statistical modelling, where many different models are rejected by trial and error before publishing a result (see also overfitting, publication bias).

### Trial and error

**trial-and-error, generate and test, trial and error principle**

It is a particular problem in statistical modelling, where many different models are rejected by trial and error before publishing a result (see also overfitting, publication bias).

### Overfitting

**overfit, over-fit, over-fitted**

It is a particular problem in statistical modelling, where many different models are rejected by trial and error before publishing a result (see also overfitting, publication bias).

### Publication bias

**File drawer problem, file drawer effect, self-selecting nature of the positive reports**

It is a particular problem in statistical modelling, where many different models are rejected by trial and error before publishing a result (see also overfitting, publication bias). It also commonly occurs in academic publishing where only reports of positive, rather than negative, results tend to be accepted, resulting in the effect known as publication bias.
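The selection mechanism behind publication bias can be sketched with a toy simulation. Everything here is an assumption made for illustration: the true effect is exactly zero, each study draws unit-variance Gaussian observations, and a study is "published" only if its one-sided z statistic exceeds 1.96. The function name `run_studies` is hypothetical.

```python
import random
import statistics


def run_studies(n_studies=2000, n_obs=25, true_effect=0.0, crit=1.96, seed=1):
    """Simulate many studies of the same effect; keep all estimates and the
    subset that would pass a 'positive, significant result' filter."""
    rng = random.Random(seed)
    all_estimates = []
    published = []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1) for _ in range(n_obs)]
        estimate = statistics.fmean(sample)
        z = estimate * n_obs ** 0.5  # known unit variance assumed
        all_estimates.append(estimate)
        if z > crit:  # only positive, significant results get published
            published.append(estimate)
    return all_estimates, published


all_estimates, published = run_studies()
print(f"studies run:        {len(all_estimates)}")
print(f"studies published:  {len(published)}")
print(f"mean, all studies:  {statistics.fmean(all_estimates):.3f}")
print(f"mean, published:    {statistics.fmean(published):.3f}")
```

Across all simulated studies the average estimate stays near zero, but every published estimate is above 1.96 / √25 ≈ 0.39 by construction, so a reader who sees only the published subset would infer an effect that does not exist.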

### Data mining

**data-mining, datamining, knowledge discovery in databases**

The error is particularly prevalent in data mining and machine learning.

### Machine learning

**machine-learning, learning, statistical learning**

The error is particularly prevalent in data mining and machine learning.

### Academic publishing

**research papers, academic paper, academic publisher**

It also commonly occurs in academic publishing where only reports of positive, rather than negative, results tend to be accepted, resulting in the effect known as publication bias.

### Scheffé's method

**Henry Scheffé's simultaneous test, Scheffe method, Scheffé**

Henry Scheffé's simultaneous test of all contrasts in multiple comparison problems is the best-known remedy in the case of analysis of variance.
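For reference, the usual textbook form of Scheffé's criterion can be written as follows. The notation is chosen here and does not come from the source: $k$ groups with sizes $n_i$ and sample means $\bar{y}_i$, total sample size $N$, the ANOVA mean squared error $\mathrm{MSE}$, and a contrast with coefficients $c_i$ summing to zero.

```latex
% Contrast and its estimate
\psi = \sum_{i=1}^{k} c_i \mu_i, \qquad \sum_{i=1}^{k} c_i = 0, \qquad
\hat{\psi} = \sum_{i=1}^{k} c_i \bar{y}_i

% Reject H_0 : \psi = 0 at familywise level \alpha when
\left| \hat{\psi} \right| >
\sqrt{(k-1)\, F_{\alpha;\, k-1,\, N-k}}\;
\sqrt{\mathrm{MSE} \sum_{i=1}^{k} \frac{c_i^{2}}{n_i}}
```

Because the critical value holds simultaneously for every possible contrast, a contrast chosen after inspecting the group means is still covered, which is why the method is a remedy for hypotheses suggested by the data in the ANOVA setting.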

### Analysis of variance

**ANOVA, analysis of variance (ANOVA), corrected the means**

Henry Scheffé's simultaneous test of all contrasts in multiple comparison problems is the best-known remedy in the case of analysis of variance.