Testing hypotheses suggested by the data

post hoc, hypotheses suggested by the data, post-hoc, post hoc theorizing, testing effects suggested by the data
In statistics, hypotheses suggested by a given dataset, when tested with the same dataset that suggested them, are likely to be accepted even when they are not true.
42 Related Articles

Data dredging

p-hacking, data snooping
See testing hypotheses suggested by the data.

Exploratory data analysis

explorative data analysis, exploratory, data analysis
In particular, he held that confusing the two types of analyses and employing them on the same set of data can lead to systematic bias owing to the issues inherent in testing hypotheses suggested by the data.

Texas sharpshooter fallacy

sharpshooter fallacy
(See hypothesis testing.) What one cannot do is use the same information to construct and test the same hypothesis (see hypotheses suggested by the data)—to do so would be to commit the Texas sharpshooter fallacy.
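The difference between constructing and testing a hypothesis on the same data versus on separate data can be sketched with a small simulation. This is only an illustration under stated assumptions: the data are pure noise, the function names `pearson_r` and `rejection_rates` are hypothetical, and the cutoff `1.96 / sqrt(n)` is a rough large-sample approximation to a 5% two-sided test on a correlation coefficient.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rejection_rates(n_runs=300, n=100, n_predictors=20, seed=1):
    """Return (same-data rejection rate, held-out rejection rate) on pure noise."""
    rng = random.Random(seed)
    # Assumption: approximate 5% two-sided cutoff for |r| under the null.
    critical = 1.96 / math.sqrt(n)
    same_data = held_out = 0
    for _ in range(n_runs):
        # No predictor is truly related to the outcome in either half.
        y_a = [rng.gauss(0, 1) for _ in range(n)]
        y_b = [rng.gauss(0, 1) for _ in range(n)]
        xs_a = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_predictors)]
        xs_b = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_predictors)]
        # Construct the hypothesis: pick the predictor that looks best on half A.
        r_a = [pearson_r(x, y_a) for x in xs_a]
        best = max(range(n_predictors), key=lambda j: abs(r_a[j]))
        # Test it on the same half A (the fallacy) ...
        same_data += abs(r_a[best]) > critical
        # ... and on the independent half B (a fair test).
        held_out += abs(pearson_r(xs_b[best], y_b)) > critical
    return same_data / n_runs, held_out / n_runs


if __name__ == "__main__":
    same, held = rejection_rates()
    print(f"tested on the data that suggested it: {same:.2f}")
    print(f"tested on fresh data:                 {held:.2f}")
```

In this sketch the hypothesis picked from the first half is "confirmed" on that same half far more often than the nominal 5%, while the test on fresh data stays close to the nominal rate.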

Multiple comparisons problem

multiple comparisons, multiple testing, multiple comparison
Henry Scheffé's simultaneous test of all contrasts in multiple comparison problems is the best-known remedy in the case of analysis of variance.

Uncomfortable science

This leads to the danger of systematic bias through testing hypotheses suggested by the data.

Post hoc analysis

Post-hoc analysis, post-hoc, post hoc
Generating hypotheses based on data already observed, in the absence of testing them on new data, is referred to as post hoc theorizing (from Latin post hoc, "after this").
See also: Testing hypotheses suggested by the data

Type I and type II errors

Type I error, false-positive, false positive
Testing a hypothesis suggested by the data can very easily result in false positives (type I errors). A large set of tests as described above greatly inflates the probability of type I error, because all data except those most favorable to the hypothesis are discarded.
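The inflation can be made concrete with a short calculation. The sketch below assumes the tests are independent and that every null hypothesis is true, in which case the chance of at least one false positive among m tests at level alpha is 1 - (1 - alpha)^m. The function names are hypothetical, and the simulation relies on the fact that a p-value from a continuous test statistic is uniformly distributed on [0, 1] under a true null.

```python
import random


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_best_of(n_tests: int, alpha: float = 0.05,
                     n_runs: int = 20_000, seed: int = 0) -> float:
    """Fraction of runs in which the most favorable of n_tests looks significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        # Keep only the most favorable result and discard the rest.
        smallest_p = min(rng.random() for _ in range(n_tests))
        if smallest_p < alpha:
            hits += 1
    return hits / n_runs


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        exact = familywise_error_rate(0.05, m)
        simulated = simulate_best_of(m)
        print(f"{m:>3} tests: exact {exact:.3f}, simulated {simulated:.3f}")
```

Correlated tests would change the numbers, so the formula should be read as an illustration of the trend rather than a general result.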

Latin

Latin language, Lat., la
Generating hypotheses based on data already observed, in the absence of testing them on new data, is referred to as post hoc theorizing (from Latin post hoc, "after this").

Placebo

placebo effect, placebos, placebo studies
The vast majority of them find no significant differences between measurements done on patients who have taken Vitamin X and those who have taken a placebo.

Fraction of variance unexplained

statistical noise, noise, noisy
However, due to statistical noise, one study finds a significant correlation between taking Vitamin X and being cured from cancer.

Scientific evidence

evidence, scientific proof, proof
Yet, these positive data do not by themselves constitute evidence that the hypothesis is correct.

Probability

probabilistic, probabilities, chance
A large set of tests as described above greatly inflates the probability of type I error, because all data except those most favorable to the hypothesis are discarded.

Hypothesis

hypotheses, hypothetical, hypothesized
A large set of tests as described above greatly inflates the probability of type I error, because all data except those most favorable to the hypothesis are discarded.

Statistical hypothesis testing

hypothesis testing, statistical test, statistical tests
This is a risk not only in hypothesis testing but in all statistical inference, because it is often difficult to describe accurately the process that was followed in searching and discarding data.

Statistical inference

inferential statistics, inference, inferences
This is a risk not only in hypothesis testing but in all statistical inference, because it is often difficult to describe accurately the process that was followed in searching and discarding data.

Data

statistical data, scientific data, datum
This is a risk not only in hypothesis testing but in all statistical inference, because it is often difficult to describe accurately the process that was followed in searching and discarding data.

Statistical model

model, probabilistic model, statistical modeling
It is a particular problem in statistical modelling, where many different models are rejected by trial and error before publishing a result (see also overfitting, publication bias).

Trial and error

trial-and-error, generate and test, trial and error principle
It is a particular problem in statistical modelling, where many different models are rejected by trial and error before publishing a result (see also overfitting, publication bias).

Overfitting

overfit, over-fit, over-fitted
It is a particular problem in statistical modelling, where many different models are rejected by trial and error before publishing a result (see also overfitting, publication bias).

Publication bias

File drawer problem, file drawer effect, self-selecting nature of the positive reports
It is a particular problem in statistical modelling, where many different models are rejected by trial and error before publishing a result (see also overfitting, publication bias). It also commonly occurs in academic publishing where only reports of positive, rather than negative, results tend to be accepted, resulting in the effect known as publication bias.

Data mining

data-mining, datamining, knowledge discovery in databases
The error is particularly prevalent in data mining and machine learning.

Machine learning

machine-learning, learning, statistical learning
The error is particularly prevalent in data mining and machine learning.

Academic publishing

research papers, academic paper, academic publisher
It also commonly occurs in academic publishing where only reports of positive, rather than negative, results tend to be accepted, resulting in the effect known as publication bias.

Scheffé's method

Henry Scheffé's simultaneous test, Scheffe method, Scheffé
Henry Scheffé's simultaneous test of all contrasts in multiple comparison problems is the best-known remedy in the case of analysis of variance.

Analysis of variance

ANOVA, analysis of variance (ANOVA), corrected the means
Henry Scheffé's simultaneous test of all contrasts in multiple comparison problems is the best-known remedy in the case of analysis of variance.