# Exploratory data analysis

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.

### Data analysis

**data analytics; analysis; data analyst**

In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA).

### Statistical hypothesis testing

**hypothesis testing; statistical test; statistical tests**

These statistical developments, all championed by Tukey, were designed to complement the analytic theory of testing statistical hypotheses, particularly the Laplacian tradition's emphasis on exponential families.

Confirmatory data analysis can be contrasted with exploratory data analysis, which may not have pre-specified hypotheses.

### Statistical model

**model; probabilistic model; statistical modeling**

EDA may or may not use a statistical model; its primary purpose is to see what the data can tell us beyond the formal modeling or hypothesis testing task.

Models can be compared to each other by exploratory data analysis or confirmatory data analysis.

### Targeted projection pursuit

Targeted projection pursuit is a type of statistical technique used for exploratory data analysis, information visualization, and feature selection.

### Statistical graphics

**graphical technique; graphical; graphical techniques**

EDA typically draws on a range of graphical techniques.

Exploratory data analysis (EDA) relies heavily on such techniques.

### Principal component analysis

**principal components analysis; PCA; principal components**

PCA is mostly used as a tool in exploratory data analysis and for making predictive models.

### Median polish

The median polish is a simple and robust exploratory data analysis procedure proposed by the statistician John Tukey.
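As a rough illustration, median polish can be sketched as an additive fit of a two-way table: row medians and column medians are alternately swept out of the residuals into row effects, column effects, and an overall term. The function name `median_polish`, the fixed iteration count `n_iter`, and the sample table are assumptions made for this sketch, not part of the source.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Sketch of an additive median polish on a list-of-lists table.

    Returns (overall, row_effects, col_effects, residuals) such that
    table[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0
    row_eff = [0] * n_rows
    col_eff = [0] * n_cols

    for _ in range(n_iter):
        # Sweep each row's median out of the residuals into the row effect.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
        # Move the median of the column effects into the overall term.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m

        # Sweep each column's median out of the residuals into the column effect.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        # Move the median of the row effects into the overall term.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m

    return overall, row_eff, col_eff, resid


# Hypothetical, purely additive table: 10 + row effect + column effect.
table = [[7, 9, 11], [8, 10, 12], [9, 11, 13]]
print(median_polish(table))
```

Because medians rather than means are swept out, a single extreme cell mostly ends up in its own residual instead of distorting the row and column effects.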

### John Tukey

**Tukey; Tukey, John; John W. Tukey**

Exploratory data analysis was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments.

He also contributed to statistical practice and articulated the important distinction between exploratory data analysis and confirmatory data analysis, believing that much statistical methodology placed too great an emphasis on the latter.

### Ordination (statistics)

**ordination; gradient analysis; ordination techniques**

Ordination or gradient analysis, in multivariate analysis, is a method complementary to data clustering, and used mainly in exploratory data analysis (rather than in hypothesis testing).

### Stem-and-leaf display

**Stem-and-Leaf Plot; stemplot; stem and leaf plot**

Stem-and-leaf displays evolved from Arthur Bowley's work in the early 1900s, and are useful tools in exploratory data analysis.
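A stem-and-leaf display can be sketched in a few lines. The version below assumes non-negative integers, uses the tens as the stem and the units digit as the leaf, and prints every stem between the smallest and largest, including empty ones. The function name `stem_and_leaf` and the sample values are illustrative assumptions.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a simple stem-and-leaf display.

    Assumes non-negative integers; stem = value // 10, leaf = value % 10.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)

    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


# Hypothetical sample values.
for line in stem_and_leaf([12, 15, 21, 24, 24, 30, 47]):
    print(line)
```

Unlike a histogram, the display keeps the individual values readable while still showing the shape of the distribution.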

### Order statistic

**order statistics; ordered; th-smallest of items**

Francis Galton emphasized order statistics and quantiles.

A similar important statistic in exploratory data analysis that is simply related to the order statistics is the sample interquartile range.
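The sample interquartile range is the difference between the third and first quartiles, both of which are read off the ordered sample. A minimal sketch follows, assuming the "inclusive" quartile convention of Python's `statistics.quantiles`; other quartile conventions give slightly different values. The function name and sample data are illustrative.

```python
import statistics


def interquartile_range(values):
    """Sample IQR = Q3 - Q1, using the 'inclusive' quartile convention."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical sample values.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]
print(interquartile_range(sample))
```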

### Machine learning

**learning; machine-learning; statistical learning**

Orange is an open-source data mining and machine learning software suite.

Data mining is a field of study within machine learning, and focuses on exploratory data analysis through unsupervised learning.

### Data visualization

**visualization; data visualisation; data visualizations**

GGobi is free software for interactive data visualization.

Data visualization is closely related to information graphics, information visualization, scientific visualization, exploratory data analysis and statistical graphics.

### TinkerPlots

TinkerPlots is EDA software for upper elementary and middle school students.

TinkerPlots is exploratory data analysis and modeling software designed for use by students in grades 4 through university.

### Testing hypotheses suggested by the data

**post hoc; post-hoc; hypotheses suggested by the data**

In particular, Tukey held that confusing the two types of analyses and employing them on the same set of data can lead to systematic bias owing to the issues inherent in testing hypotheses suggested by the data.

### Trimean

The foundations of the trimean were part of Arthur Bowley's teachings, and it was later popularized by statistician John Tukey in his 1977 book, which gave its name to the set of techniques called exploratory data analysis.
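The trimean is usually defined as a weighted average of the median and the two quartiles, $(Q_1 + 2\,Q_2 + Q_3)/4$. A small sketch follows, assuming the "inclusive" quartile convention of Python's `statistics.quantiles` (Tukey's hinges can differ slightly on some samples). The function name and sample data are illustrative.

```python
import statistics


def trimean(values):
    """Trimean = (Q1 + 2 * median + Q3) / 4, with 'inclusive' quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


# Hypothetical sample values.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]
print(trimean(sample))
```

Weighting the median twice keeps the statistic resistant to outliers while still letting the quartiles reflect skew.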

### GGobi

GGobi is a program which allows exploratory data analysis to occur for multi-dimensional data.

### Configural frequency analysis

Configural frequency analysis (CFA) is a method of exploratory data analysis, introduced by Gustav A. Lienert in 1969.

### Box plot

**boxplot; box and whisker plot; adjusted boxplots**

Box plot

### Arthur Lyon Bowley

**Bowley; A. L. Bowley; Arthur Bowley**

Arthur Lyon Bowley used precursors of the stemplot and five-number summary. Bowley actually used a "seven-figure summary", including the extremes, deciles and quartiles, along with the median; see his Elementary Manual of Statistics (3rd edn., 1920), p. 62, where he defines "the maximum and minimum, median, quartiles and two deciles" as the "seven positions".

Bowley's teaching presaged several of the EDA ideas later popularised by John Tukey, including stemplots, decile boxplots, the seven-figure summary and trimean.
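To make the seven-figure summary concrete, the sketch below computes the minimum, a lower decile, the lower quartile, the median, the upper quartile, an upper decile, and the maximum. It assumes that the "two deciles" are the first and ninth, and it uses the "inclusive" interpolation of Python's `statistics.quantiles`; neither choice is taken from Bowley's text. The function name and sample data are illustrative.

```python
import statistics


def seven_figure_summary(values):
    """Return (min, D1, Q1, median, Q3, D9, max).

    Assumption: the two deciles are the 1st and 9th; quantiles use the
    'inclusive' interpolation of statistics.quantiles.
    """
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), deciles[0], q1, q2, q3, deciles[-1], max(values))


# Hypothetical sample: the integers 1 through 11.
print(seven_figure_summary(list(range(1, 12))))
```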

### Data dredging

**p-hacking; data snooping**

When neither approach is practical, one can make a clear distinction between data analyses that are confirmatory and analyses that are exploratory.

### Descriptive statistics

**descriptive; descriptive statistic; statistics**

More recently, a collection of summarisation techniques has been formulated under the heading of exploratory data analysis: an example of such a technique is the box plot.

### Anscombe's quartet

Anscombe's quartet, on importance of exploration
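The point of the quartet can be checked numerically. The sketch below uses the first two of Anscombe's four data sets, which share the same x values, and compares the mean of y and the Pearson correlation; the helper name `pearson_r` is an assumption made for illustration. The summaries come out nearly identical even though the first set is roughly linear and the second follows a smooth curve, which only a plot reveals.

```python
from math import sqrt
from statistics import mean

# Anscombe's data sets I and II (shared x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from centered sums."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


for name, ys in (("I", y1), ("II", y2)):
    print(name, round(mean(ys), 2), round(pearson_r(x, ys), 3))
```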

### Statistics

**statistical; statistical analysis; statistician**

### Data set

**dataset; datasets; data**

