The normal distribution, a very common probability density, useful because of the central limit theorem.
An example of data produced by data dredging through a bot operated by statistician Tyler Vigen, apparently showing a close link between the best word winning a spelling bee competition and the number of people in the United States killed by venomous spiders. The similarity in trends is obviously a coincidence.
Scatter plots are used in descriptive statistics to show the observed relationships between different variables, here using the Iris flower data set.
Gerolamo Cardano, a pioneer on the mathematics of probability.
Karl Pearson, a founder of mathematical statistics.
A least squares fit: in red the points to be fitted, in blue the fitted line.
Confidence intervals: the red line is true value for the mean in this example, the blue lines are random confidence intervals for 100 realizations.
In this graph the black line is probability distribution for the test statistic, the critical region is the set of values to the right of the observed data point (observed value of the test statistic) and the p-value is represented by the green area.
The confounding variable problem: X and Y may be correlated, not because there is causal relationship between them, but because both depend on a third variable Z. Z is called a confounding factor.
gretl, an example of an open source statistical package

Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

- Data mining

It can include extrapolation and interpolation of time series or spatial data, and data mining.

- Statistics
The normal distribution, a very common probability density, useful because of the central limit theorem.

2 related topics

Alpha

Machine learning as subfield of AI

Machine learning

Field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks.

Field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks.

Machine learning as subfield of AI
Part of machine learning as subfield of AI or part of AI as subfield of machine learning
A support-vector machine is a supervised learning model that divides the data into regions separated by a linear boundary. Here, the linear boundary divides the black circles from the white.
An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. Here, each circular node represents an artificial neuron and an arrow represents a connection from the output of one artificial neuron to the input of another.
Illustration of linear regression on a data set.
A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet.
The blue line could be an example of overfitting a linear function due to random noise.

Data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning.

Machine learning and statistics are closely related fields in terms of methods, but distinct in their principal goal: statistics draws population inferences from a sample, while machine learning finds generalizable predictive patterns.

SPSS v27 running on Windows 7

SPSS

Statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, criminal investigation.

Statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, criminal investigation.

SPSS v27 running on Windows 7
The SPSS logo used prior to the renaming in January 2010.

SPSS is a widely used program for statistical analysis in social science.

Companion software in the "IBM SPSS" family are used for data mining and text analytics (IBM SPSS Modeler), and realtime credit scoring services (IBM SPSS Collaboration and Deployment Services).