Machine learning as a subfield of AI
The normal distribution, a very common probability density, useful because of the central limit theorem.
An example of data produced by data dredging through a bot operated by statistician Tyler Vigen, apparently showing a close link between the number of letters in the winning word of a spelling bee competition and the number of people in the United States killed by venomous spiders. The similarity in trends is obviously a coincidence.
Part of machine learning as a subfield of AI, or part of AI as a subfield of machine learning
Scatter plots are used in descriptive statistics to show the observed relationships between different variables, here using the Iris flower data set.
A support-vector machine is a supervised learning model that divides the data into regions separated by a linear boundary. Here, the linear boundary divides the black circles from the white.
Gerolamo Cardano, a pioneer in the mathematics of probability.
An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. Here, each circular node represents an artificial neuron and an arrow represents a connection from the output of one artificial neuron to the input of another.
Karl Pearson, a founder of mathematical statistics.
Illustration of linear regression on a data set.
A least squares fit: in red the points to be fitted, in blue the fitted line.
A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet.
Confidence intervals: the red line is the true value of the mean in this example, and the blue lines are random confidence intervals for 100 realizations.
The blue line could be an example of overfitting a linear function due to random noise.
In this graph, the black line is the probability distribution of the test statistic, the critical region is the set of values to the right of the observed data point (the observed value of the test statistic), and the p-value is represented by the green area.
The confounding variable problem: X and Y may be correlated, not because there is a causal relationship between them, but because both depend on a third variable Z. Z is called a confounding factor.
gretl, an example of an open-source statistical package

Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

- Data mining

Data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning.

- Machine learning

Statistical inference can include extrapolation and interpolation of time series or spatial data, and data mining.

- Statistics
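
As a rough illustration of the interpolation and extrapolation of a time series mentioned above, the following minimal sketch uses NumPy. The observation times and values are made up for the example (they lie exactly on the line y = 2t + 1), and a straight-line least squares fit is only one of many possible extrapolation choices.

```python
import numpy as np

# Hypothetical time series observed at irregular times (illustrative values only).
t = np.array([0.0, 1.0, 2.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 5.0, 9.0, 11.0])

# Interpolation: estimate the unobserved value at t = 3, inside the observed range,
# by piecewise-linear interpolation between neighbouring observations.
y_interp = np.interp(3.0, t, y)

# Extrapolation: fit a straight line by least squares and evaluate it
# at t = 7, beyond the last observation.
slope, intercept = np.polyfit(t, y, deg=1)
y_extrap = slope * 7.0 + intercept

print(y_interp, y_extrap)
```

Extrapolated values depend heavily on the assumed model, so they are generally less reliable than interpolated ones.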

Machine learning and statistics are closely related fields in terms of methods, but distinct in their principal goal: statistics draws population inferences from a sample, while machine learning finds generalizable predictive patterns.

- Machine learning
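
The difference in goals described above can be sketched on a single synthetic data set. This is only an illustration under stated assumptions: the data are simulated, the confidence interval uses the normal approximation (factor 1.96), and the "model" is a plain least squares line rather than any particular machine learning method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y is linear in x plus noise.
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=200)

# Statistical view: infer a population quantity from the sample,
# here an approximate 95% confidence interval for the mean of y.
mean_y = y.mean()
sem_y = y.std(ddof=1) / np.sqrt(y.size)
ci_95 = (mean_y - 1.96 * sem_y, mean_y + 1.96 * sem_y)

# Machine learning view: fit on one part of the data and measure
# predictive error on held-out data the model has not seen.
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]
slope, intercept = np.polyfit(x_train, y_train, deg=1)
test_mse = np.mean((slope * x_test + intercept - y_test) ** 2)

print(ci_95, test_mse)
```

The first half answers a question about the population (where its mean plausibly lies), while the second half asks only how well the fitted pattern predicts new observations.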

Machine learning models are statistical and probabilistic models that capture patterns in the data through use of computational algorithms.

- Statistics