A report on Statistics

The normal distribution, a very common probability density, useful because of the central limit theorem.
Scatter plots are used in descriptive statistics to show the observed relationships between different variables, here using the Iris flower data set.
Gerolamo Cardano, a pioneer of the mathematics of probability.
Karl Pearson, a founder of mathematical statistics.
A least-squares fit: in red the points to be fitted, in blue the fitted line.
Confidence intervals: the red line is the true value of the mean in this example; the blue lines are random confidence intervals for 100 realizations.
In this graph the black line is the probability distribution for the test statistic, the critical region is the set of values to the right of the observed data point (the observed value of the test statistic), and the p-value is represented by the green area.
The confounding variable problem: X and Y may be correlated, not because there is a causal relationship between them, but because both depend on a third variable Z. Z is called a confounding factor.
gretl, an example of an open-source statistical package

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

110 related topics; a selection is summarized below.

Variance


In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean.
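
As a minimal illustrative sketch (the data are made up), both the population variance and the Bessel-corrected sample variance can be computed directly from this definition and checked against Python's standard library:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample

# Population variance: the mean of the squared deviations from the mean.
mu = sum(data) / len(data)
pop_var = sum((x - mu) ** 2 for x in data) / len(data)

# Sample variance uses the n - 1 (Bessel-corrected) denominator instead.
sample_var = sum((x - mu) ** 2 for x in data) / (len(data) - 1)

print(pop_var, statistics.pvariance(data))    # 4.0 4.0
print(sample_var, statistics.variance(data))  # 4.571... 4.571...
```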

Ronald Fisher


Fisher in 1913
As a child
Inverforth House, North End Way NW3, where Fisher lived from 1896 to 1904
On graduating from Cambridge University, 1912
The peacock tail in flight, the classic example of a Fisherian runaway
Rothamsted Research
Memorial plaque over his remains, lectern-side aisle of St Peter's Cathedral, Adelaide
Stained glass window (now removed) in the dining hall of Caius College, in Cambridge, commemorating Ronald Fisher and representing a Latin square, discussed by him in The Design of Experiments
As a steward at the First International Eugenics Conference, 1912

Sir Ronald Aylmer Fisher (17 February 1890 – 29 July 1962) was a British polymath who was active as a mathematician, statistician, biologist, geneticist, and academic.

Probability distribution


The probability density function (pdf) of the normal distribution, also called Gaussian or "bell curve", the most important absolutely continuous random distribution. As notated on the figure, the probabilities of intervals of values correspond to the area under the curve.
The probability mass function of a discrete probability distribution. The probabilities of the singletons {1}, {3}, and {7} are respectively 0.2, 0.5, 0.3. A set not containing any of these points has probability zero.
The cdf of a discrete probability distribution, of a continuous probability distribution, and of a distribution that has both a continuous part and a discrete part.
One solution of the Rabinovich–Fabrikant equations. What is the probability of observing a state in a certain region of the support (i.e., the red subset)?

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment.
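
For instance, the discrete distribution in the caption above (probabilities 0.2, 0.5 and 0.3 on the points 1, 3 and 7) can be written as a probability mass function. The sketch below is only illustrative, and the function name is made up:

```python
# Discrete distribution from the caption above: P({1}) = 0.2, P({3}) = 0.5, P({7}) = 0.3.
pmf = {1: 0.2, 3: 0.5, 7: 0.3}

def prob(event):
    """Probability of a set of outcomes; points outside the support contribute 0."""
    return sum(pmf.get(x, 0.0) for x in event)

print(prob({1, 3}))     # 0.7
print(prob({2, 4, 6}))  # 0.0 -- a set containing none of the support points
```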

Standard deviation


Dark blue is one standard deviation on either side of the mean. For the normal distribution, this accounts for 68.27 percent of the set; while two standard deviations from the mean (medium and dark blue) account for 95.45 percent; three standard deviations (light, medium, and dark blue) account for 99.73 percent; and four standard deviations account for 99.994 percent. The two points of the curve that are one standard deviation from the mean are also the inflection points.
Cumulative probability of a normal distribution with expected value 0 and standard deviation 1
Example of samples from two populations with the same mean but different standard deviations. Red population has mean 100 and SD 10; blue population has mean 100 and SD 50.
The standard deviation ellipse (green) of a two-dimensional normal distribution

In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values.
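
A small simulation sketch (sample size and seed chosen arbitrarily) draws from a normal distribution with mean 100 and standard deviation 10, as in the caption above, computes the standard deviation, and checks the roughly 68% one-standard-deviation coverage:

```python
import random
import statistics

random.seed(0)
# Draw from a normal distribution with mean 100 and standard deviation 10.
sample = [random.gauss(100, 10) for _ in range(100_000)]

mean = statistics.fmean(sample)
sd = statistics.pstdev(sample)  # standard deviation of the sample

within_one_sd = sum(abs(x - mean) <= sd for x in sample) / len(sample)
print(round(sd, 2), round(within_one_sd, 3))  # close to 10 and close to 0.683
```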

Linear regression


In linear regression, the observations ( red ) are assumed to be the result of random deviations ( green ) from an underlying relationship ( blue ) between a dependent variable (y) and an independent variable (x).
Example of a cubic polynomial regression, which is a type of linear regression. Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered to be a special case of multiple linear regression.
The data sets in the Anscombe's quartet are designed to have approximately the same linear regression line (as well as nearly identical means, standard deviations, and correlations) but are graphically very different. This illustrates the pitfalls of relying solely on a fitted model to understand the relationship between variables.
Example of simple linear regression, which has one independent variable
Francis Galton's 1886 illustration of the correlation between the heights of adults and their parents. The observation that adult children's heights tended to deviate less from the mean height than their parents' did suggested the concept of "regression toward the mean", giving regression its name. The "locus of horizontal tangential points" passing through the leftmost and rightmost points on the ellipse (which is a level curve of the bivariate normal distribution estimated from the data) is the OLS estimate of the regression of parents' heights on children's heights, while the "locus of vertical tangential points" is the OLS estimate of the regression of children's heights on parents' heights. The major axis of the ellipse is the TLS estimate.
Comparison of the Theil–Sen estimator (black) and simple linear regression (blue) for a set of points with outliers.
To check for violations of the assumptions of linearity, constant variance, and independence of errors within a linear regression model, the residuals are typically plotted against the predicted values (or each of the individual predictors). An apparently random scatter of points about the horizontal midline at 0 is ideal, but cannot rule out certain kinds of violations such as autocorrelation in the errors or their correlation with one or more covariates.

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).
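
For the simple (one explanatory variable) case, the ordinary least-squares slope and intercept have closed forms. The sketch below uses made-up data and is meant only to illustrate those formulas:

```python
# Simple linear regression by ordinary least squares (illustrative, made-up data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 5.9, 8.2, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(slope, intercept)  # about 1.93 and 0.27 for these numbers
```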

Estimator


In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished.
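
The three terms can be kept apart in code. In the sketch below (all names and numbers are illustrative), the sample mean is the estimator, the unknown population mean is the estimand, and the number printed at the end is the estimate:

```python
import random
import statistics

TRUE_MEAN = 5.0  # the estimand: unknown in practice, fixed here for illustration

def sample_mean(sample):
    """The estimator: a rule that maps observed data to a number."""
    return statistics.fmean(sample)

random.seed(1)
observed = [random.gauss(TRUE_MEAN, 2.0) for _ in range(50)]

estimate = sample_mean(observed)  # the estimate: the rule's value on this data set
print(estimate)  # some value near 5.0, varying from sample to sample
```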

Mathematics


Mathematics is the area of knowledge that includes such topics as numbers, formulas and related structures (algebra), shapes and the spaces in which they are contained (geometry), and quantities and their changes (calculus and analysis).

3rd century BC Greek mathematician Euclid (holding calipers), as imagined by Raphael in this detail from The School of Athens (1509–1511)
The distribution of prime numbers is a central point of study in number theory. This Ulam spiral serves to illustrate it, hinting, in particular, at the conditional independence between being prime and being a value of certain quadratic polynomials.
The quadratic formula expresses concisely the solutions of all quadratic equations
Rubik's cube: the study of its possible moves is a concrete application of group theory
The Babylonian mathematical tablet Plimpton 322, dated to 1800 BC.
Archimedes used the method of exhaustion, depicted here, to approximate the value of pi.
The numerals used in the Bakhshali manuscript, dated between the 2nd century BC and the 2nd century AD.
A page from al-Khwārizmī's Algebra
Leonardo Fibonacci, the Italian mathematician who introduced to the Western world the Hindu–Arabic numeral system, invented between the 1st and 4th centuries by Indian mathematicians.
Leonhard Euler created and popularized much of the mathematical notation used today.
Carl Friedrich Gauss, known as the prince of mathematicians
The front side of the Fields Medal
Euler's identity, which American physicist Richard Feynman once called "the most remarkable formula in mathematics".

Some areas of mathematics, such as statistics and game theory, are developed in close correlation with their applications and are often grouped under applied mathematics.

Mean squared error


In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (that is, of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors: the average squared difference between the estimated values and the actual value.
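
As a rough simulation sketch (the sample size, seed, and the choice of the sample mean as the estimator are all arbitrary), the MSE can be approximated by averaging squared errors over many repetitions:

```python
import random
import statistics

TRUE_VALUE = 5.0  # the unobserved quantity being estimated
random.seed(2)

estimates = []
for _ in range(10_000):
    sample = [random.gauss(TRUE_VALUE, 2.0) for _ in range(20)]
    estimates.append(statistics.fmean(sample))  # sample mean as the estimator

# MSE: average squared difference between the estimates and the true value.
mse = statistics.fmean((e - TRUE_VALUE) ** 2 for e in estimates)
print(mse)  # roughly sigma^2 / n = 4 / 20 = 0.2 for this setup
```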

Statistical dispersion


Example of samples from two populations with the same mean but different dispersion. The blue population is much more dispersed than the red population.

In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed.
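
A minimal sketch with two made-up samples that share a mean of 100 but differ in spread shows a few common dispersion measures moving together:

```python
import statistics

# Two hypothetical samples with the same mean but different spread.
tight = [98, 99, 100, 101, 102]
wide = [60, 80, 100, 120, 140]

for name, data in [("tight", tight), ("wide", wide)]:
    data_range = max(data) - min(data)  # range
    sd = statistics.pstdev(data)        # standard deviation
    print(name, statistics.fmean(data), data_range, round(sd, 2))
# Both means are 100, but every dispersion measure is larger for "wide".
```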

Bias of an estimator


In statistics, the bias of an estimator (or bias function) is the difference between this estimator's expected value and the true value of the parameter being estimated.
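
A standard illustration, sketched below with arbitrary simulation settings, is the variance estimator: dividing the sum of squared deviations by n is biased downward, while dividing by n - 1 is unbiased:

```python
import random
import statistics

TRUE_VAR = 4.0  # variance of the sampled distribution
random.seed(3)

biased, unbiased = [], []
for _ in range(20_000):
    sample = [random.gauss(0.0, TRUE_VAR ** 0.5) for _ in range(5)]
    m = statistics.fmean(sample)
    ss = sum((x - m) ** 2 for x in sample)
    biased.append(ss / len(sample))          # expectation is (n-1)/n * 4 = 3.2
    unbiased.append(ss / (len(sample) - 1))  # expectation is 4.0

print(statistics.fmean(biased), statistics.fmean(unbiased))  # near 3.2 and 4.0
```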