Median

average, sample median, median-unbiased estimator
The median is used primarily for skewed distributions, which it summarizes differently from the arithmetic mean. Consider the multiset { 1, 2, 2, 2, 3, 14 }. The median is 2 in this case (as is the mode), and it might be seen as a better indication of central tendency (less susceptible to the exceptionally large value in the data) than the arithmetic mean of 4. The median is a popular summary statistic in descriptive statistics, since it is simple to understand and easy to calculate, while also being more robust in the presence of outlier values than the mean.
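As a small illustration of the comparison above, the following sketch computes the three measures for the multiset from the text using Python's standard statistics module. The variable names and the second data set (the same values with the largest one inflated) are assumptions added for illustration only.

```python
from statistics import mean, median, mode

# The multiset from the text.
data = [1, 2, 2, 2, 3, 14]

center_mean = mean(data)      # arithmetic mean, pulled upward by the value 14
center_median = median(data)  # average of the two middle values (2 and 2)
center_mode = mode(data)      # most frequent value

# Hypothetical variant: make the largest value far more extreme.
inflated = [1, 2, 2, 2, 3, 1400]
inflated_mean = mean(inflated)      # changes substantially
inflated_median = median(inflated)  # stays where it was

print(center_mean, center_median, center_mode)
print(inflated_mean, inflated_median)
```

The median is the same for both data sets, while the mean moves from 4 to 235, which is the sense in which the median is less susceptible to an exceptionally large value.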

Mode (statistics)

mode, modal, modes
Descriptive statistics. Moment (mathematics). Summary statistics. Unimodal function. A Guide to Understanding & Calculating the Mode. Mean, Median and Mode short beginner video from Khan Academy.

Complexity class

complexity classes, complexity, computational complexity
Many complexity classes can be characterized in terms of the mathematical logic needed to express them; see descriptive complexity. The most commonly used problems are decision problems. However, complexity classes can be defined based on function problems (an example is FP), counting problems (e.g. #P), optimization problems, promise problems, etc. The most common model of computation is the deterministic Turing machine, but many complexity classes are based on nondeterministic Turing machines, boolean circuits, quantum Turing machines, monotone circuits, etc.

Variance

sample variance, population variance, variability
Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common. The variance is the square of the standard deviation, the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by \sigma^2, s^2, or \operatorname{Var}(X).
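The three characterizations of the variance listed above can be checked numerically. The sketch below treats a small made-up data set as a complete population; the data values and the helper population_covariance are hypothetical and exist only for this illustration.

```python
from statistics import pstdev, pvariance

# Hypothetical data, treated as a whole population (not a sample).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def population_covariance(xs, ys):
    """Mean product of deviations from the respective means (divisor n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


var = pvariance(data)                         # second central moment
sd = pstdev(data)                             # standard deviation
cov_self = population_covariance(data, data)  # covariance of the data with itself

print(var, sd ** 2, cov_self)
```

All three printed values agree up to floating-point rounding. The sample variance s^2 would use the divisor n - 1 instead (statistics.variance in the standard library).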

First-order logic

predicate logic, first-order, predicate calculus
Decidable subsets of first-order logic are also studied in the framework of description logics. The Löwenheim–Skolem theorem shows that if a first-order theory of cardinality λ has an infinite model, then it has models of every infinite cardinality greater than or equal to λ. One of the earliest results in model theory, it implies that it is not possible to characterize countability or uncountability in a first-order language with a countable signature. That is, there is no first-order formula φ(x) such that an arbitrary structure M satisfies φ if and only if the domain of discourse of M is countable (or, in the second case, uncountable).

Box plot

boxplot, box and whisker plot, adjusted boxplots
In descriptive statistics, a box plot or boxplot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points. Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions of the underlying statistical distribution (though Tukey's boxplot assumes symmetry for the whiskers and normality for their length).

Range (statistics)

range, ranging, sample range
However, in descriptive statistics, this concept of range has a more complex meaning. The range is the size of the smallest interval which contains all the data and provides an indication of statistical dispersion. It is measured in the same units as the data. Since it only depends on two of the observations, it is most useful in representing the dispersion of small data sets. For n independent and identically distributed continuous random variables X_1, X_2, ..., X_n with cumulative distribution function G(x) and probability density function g(x), let T denote the range of a sample of size n from a population with distribution function G(x).

Standard deviation

standard deviations, sample standard deviation, SD
For various values of z, the percentage of values expected to lie in and outside the symmetric interval, CI = (−zσ, zσ), are as follows: The mean and the standard deviation of a set of data are descriptive statistics usually reported together. In a certain sense, the standard deviation is a "natural" measure of statistical dispersion if the center of the data is measured about the mean. This is because the standard deviation from the mean is smaller than from any other point.
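The percentages mentioned at the start of this entry can be computed directly if the data are assumed to be normally distributed; that assumption is not stated in the excerpt and is made here only for illustration. Under it, the probability of falling within z standard deviations of the mean is erf(z / sqrt(2)). The function name normal_coverage is hypothetical.

```python
import math


def normal_coverage(z: float) -> float:
    """Probability that a normal variable lies within z standard deviations of its mean."""
    return math.erf(z / math.sqrt(2.0))


for z in (1.0, 2.0, 3.0):
    inside = normal_coverage(z)
    outside = 1.0 - inside
    print(f"z = {z:.0f}: inside {100 * inside:.2f}%, outside {100 * outside:.2f}%")
```

For z = 1, 2 and 3 this gives roughly 68.27%, 95.45% and 99.73% inside the interval. For distributions that are not normal the percentages will differ.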

Statistical dispersion

dispersion, variability, spread
Summary statistics. Qualitative variation. Robust measures of scale. Measurement uncertainty.

Seven-number summary

Bowley's seven-figure summary, seven-figure summary
In descriptive statistics, the seven-number summary is a collection of seven summary statistics, and is an extension of the five-number summary. There are two similar, common forms. As with the five-number summary, it can be represented by a modified box plot, adding hatch-marks on the "whiskers" for two of the additional numbers. The following percentiles are evenly spaced under a normally distributed variable: The middle three values – the lower quartile, median, and upper quartile – are the usual statistics from the five-number summary and are the standard values for the box in a box plot.

Statistics

statistical, statistical analysis, statistician
A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features of a collection of information, while descriptive statistics in the mass noun sense is the process of using and analyzing those statistics. Descriptive statistics is distinguished from inferential statistics (or inductive statistics), in that descriptive statistics aims to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent. Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution.

Mean

mean value, average, population mean
Descriptive statistics. Kurtosis. Law of averages. Mean value theorem. Summary statistics. Taylor's law.

Quartile

quartiles, lower quartile, lower and upper quartiles
Summary statistics. Quantile. Quartile – from MathWorld Includes references and compares various methods to compute quartiles. Quartiles – From MathForum.org. Quartiles calculator – simple quartiles calculator. Quartiles – An example how to calculate it.

Statistic

sample statistic, empirical, measure
Descriptive statistics. Statistical hypothesis testing. Summary statistic. Well-behaved statistic. Parker, Sybil P (editor in chief). "Statistic". McGraw-Hill Dictionary of Scientific and Technical Terms. Fifth Edition. McGraw-Hill, Inc. 1994. ISBN: 0-07-042333-4. Page 1912. DeGroot and Schervish. "Definition of a Statistic". Probability and Statistics. International Edition. Third Edition. Addison Wesley. 2002. ISBN: 0-321-20473-5. Pages 370 to 371.

Central tendency

Locality, Locality (statistics), Measure of central tendency
In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. Colloquially, measures of central tendency are often called averages. The term central tendency dates from the late 1920s.

Skewness

skewed, skew, skewed distribution
Skewness is a descriptive statistic that can be used in conjunction with the histogram and the normal quantile plot to characterize the data or distribution. Skewness indicates the direction and relative magnitude of a distribution's deviation from the normal distribution. With pronounced skewness, standard statistical inference procedures such as a confidence interval for a mean will be not only incorrect, in the sense that the true coverage level will differ from the nominal (e.g., 95%) level, but they will also result in unequal error probabilities on each side.

Kurtosis

excess kurtosis, leptokurtic, platykurtic
In probability theory and statistics, kurtosis (from κυρτός, kyrtos or kurtos, meaning "curved, arching") is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosis describes the shape of a probability distribution and, like skewness, there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample from a population. Different measures of kurtosis may have different interpretations.

Pearson correlation coefficient

correlation coefficient, Pearson product-moment correlation coefficient, Pearson correlation
In statistics, the Pearson correlation coefficient (PCC), also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC) or the bivariate correlation, is a measure of the linear correlation between two variables X and Y. According to the Cauchy–Schwarz inequality it has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. It is widely used in the sciences. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s and for which the mathematical formula was derived and published by Auguste Bravais in 1844.
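A minimal sketch of the coefficient, assuming the usual definition as the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the data are hypothetical. The three examples show the extreme values +1 and −1 for exactly linear data and an intermediate value otherwise.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])      # exactly linear, increasing
r_down = pearson_r(xs, [-3 * x for x in xs])       # exactly linear, decreasing
r_mixed = pearson_r(xs, [2.0, 1.0, 4.0, 3.0, 5.0])  # positive but not perfect

print(r_up, r_down, r_mixed)
```

Python 3.10 and later also provide statistics.correlation for the same quantity; the explicit version is used here so that each term of the formula is visible.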

Correlation and dependence

correlation, correlated, correlations
These examples indicate that the correlation coefficient, as a summary statistic, cannot replace visual examination of the data. The examples are sometimes said to demonstrate that the Pearson correlation assumes that the data follow a normal distribution, but this is not correct. If a pair (X,Y) of random variables follows a bivariate normal distribution, the conditional mean E(X|Y) is a linear function of Y, and the conditional mean E(Y|X) is a linear function of X.

Five-number summary

five-number summaries
The five-number summary is a set of descriptive statistics that provides information about a dataset. It consists of the five most important sample percentiles: the sample minimum, the lower quartile, the median, the upper quartile, and the sample maximum. In addition to the median of a single set of data there are two related statistics called the upper and lower quartiles. If data are placed in order, then the lower quartile is central to the lower half of the data and the upper quartile is central to the upper half of the data. These quartiles are used to calculate the interquartile range, which helps to describe the spread of the data, and determine whether or not any data points are outliers.
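The description of the quartiles as the centers of the lower and upper halves of the ordered data can be sketched as follows. The function name five_number_summary and the data are hypothetical. One convention is assumed and should be treated as a choice rather than a rule: for an odd number of observations the median itself is left out of both halves. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = data[:half]
    # Assumed convention: skip the middle value when n is odd.
    upper_half = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])


sample = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
summary = five_number_summary(sample)
print(summary)
```

For this sample the lower half is 7, 15, 36, 39, 40 and the upper half is 41, 42, 43, 47, 49, so the quartiles are 36 and 43 and the median is 40.5.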

Logic

logician, logical, logics
Many fundamental logical formalisms are essential to section I.2 on artificial intelligence, for example modal logic and default logic in Knowledge representation formalisms and methods, Horn clauses in logic programming, and description logic. Barwise, J. (1982). Handbook of Mathematical Logic. Elsevier. ISBN: 978-0-08-093364-1. Belnap, N. (1977). "A useful four-valued logic". In Dunn & Eppstein, Modern uses of multiple-valued logic. Reidel: Boston. Bocheński, J.M. (1959). A précis of mathematical logic. Translated from the French and German editions by Otto Bird. D. Reidel, Dordrecht, South Holland. Bocheński, J.M. (1970). A history of formal logic. 2nd Edition.

Interquartile range

inter-quartile range, below, interquartile
In descriptive statistics, the interquartile range (IQR), also called the midspread or middle 50%, or technically H-spread, is a measure of statistical dispersion, being equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles, IQR = Q_3 − Q_1. In other words, the IQR is the first quartile subtracted from the third quartile; these quartiles can be clearly seen on a box plot of the data. It is a trimmed estimator, defined as the 25% trimmed range, and is a commonly used robust measure of scale. The IQR is a measure of variability, based on dividing a data set into quartiles. Quartiles divide a rank-ordered data set into four equal parts.
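As a brief sketch of the definition, the IQR can be obtained from the quartile cut points returned by statistics.quantiles in the Python standard library. The data are made up, and the quartile values depend on the interpolation method; the library default (method="exclusive") is used here as an assumption, and other methods can give a different IQR for the same data.

```python
from statistics import quantiles

# Hypothetical data set, already in rank order.
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1  # third quartile minus first quartile

print(q1, q2, q3, iqr)
```

Because the IQR only looks at the middle half of the ordered data, replacing the largest value 7 with a much larger number leaves it unchanged in this example.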

Order statistic

order statistics, k'th-smallest of n items, ordered
In statistics, the kth order statistic of a statistical sample is equal to its kth-smallest value. Together with rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and inference.
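The definition translates directly into code: sort the sample and take the kth entry, counting from 1. The function name order_statistic and the sample are hypothetical, and sorting is simply the most direct approach, not the only one.

```python
def order_statistic(sample, k):
    """Return the kth-smallest value of the sample, with k counted from 1."""
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the sample size")
    return sorted(sample)[k - 1]


sample = [6, 9, 3, 8]
smallest = order_statistic(sample, 1)           # first order statistic: the minimum
second = order_statistic(sample, 2)             # second-smallest value
largest = order_statistic(sample, len(sample))  # nth order statistic: the maximum

print(smallest, second, largest)
```

The first and last order statistics are the sample minimum and maximum, so the sample range is the difference between them.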

Test statistic

Common test statistics, t-test, of test statistics
A test statistic shares some of the same qualities of a descriptive statistic, and many statistics can be used as both test statistics and descriptive statistics. However, a test statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Some informative descriptive statistics, such as the sample range, do not make good test statistics since it is difficult to determine their sampling distribution. Two widely used test statistics are the t-statistic and the F-test. For example, suppose the task is to test whether a coin is fair (i.e. has equal probabilities of producing a head or a tail).
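To make the coin example concrete, one possible test statistic is the number of heads in a fixed number of flips. This choice, the function name two_sided_p_value and the numbers below are assumptions for illustration, not taken from the text. Under the hypothesis of a fair coin the number of heads follows a binomial distribution with success probability 1/2, so its sampling distribution is known exactly.

```python
from math import comb


def two_sided_p_value(heads: int, flips: int) -> float:
    """Probability, for a fair coin, of a head count at least as far from
    flips / 2 as the observed one."""
    if not 0 <= heads <= flips:
        raise ValueError("heads must be between 0 and flips")
    # Distance from flips / 2, doubled so that everything stays an integer.
    observed = abs(2 * heads - flips)
    as_extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(2 * k - flips) >= observed
    )
    return as_extreme / 2 ** flips


p_60 = two_sided_p_value(60, 100)  # 60 heads in 100 flips
p_50 = two_sided_p_value(50, 100)  # exactly half heads

print(p_60, p_50)
```

With 60 heads in 100 flips the probability is a little under 0.06, while 50 heads gives 1, since every outcome is at least as far from 50 as 50 itself.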

Sufficient statistic

sufficient statistics, sufficient, sufficiency
Stephen Stigler noted in 1973 that the concept of sufficiency had fallen out of favor in descriptive statistics because of the strong dependence on an assumption of the distributional form (see Pitman–Koopman–Darmois theorem below), but remained very important in theoretical work. Roughly, given a set \mathbf{X} of independent identically distributed data conditioned on an unknown parameter \theta, a sufficient statistic is a function T(\mathbf{X}) whose value contains all the information needed to compute any estimate of the parameter (e.g. a maximum likelihood estimate). Due to the factorization theorem (see below), for a sufficient statistic T(\mathbf{X}), the probability density can be written as f_{\mathbf{X}}(x) = h(x) \, g(\theta, T(x)).