Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability is written P(A \mid B), and is read "the probability of A, given B". It is defined by P(A \mid B) = \frac{P(A \cap B)}{P(B)}. If P(B) = 0, then P(A \mid B) is formally undefined by this expression. However, it is possible to define a conditional probability for some zero-probability events using a σ-algebra of such events (such as those arising from a continuous random variable).
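Here is a minimal sketch of the definition. It assumes a fair six-sided die with equally likely outcomes, which is an illustrative choice and not part of the text. The ratio P(A ∩ B) / P(B) is computed exactly with Python's fractions module. The helper names are hypothetical.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (an illustrative assumption).
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of omega) under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); the ratio is undefined when P(B) = 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) = 0: P(A | B) is undefined by the ratio formula")
    return prob(a & b) / p_b

A = {2, 4, 6}  # "the roll is even"
B = {4, 5, 6}  # "the roll is at least 4"
print(cond_prob(A, B))  # 2/3
```

The guard handles the P(B) = 0 case, where the ratio formula does not apply.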

Bayesian statistics

Bayes' theorem describes the conditional probability of an event based on data as well as prior information or beliefs about the event or conditions related to the event. For example, in Bayesian inference, Bayes' theorem can be used to estimate the parameters of a probability distribution or statistical model. Since Bayesian statistics treats probability as a degree of belief, Bayes' theorem can directly assign a probability distribution that quantifies the belief to the parameter or set of parameters. Bayesian statistics was named after Thomas Bayes, who formulated a specific case of Bayes' theorem in his paper published in 1763.
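The next sketch shows how Bayes' theorem assigns a probability distribution to a parameter. It uses a hypothetical discrete parameter: three candidate values for a coin's heads probability, a uniform prior, and observed data of 3 heads in 4 flips. The helper names are illustrative and do not come from any library.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' theorem over a finite set of parameter values:
    posterior(theta) = likelihood(theta) * prior(theta) / (sum over all theta)."""
    unnormalized = {theta: likelihood(theta) * w for theta, w in prior.items()}
    evidence = sum(unnormalized.values())  # normalizing constant P(data)
    return {theta: u / evidence for theta, u in unnormalized.items()}

# Hypothetical setup: three candidate heads probabilities, equally likely a priori.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = {t: Fraction(1, 3) for t in thetas}
heads, flips = 3, 4

def likelihood(theta):
    # Probability of one particular sequence with 3 heads and 1 tail; the binomial
    # coefficient would cancel in the normalization, so it is omitted.
    return theta**heads * (1 - theta)**(flips - heads)

post = posterior(prior, likelihood)
for t, p in post.items():
    print(t, p)
```

The data shift belief toward the larger heads probability: 3/4 receives posterior probability 27/46.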

Beta distribution

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] parametrized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution. It is a special case of the Dirichlet distribution. The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions.
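The conjugacy with Bernoulli and binomial data can be sketched as follows. The sketch assumes a Beta(α, β) prior on the success probability. The update adds the observed successes to α and the observed failures to β. The function names and the data (7 successes in 10 trials) are hypothetical.

```python
from fractions import Fraction

def beta_bernoulli_update(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with Bernoulli/binomial
    data gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    # The mean of Beta(alpha, beta) is alpha / (alpha + beta).
    return Fraction(alpha, alpha + beta)

# Hypothetical example: uniform Beta(1, 1) prior, then 7 successes in 10 trials.
a, b = beta_bernoulli_update(1, 1, successes=7, failures=3)
print(a, b, beta_mean(a, b))  # 8 4 2/3
```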

Bayesian probability

The objective and subjective variants of Bayesian probability differ mainly in their interpretation and construction of the prior probability. The term Bayesian refers to Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem in a paper titled "An Essay towards solving a Problem in the Doctrine of Chances". In that special case, the prior and posterior distributions were Beta distributions and the data came from Bernoulli trials. It was Pierre-Simon Laplace (1749–1827) who introduced a general version of the theorem and used it to approach problems in celestial mechanics, medical statistics, reliability, and jurisprudence.

Binomial distribution

Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results. This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738.
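The following is a rough numerical check of the continuity correction. The text does not fix the parameters, so the sketch assumes n = 20 and p = 0.5. It compares the exact binomial probability with the normal approximation evaluated at 8.5 and at 8.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Hypothetical parameters: n = 20 trials with success probability p = 0.5.
n, p = 20, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = binom_cdf(8, n, p)
corrected = normal_cdf(8.5, mu, sigma)  # with the 0.5 continuity correction
uncorrected = normal_cdf(8, mu, sigma)  # without it
print(f"exact={exact:.4f} corrected={corrected:.4f} uncorrected={uncorrected:.4f}")
```

With these parameters, the corrected value is much closer to the exact probability than the uncorrected one. Other choices of n and p will give different margins.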

Normal distribution

Some authors attribute the credit for the discovery of the normal distribution to de Moivre, who in 1738 published in the second edition of his "The Doctrine of Chances" the study of the coefficients in the binomial expansion of (a + b)^n. De Moivre proved that the middle term in this expansion has the approximate magnitude of 2^n \cdot \frac{2}{\sqrt{2\pi n}} (taking a = b = 1), and that "If m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a Term distant from the middle by the Interval ℓ, has to the middle Term, is -\frac{2\ell\ell}{n}."
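De Moivre's two approximations can be checked numerically. This sketch assumes a = b = 1, so the coefficients are binomial coefficients that sum to 2^n. It uses hypothetical values n = 1000 and ℓ = 20. It then compares the middle coefficient's share of 2^n with 2/sqrt(2πn), and the log-ratio with -2ℓℓ/n.

```python
import math

n, ell = 1000, 20  # hypothetical: an even exponent n and a distance ell from the middle

middle = math.comb(n, n // 2)  # middle coefficient of (a + b)^n with a = b = 1
share = middle / 2**n          # its share of the total 2^n (exact big-int division)
print(share, 2 / math.sqrt(2 * math.pi * n))

# Log of the ratio of the term ell places from the middle to the middle term.
log_ratio = math.log(math.comb(n, n // 2 + ell) / middle)
print(log_ratio, -2 * ell * ell / n)
```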

Marginal distribution

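If X_1, X_2, \ldots, X_n are discrete random variables, then the marginal probability mass function of X_i is p_{X_i}(k) = \sum p(x_1, \ldots, x_{i-1}, k, x_{i+1}, \ldots, x_n), where the sum runs over all values of the other variables. If X_1, X_2, \ldots, X_n are continuous random variables, then the marginal probability density function of X_i is f_{X_i}(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_n. See also: Compound probability distribution. Joint probability distribution. Marginal likelihood. Wasserstein metric. Conditional distribution.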
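In the discrete case, marginalization means summing the joint pmf over the other coordinates. The following minimal sketch assumes the joint pmf is stored as a dict that maps outcome tuples to probabilities. This representation and the numbers are chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def marginal_pmf(joint, i):
    """Marginal pmf of the i-th coordinate: p_{X_i}(k) is the sum of the joint pmf
    over all outcomes whose i-th coordinate equals k."""
    marginal = defaultdict(Fraction)  # Fraction() == 0
    for outcome, p in joint.items():
        marginal[outcome[i]] += p
    return dict(marginal)

# Hypothetical joint pmf of (X1, X2) on {0, 1} x {0, 1}.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}
print(marginal_pmf(joint, 0))  # {0: Fraction(2, 5), 1: Fraction(3, 5)}
print(marginal_pmf(joint, 1))
```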

Statistics

However, it is true that, before any data are sampled and given a plan for how to construct the confidence interval, the probability is 95% that the yet-to-be-calculated interval will cover the true value: at this point, the limits of the interval are yet-to-be-observed random variables. One approach that does yield an interval that can be interpreted as having a given probability of containing the true value is to use a credible interval from Bayesian statistics: this approach depends on a different way of interpreting what is meant by "probability", that is as a Bayesian probability. In principle confidence intervals can be symmetrical or asymmetrical.
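The pre-data 95% statement can be illustrated by simulation. This sketch assumes a normal model with known standard deviation and uses the z-interval: sample mean ± 1.96·σ/√n. The true mean, σ, sample size, and seed are hypothetical. Over many repetitions, about 95% of the intervals cover the true value. Any single computed interval, however, either covers it or does not.

```python
import math
import random

def coverage(true_mean=5.0, sigma=2.0, n=25, trials=10_000, seed=0):
    """Fraction of simulated 95% z-intervals (known sigma) that cover true_mean."""
    rng = random.Random(seed)
    z = 1.959963984540054  # 97.5th percentile of the standard normal
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials

result = coverage()
print(result)  # close to 0.95
```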

History of statistics

The first example of what later became known as the normal curve was studied by Abraham de Moivre, who plotted this curve on November 12, 1733. De Moivre was studying the number of heads that occurred when a 'fair' coin was tossed. In 1761 Thomas Bayes proved Bayes' theorem, and in 1765 Joseph Priestley invented the first timeline charts. Johann Heinrich Lambert, in his 1765 book Anlage zur Architectonic, proposed the semicircle as a distribution of errors, with density proportional to \sqrt{1 - x^2} for -1 < x < 1. Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities.

List of statistics articles

An Essay towards solving a Problem in the Doctrine of Chances. Estimating equations. Estimation theory. Estimation of covariance matrices. Estimation of signal parameters via rotational invariance techniques. Estimator. Etemadi's inequality. Ethical problems using children in clinical trials. Event (probability theory). Event study. Evidence lower bound. Evidence under Bayes theorem. Evolutionary data mining. Ewens's sampling formula. EWMA chart. Exact statistics. Exact test. Examples of Markov chains. Excess risk. Exchange paradox. Exchangeable random variables. Expander walk sampling. Expectation–maximization algorithm. Expectation propagation. Expected utility hypothesis. Expected value.

Probability interpretations

Likewise, when it is written that "the most probable explanation" of the name of Ludlow, Massachusetts "is that it was named after Roger Ludlow", what is meant here is not that Roger Ludlow is favored by a random factor, but rather that this is the most plausible explanation of the evidence, which admits other, less likely explanations. Thomas Bayes attempted to provide a logic that could handle varying degrees of confidence; as such, Bayesian probability is an attempt to recast the representation of probabilistic statements as an expression of the degree of confidence by which the beliefs they express are held.

Event (probability theory)

Under this definition, any subset of the sample space that is not an element of the σ-algebra is not an event, and does not have a probability. With a reasonable specification of the probability space, however, all events of interest are elements of the σ-algebra. Even though events are subsets of some sample space Ω, they are often written as predicates or indicators involving random variables.
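The next sketch shows an event written as a predicate on random variables. It uses a hypothetical finite sample space: two fair dice. With finitely many outcomes, the σ-algebra can be taken to be all subsets, so every subset built this way is an event. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice (an illustrative assumption): all 36 ordered pairs.
omega = list(product(range(1, 7), repeat=2))

def event(predicate):
    """The event {w in omega : predicate(w)}, i.e. the subset where the predicate holds."""
    return {w for w in omega if predicate(w)}

def prob(e):
    # Equally likely outcomes: P(E) = |E| / |omega|.
    return Fraction(len(e), len(omega))

# "X1 + X2 = 7", written as a predicate on the random variables X1 and X2.
sum_is_seven = event(lambda w: w[0] + w[1] == 7)
print(prob(sum_is_seven))  # 1/6
```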

Richard Jeffrey

In frequentist statistics, Bayes' theorem provides a useful rule for updating a probability when new frequency data becomes available. In Bayesian statistics, the theorem itself plays a more limited role. Bayes' theorem connects probabilities that are held simultaneously. It does not tell the learner how to update probabilities when new evidence becomes available over time. This subtlety was first pointed out by Ian Hacking in 1967. However, adapting Bayes' theorem, and adopting it as a rule of updating, is a temptation. Suppose that a learner forms probabilities P_old(A & B) = p and P_old(B) = q.

Uniform distribution (continuous)

In the graphical representation of the uniform distribution function [f(x) vs x], the area under the curve within the specified bounds gives the probability (the shaded area is a rectangle). For example, for a random variable X ~ U(0,23), P(2 < X < 18) = (18-2)*(1/(23-0)) = 16/23: the base of the rectangle is (18-2) and its height is 1/23. Conditional probability is handled in the same way. For X ~ U(0,23), P(X > 12 | X > 8) = (23-12)*(1/(23-8)) = 11/15; that is, given that X > 8 is true, the probability that X > 12 is 11/15. Conditioning changes the sample space, so a new interval length (b-a) has to be calculated, where b is 23 and a is 8.
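Both calculations reduce to interval lengths. The following small sketch reproduces 16/23 and 11/15. It uses a hypothetical helper, uniform_interval_prob, which multiplies the overlap length by the density height 1/(b - a).

```python
from fractions import Fraction

def uniform_interval_prob(lo, hi, a, b):
    """P(lo < X < hi) for X ~ U(a, b): overlap length times the height 1 / (b - a)."""
    overlap = max(0, min(hi, b) - max(lo, a))
    return Fraction(overlap, b - a)

a, b = 0, 23
p_between = uniform_interval_prob(2, 18, a, b)

# P(X > 12 | X > 8): given X > 8, X is uniform on (8, 23), so use the new interval.
p_cond = uniform_interval_prob(12, b, 8, b)
print(p_between, p_cond)  # 16/23 11/15
```

The conditional result also equals P(X > 12) / P(X > 8) = (11/23) / (15/23), which matches the ratio definition of conditional probability.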

Timeline of machine learning

This page is a timeline of machine learning. Major discoveries, achievements, milestones and other major events are included.

Bayesian inference in marketing

The fundamental ideas and concepts behind Bayes' theorem, and its use within Bayesian inference, have been developed and added to over the past centuries by Thomas Bayes, Richard Price and Pierre-Simon Laplace, as well as numerous other mathematicians, statisticians and scientists. Bayesian inference has experienced spikes in popularity, and rival frequentist statisticians have at times regarded it as vague and controversial. In the past few decades, Bayesian inference has become widespread in many scientific and social science fields such as marketing. Bayesian inference allows for decision making and market research evaluation under uncertainty and with limited data.

Catalog of articles in probability theory

Conditioning / (2:BDCR). Bayes' theorem / (2:BCG). Borel–Kolmogorov paradox / iex (2:CM). Conditional expectation / (2:BDR). Conditional independence / (3F:BR). Conditional probability. Conditional probability distribution / (2:DC). Conditional random field / (F:R). Disintegration theorem / anl (2:G). Inverse probability / Bay. Luce's choice axiom. Regular conditional probability / (2:G). Rule of succession / (F:B). Binomial distribution / (1:D). (a,b,0) class of distributions / (1:D). Anscombe transform. Bernoulli distribution / (1:B). Beta distribution / (1:C). Bose–Einstein statistics / (F:D). Cantor distribution / (1:C). Cauchy distribution / (1:C).

Probability space

The first major treatise blending calculus with probability theory, originally in French: Théorie Analytique des Probabilités. Andrei Nikolajevich Kolmogorov (1950) Foundations of the Theory of Probability. The modern measure-theoretic foundation of probability theory; the original German version (Grundbegriffe der Wahrscheinlichkeitsrechnung) appeared in 1933. Harold Jeffreys (1939) The Theory of Probability. An empiricist, Bayesian approach to the foundations of probability theory. Edward Nelson (1987) Radically Elementary Probability Theory.

Posterior probability

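Every Bayes' theorem problem can be solved in this way. The posterior probability distribution of one random variable given the value of another can be calculated with Bayes' theorem by multiplying the prior probability distribution by the likelihood function, and then dividing by the normalizing constant, as follows: f_{X \mid Y=y}(x) = \frac{f_X(x)\,\mathcal{L}_{X \mid Y=y}(x)}{\int_{-\infty}^{\infty} f_X(u)\,\mathcal{L}_{X \mid Y=y}(u)\,du} gives the posterior probability density function for a random variable X given the data Y = y, where f_X(x) is the prior density of X, \mathcal{L}_{X \mid Y=y}(x) = f_{Y \mid X=x}(y) is the likelihood function as a function of x, and the integral in the denominator is the normalizing constant. Posterior probability is a conditional probability conditioned on randomly observed data. Hence it is a random variable. For a random variable, it is important to summarize its amount of uncertainty.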
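The formula can be approximated numerically by replacing the integral with a sum over a grid. This sketch assumes a uniform prior on a coin's heads probability and hypothetical data of 7 heads in 10 flips. A uniform prior is Beta(1, 1), so the exact posterior is Beta(8, 4) with mean 8/12. That exact value provides a check on the grid result. The helper name grid_posterior is illustrative.

```python
import math

def grid_posterior(prior, likelihood, grid):
    """Approximate posterior density on an evenly spaced grid:
    prior(x) * likelihood(x), divided by a Riemann-sum normalizing constant."""
    step = grid[1] - grid[0]
    unnormalized = [prior(x) * likelihood(x) for x in grid]
    evidence = sum(unnormalized) * step  # approximates the integral in the denominator
    return [u / evidence for u in unnormalized]

# Hypothetical example: X = a coin's heads probability with a uniform prior,
# data Y = 7 heads in 10 flips (binomial likelihood viewed as a function of x).
k, n = 7, 10
grid = [(i + 0.5) / 1000 for i in range(1000)]  # midpoints of (0, 1)
post = grid_posterior(lambda x: 1.0,
                      lambda x: math.comb(n, k) * x**k * (1 - x)**(n - k),
                      grid)
step = grid[1] - grid[0]
post_mean = sum(x * f for x, f in zip(grid, post)) * step
print(post_mean)  # close to (k + 1) / (n + 2), the Beta(8, 4) mean
```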

Statistical inference

(Methods of prior construction which do not require external input have been proposed but not yet fully developed.) Formally, Bayesian inference is calibrated with reference to an explicitly stated utility, or loss function; the 'Bayes rule' is the one which maximizes expected utility, averaged over the posterior uncertainty. Formal Bayesian inference therefore automatically provides optimal decisions in a decision theoretic sense.
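The following minimal sketch shows a Bayes rule for a finite decision problem. It computes each action's expected utility under the posterior and picks the largest. The posterior, actions, and utility table are hypothetical numbers chosen only for illustration.

```python
def bayes_action(posterior, utility, actions):
    """Return the action with the highest posterior expected utility,
    together with the expected utility of every action."""
    expected = {
        a: sum(p * utility(a, theta) for theta, p in posterior.items())
        for a in actions
    }
    return max(expected, key=expected.get), expected

# Hypothetical decision problem: posterior over whether a part is defective,
# and a utility table for shipping it or scrapping it.
posterior = {"defective": 0.3, "ok": 0.7}
table = {
    ("ship", "defective"): -100, ("ship", "ok"): 10,
    ("scrap", "defective"): -5, ("scrap", "ok"): -5,
}
best, expected = bayes_action(posterior, lambda a, t: table[(a, t)], ["ship", "scrap"])
print(best, expected)
```

Here, shipping has expected utility 0.3 * (-100) + 0.7 * 10 = -23, so scrapping (-5) is the Bayes action.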

Sample space

Probability space. Space (mathematics). Set (mathematics). Event (probability theory). σ-algebra.

Central limit theorem

The earliest version of this theorem, that the normal distribution may be used as an approximation to the binomial distribution, is now known as the de Moivre–Laplace theorem. In more general usage, a central limit theorem is any of a set of weak-convergence theorems in probability theory. They all express the fact that a sum of many independent and identically distributed (i.i.d.) random variables, or alternatively, random variables with specific types of dependence, will tend to be distributed according to one of a small set of attractor distributions. When the variance of the i.i.d. variables is finite, the attractor distribution is the normal distribution.
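A quick simulation illustrates the finite-variance case. This sketch assumes i.i.d. Uniform(0, 1) summands, which have finite variance 1/12. It standardizes sums of 30 of them and checks how often the result falls within ±1.96; for a standard normal, this probability is about 95%. The sample sizes and seed are arbitrary choices.

```python
import math
import random

def standardized_sum(rng, n):
    """Sum of n i.i.d. Uniform(0, 1) draws, centered and scaled to unit variance."""
    s = sum(rng.random() for _ in range(n))
    mean, var = n * 0.5, n / 12.0  # Uniform(0, 1) has mean 1/2 and variance 1/12
    return (s - mean) / math.sqrt(var)

rng = random.Random(42)
draws = [standardized_sum(rng, 30) for _ in range(20_000)]
within = sum(abs(z) <= 1.96 for z in draws) / len(draws)
print(within)  # close to 0.95, the standard normal probability of |Z| <= 1.96
```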

Likelihood function

In Bayesian inference, one can speak about the likelihood of any proposition or random variable given another random variable, for example the likelihood of a parameter value or of a statistical model (see marginal likelihood) given specified data or other evidence. The likelihood function nevertheless remains the same entity, with the additional interpretations of (i) a conditional density of the data given the parameter (since the parameter is then a random variable) and (ii) a measure or amount of information brought by the data about the parameter value or even the model.
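Interpretation (i) can be made concrete: the likelihood is the probability of the observed data, read as a function of the parameter. This sketch assumes binomial data with hypothetical counts of 7 successes in 10 trials. It evaluates the log-likelihood on a grid of p values.

```python
import math

def binomial_log_likelihood(p, k, n):
    """log L(p) = log P(k successes in n trials | p), viewed as a function of p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

# Hypothetical data: 7 successes in 10 trials; evaluate log L on a grid of p.
k, n = 7, 10
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=lambda p: binomial_log_likelihood(p, k, n))
print(best_p)  # 0.7, matching the maximum-likelihood estimate k / n
```

The maximizing grid point is k/n = 0.7. In Bayesian inference, this same function is the factor that multiplies the prior.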

Poisson distribution

Poisson's 1837 work Recherches sur la probabilité des jugements en matière criminelle et en matière civile theorized about the number of wrongful convictions in a given country by focusing on certain random variables N that count, among other things, the number of discrete occurrences (sometimes called "events" or "arrivals") that take place during a time interval of a given length. The result had already been given in 1711 by Abraham de Moivre in De Mensura Sortis seu; de Probabilitate Eventuum in Ludis a Casu Fortuito Pendentibus. This makes it an example of Stigler's law, and it has prompted some authors to argue that the Poisson distribution should bear the name of de Moivre. In 1860, Simon Newcomb fitted the Poisson distribution to the number of stars found in a unit of space.