# Statistics

**statistical · statistical analysis · statistician · statistical methods · applied statistics · statistically · statistical data · statistical method · statistical analyses · quantitative analysis**

Statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation.


### Survey methodology

**survey · surveys · statistical survey**

Statistics deals with all aspects of data including the planning of data collection in terms of the design of surveys and experiments.

A field of applied statistics of human research surveys, survey methodology studies the sampling of individual units from a population and associated techniques of survey data collection, such as questionnaire construction and methods for improving the number and accuracy of responses to surveys.

### Statistical population

**population · subpopulation · subpopulations**

In applying statistics to, for example, a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model process to be studied.

In statistics, a population is a set of similar items or events which is of interest for some question or experiment.

### Sample (statistics)

**sample · samples · statistical sample**

When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples.

In statistics and quantitative research methodology, a data sample is a set of data collected and/or selected from a statistical population by a defined procedure.
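As a minimal sketch of "a defined procedure", the snippet below draws a simple random sample without replacement. The population of 1,000 numbered units, the sample size of 50, and the seed are assumptions made purely for illustration.

```python
import random

# Hypothetical population: ID numbers of 1,000 units (illustrative only).
population = list(range(1, 1001))

# Fixed seed so the draw can be reproduced.
rng = random.Random(42)

# Defined procedure: simple random sampling without replacement,
# so every unit has the same chance of selection and none repeats.
sample = rng.sample(population, k=50)

print(len(sample), sorted(sample)[:5])
```

Other procedures (systematic, stratified, cluster sampling) would replace only the selection step.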

### Standard deviation

**standard deviations · sample standard deviation · sigma**

Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).

In statistics, the standard deviation (SD, also represented by the lower case Greek letter sigma σ or the Latin letter s) is a measure that is used to quantify the amount of variation or dispersion of a set of data values.
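The two symbols mentioned above usually correspond to two slightly different formulas: σ divides by N (the whole population), while s divides by N − 1 (a sample). The sketch below spells out both on a small set of made-up values; the function names are illustrative, not taken from the text.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values


def population_sd(values):
    """sigma: square root of the mean squared deviation, dividing by N."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def sample_sd(values):
    """s: same idea, but dividing by N - 1 (Bessel's correction)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


sigma = population_sd(data)
s = sample_sd(data)
print(sigma, s)

# The standard library offers the same two quantities directly.
print(statistics.pstdev(data), statistics.stdev(data))
```

The sample version is always slightly larger than the population version for the same values, because it divides by a smaller number.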

### Mean

**mean value · population mean · average**

Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).

There are several kinds of mean in various branches of mathematics (especially statistics).
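To illustrate "several kinds of mean", the sketch below computes three common ones (arithmetic, geometric, harmonic) for the same small set of positive values, using Python's standard `statistics` module. The choice of these three and the data are assumptions for illustration; the source does not enumerate the kinds it has in mind.

```python
import statistics

values = [1, 2, 4]  # illustrative positive values

arithmetic = statistics.mean(values)           # sum divided by count
geometric = statistics.geometric_mean(values)  # n-th root of the product
harmonic = statistics.harmonic_mean(values)    # count divided by sum of reciprocals

print(arithmetic, geometric, harmonic)
```

For positive values that are not all equal, the arithmetic mean is the largest of the three and the harmonic mean the smallest.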

### Central tendency

**Locality · central location · central point**

Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other.

In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution.
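Different measures of central tendency can disagree about what counts as "typical". The sketch below compares mean, median, and mode on a made-up data set containing one extreme value; the data are an assumption chosen to make the difference visible.

```python
import statistics

data = [1, 2, 2, 3, 100]  # illustrative values with one outlier

mean_value = statistics.mean(data)      # pulled upward by the outlier
median_value = statistics.median(data)  # middle value after sorting
mode_value = statistics.mode(data)      # most frequent value

print(mean_value, median_value, mode_value)
```

Here the median and mode stay near the bulk of the data, while the mean is dominated by the single large value.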

### Statistical dispersion

**dispersion · variability · spread**

Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other.

In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed.
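Two data sets can share the same center and still differ in spread. The following sketch, using made-up values, compares the range and the population variance of a "narrow" and a "wide" data set that have the same mean; the helper `value_range` is a hypothetical name introduced here.

```python
import statistics

narrow = [4, 5, 6]   # illustrative values, tightly clustered
wide = [0, 5, 10]    # same mean, more spread out


def value_range(values):
    """Distance between the largest and smallest value."""
    return max(values) - min(values)


for name, values in (("narrow", narrow), ("wide", wide)):
    print(
        name,
        statistics.mean(values),
        value_range(values),
        statistics.pvariance(values),
    )
```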

### Observational study

**observational studies · observational · observational data**

In contrast, an observational study does not involve experimental manipulation. While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference in differences estimation and instrumental variables, among many others) that produces consistent estimators.

In fields such as epidemiology, social sciences, psychology and statistics, an observational study draws inferences from a sample to a population where the independent variable is not under the control of the researcher because of ethical concerns or logistical constraints.

### Bias (statistics)

**bias · biased · statistical bias**

Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important.

Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated.
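A classic illustration of this definition is the variance estimator that divides by n: its expected value is (n − 1)/n times the true variance, so it is biased, whereas dividing by n − 1 removes the bias. The simulation below is a sketch under assumed settings (standard normal data, samples of size 5, 20,000 repetitions, a fixed seed); it approximates the expected value of each estimator by averaging over many samples.

```python
import random


def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) variance estimates."""
    n = len(sample)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    return ss / n, ss / (n - 1)


rng = random.Random(0)   # fixed seed for reproducibility
n, trials = 5, 20000     # assumed sample size and number of repetitions
biased_total = 0.0
unbiased_total = 0.0

for _ in range(trials):
    # True variance of the data-generating process is 1.0.
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b, u = variance_estimates(sample)
    biased_total += b
    unbiased_total += u

mean_biased = biased_total / trials
mean_unbiased = unbiased_total / trials
print(mean_biased, mean_unbiased)
```

With these settings the first average should land near 0.8 and the second near 1.0, matching the (n − 1)/n factor.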

### Missing data

**missing values · incomplete data · missing at random**

The presence of missing data or censoring may result in biased estimates and specific techniques have been developed to address these problems.

In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation.
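The sketch below shows two simple ways of handling missing values, represented here as `None`: dropping incomplete records (complete-case analysis) and filling a gap with the mean of the observed values. The records, field names, and helper functions are all hypothetical, and neither approach is presented as the recommended technique; mean imputation in particular tends to understate variability.

```python
# Hypothetical survey records; None marks a value that was not stored.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},
    {"id": 3, "age": 29, "income": None},
    {"id": 4, "age": 41, "income": 61000},
]


def complete_cases(rows, fields):
    """Keep only the rows where every listed field is present."""
    return [r for r in rows if all(r[f] is not None for f in fields)]


def mean_impute(rows, field):
    """Return copies of the rows with missing `field` set to the observed mean."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = sum(observed) / len(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


complete = complete_cases(records, ["age", "income"])
imputed = mean_impute(records, "age")

print([r["id"] for r in complete])
print([r["age"] for r in imputed])
```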

### Censoring (statistics)

**censoring · censored · censored data**

The presence of missing data or censoring may result in biased estimates and specific techniques have been developed to address these problems.

In statistics, engineering, economics, and medical research, censoring is a condition in which the value of a measurement or observation is only partially known.
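A small sketch of right censoring: a study that ends at a fixed time records only "at least this long" for units that have not yet failed. The lifetimes and the cutoff below are made up; in practice the true lifetimes of censored units would be unknown, and they are included here only to show how a naive average of the recorded values is biased downward.

```python
# Hypothetical true lifetimes (normally unobservable for censored units).
true_lifetimes = [2, 5, 8, 12, 20]
study_end = 10  # assumed cutoff: observation stops at this time

# Each observation is (recorded_time, fully_observed).
# A censored unit is recorded at the cutoff with fully_observed = False.
observations = [(min(t, study_end), t <= study_end) for t in true_lifetimes]

recorded = [time for time, _ in observations]
n_censored = sum(1 for _, observed in observations if not observed)

naive_mean = sum(recorded) / len(recorded)            # treats cutoffs as real lifetimes
true_mean = sum(true_lifetimes) / len(true_lifetimes)

print(observations)
print(n_censored, naive_mean, true_mean)
```

Keeping the censoring flag alongside each value is what allows survival-analysis methods to use the partial information correctly.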

### Estimation theory

**parameter estimation · estimation · estimated**

These inferences may take the form of: answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation) and modeling relationships within the data (for example, using regression analysis).

Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component.
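Two of the inference tasks listed above, describing an association (correlation) and modeling a relationship (regression), can be sketched in a few lines. The example estimates the Pearson correlation coefficient and the least-squares line for a handful of made-up noisy measurements; the function names and data are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # illustrative noisy measurements

r = pearson_r(xs, ys)
slope, intercept = least_squares_line(xs, ys)
print(r, slope, intercept)
```

The slope and intercept are estimates of parameters from data with a random component, which is the setting estimation theory addresses.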

### Mathematics

**mathematical · math · mathematician**

Statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation.

Perhaps the foremost mathematician of the 19th century was the German mathematician Carl Friedrich Gauss, who made numerous contributions to fields such as algebra, analysis, differential geometry, matrix theory, number theory, and statistics.

### Glossary of probability and statistics

See glossary of probability and statistics.

The following is a glossary of terms used in the mathematical sciences of statistics and probability.

### Data mining

**data-mining · datamining · data mine**

Inference can extend to forecasting, prediction and estimation of unobserved values either in or associated with the population being studied; it can include extrapolation and interpolation of time series or spatial data, and can also include data mining.

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

### Time series

**time series analysis · time-series · time-series analysis**

Inference can extend to forecasting, prediction and estimation of unobserved values either in or associated with the population being studied; it can include extrapolation and interpolation of time series or spatial data, and can also include data mining.

Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and largely in any domain of applied science and engineering which involves temporal measurements.

### Survey sampling

**surveys · Sample Survey · Methodology for Collecting, Estimating, and Organizing Microeconomic Data**

When full census data cannot be collected, statisticians collect sample data by developing specific experiment designs and survey samples.

In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey.
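One common way to select a sample of elements from a target population is stratified sampling with proportional allocation, sketched below. The two strata, their sizes, the total sample size, and the function name are assumptions for illustration; the source does not single out this design.

```python
import random

# Hypothetical target population split into two strata of unit IDs.
strata = {
    "urban": list(range(0, 600)),
    "rural": list(range(600, 1000)),
}


def stratified_sample(strata, total, rng):
    """Draw from each stratum in proportion to its share of the population.

    Rounding means the pieces may not always sum exactly to `total`.
    """
    population_size = sum(len(units) for units in strata.values())
    return {
        name: rng.sample(units, round(total * len(units) / population_size))
        for name, units in strata.items()
    }


rng = random.Random(11)  # fixed seed for reproducibility
selected = stratified_sample(strata, total=50, rng=rng)
print({name: len(units) for name, units in selected.items()})
```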

### Sampling distribution

**distribution · sampling**

Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures.

In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given random-sample-based statistic.
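A sampling distribution can be approximated by simulation: draw many random samples, compute the statistic on each, and look at how the results are distributed. The sketch below does this for the sample mean of 25 draws from a uniform(0, 1) population, and compares the observed spread with the theoretical standard error σ/√n, where σ² = 1/12 for that distribution. The population, sample size, number of repetitions, and seed are all assumptions.

```python
import math
import random
import statistics

rng = random.Random(1)   # fixed seed for reproducibility
n, trials = 25, 5000     # assumed sample size and number of repeated samples


def one_sample_mean():
    """Mean of one random sample of size n from uniform(0, 1)."""
    return sum(rng.random() for _ in range(n)) / n


# Empirical approximation of the sampling distribution of the mean.
sample_means = [one_sample_mean() for _ in range(trials)]

observed_se = statistics.stdev(sample_means)
theoretical_se = math.sqrt(1 / 12) / math.sqrt(n)

print(statistics.mean(sample_means), observed_se, theoretical_se)
```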

### Prediction

**predict · predictions · predictive**

Inference can extend to forecasting, prediction and estimation of unobserved values either in or associated with the population being studied; it can include extrapolation and interpolation of time series or spatial data, and can also include data mining.

In statistics, prediction is a part of statistical inference.

### Probability theory

**theory of probability · probability · probability theorist**

Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.

As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of data.

### Statistical theory

**statistical · statistical theories · mathematical statistics**

Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures.

The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics.

### Instrumental variables estimation

**instrumental variable · instrumental variables · 2SLS**

While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference in differences estimation and instrumental variables, among many others) that produces consistent estimators.

In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment.
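The idea can be sketched with simulated data in which an unobserved confounder affects both the treatment x and the outcome y. With a single instrument z, the simplest IV estimate is the ratio cov(z, y) / cov(z, x). Everything in the snippet (the data-generating process, the true effect of 2.0, the seed, the helper `cov`) is an assumption made for illustration, not a description of any particular study.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


rng = random.Random(3)   # fixed seed for reproducibility
n = 20000
true_effect = 2.0        # assumed causal effect of x on y

z, x, y = [], [], []
for _ in range(n):
    zi = rng.gauss(0, 1)               # instrument: affects x, not y directly
    ui = rng.gauss(0, 1)               # unobserved confounder
    xi = zi + ui + rng.gauss(0, 1)     # treatment depends on both
    yi = true_effect * xi + 3.0 * ui + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

ols_slope = cov(x, y) / cov(x, x)      # naive regression slope, confounded
iv_estimate = cov(z, y) / cov(z, x)    # instrumental-variables estimate

print(ols_slope, iv_estimate)
```

In this setup the naive slope is pulled away from the true effect by the confounder, while the IV ratio should land close to it, provided the instrument really is unrelated to the confounder.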

### Difference in differences

**difference-in-difference · difference-in-differences · Assumptions**

While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference in differences estimation and instrumental variables, among many others) that produces consistent estimators.

Difference in differences (DID or DD) is a statistical technique used in econometrics and quantitative research in the social sciences that attempts to mimic an experimental research design using observational study data, by studying the differential effect of a treatment on a 'treatment group' versus a 'control group' in a natural experiment.
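In its simplest two-group, two-period form, the estimate is the change in the treatment group minus the change in the control group. The sketch below computes it from made-up outcome values; the numbers and function names are assumptions, and the interpretation as a treatment effect relies on the usual assumption that both groups would have followed parallel trends without treatment.

```python
# Hypothetical outcomes for each (group, period) cell.
outcomes = {
    ("treatment", "before"): [10.0, 12.0, 11.0],
    ("treatment", "after"): [15.0, 17.0, 16.0],
    ("control", "before"): [9.0, 10.0, 11.0],
    ("control", "after"): [11.0, 12.0, 13.0],
}


def mean(values):
    return sum(values) / len(values)


def difference_in_differences(cells):
    """(treated after - treated before) - (control after - control before)."""
    treated_change = mean(cells[("treatment", "after")]) - mean(
        cells[("treatment", "before")]
    )
    control_change = mean(cells[("control", "after")]) - mean(
        cells[("control", "before")]
    )
    return treated_change - control_change


did_estimate = difference_in_differences(outcomes)
print(did_estimate)
```

Subtracting the control group's change removes whatever shift affected both groups over time, leaving the differential effect attributed to the treatment.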

### Decision theory

**decision science · statistical decision theory · decision sciences**

Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures.

Empirical applications of this rich theory are usually done with the help of statistical and econometric methods, especially via the so-called choice models, such as probit and logit models.

### Consistent estimator

**consistent · consistency · inconsistent**

While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference in differences estimation and instrumental variables, among many others) that produces consistent estimators.

In statistics, a consistent estimator or asymptotically consistent estimator is an estimator (a rule for computing estimates of a parameter θ₀) having the property that, as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to θ₀. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to θ₀ converges to one.
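The sample mean is a standard example of a consistent estimator of a population mean. The simulation below is a sketch under assumed settings (normal data with true mean 5.0 and standard deviation 2.0, a fixed seed); it records how far the estimate is from the true value at a few sample sizes. A single run only illustrates the tendency, since convergence in probability is a statement about the distribution of the estimates, not about every individual sequence.

```python
import random

rng = random.Random(7)   # fixed seed for reproducibility
theta_0 = 5.0            # assumed true parameter (population mean)


def sample_mean(n):
    """Estimate theta_0 from n independent normal draws."""
    return sum(rng.gauss(theta_0, 2.0) for _ in range(n)) / n


# Absolute estimation error at increasing sample sizes.
errors = {n: abs(sample_mean(n) - theta_0) for n in (10, 1000, 100000)}

for n, err in errors.items():
    print(n, err)
```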