Statistics

statistical, statistical analysis, statistician, statistical methods, applied statistics, statistically, statistical data, statistical method, statistical analyses, quantitative analysis
Statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation. (Wikipedia)
3,907 Related Articles

Survey methodology

survey, surveys, statistical survey
Statistics deals with all aspects of data including the planning of data collection in terms of the design of surveys and experiments.
A field of applied statistics of human research surveys, survey methodology studies the sampling of individual units from a population and associated techniques of survey data collection, such as questionnaire construction and methods for improving the number and accuracy of responses to surveys.

Statistical population

population, subpopulation, subpopulations
In applying statistics to, for example, a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.
In statistics, a population is a set of similar items or events which is of interest for some question or experiment.

Sample (statistics)

sample, samples, statistical sample
When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples.
In statistics and quantitative research methodology, a data sample is a set of data collected and/or selected from a statistical population by a defined procedure.
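One common "defined procedure" is simple random sampling without replacement. The sketch below is only an illustration: the population of ID numbers, the sample size and the seed are all assumptions made for the example, not values from the text.

```python
import random

# Hypothetical population: unit IDs 1..1000 (illustrative only).
population = list(range(1, 1001))

# Fixed seed so the draw can be reproduced.
rng = random.Random(42)

# Simple random sample without replacement:
# every unit has the same chance of being selected, and no unit repeats.
sample = rng.sample(population, k=50)

print(len(sample), "units drawn,", len(set(sample)), "distinct")
```

Other procedures (stratified, cluster, systematic sampling) follow different selection rules and are not shown here.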

Standard deviation

standard deviations, sample standard deviation, sigma
Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).
In statistics, the standard deviation (SD, also represented by the lower case Greek letter sigma σ or the Latin letter s) is a measure that is used to quantify the amount of variation or dispersion of a set of data values.
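As a rough illustration of the two symbols mentioned above, the following sketch computes both the population form (σ, dividing by n) and the sample form (s, dividing by n − 1) for a small made-up data set. The data values are assumptions chosen so the arithmetic is easy to follow.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population standard deviation (sigma): divide by n.
sigma = math.sqrt(sum(squared_deviations) / len(data))

# Sample standard deviation (s): divide by n - 1 (Bessel's correction).
s = math.sqrt(sum(squared_deviations) / (len(data) - 1))

print("sigma =", sigma)  # 2.0 for this data set
print("s     =", s)

# The standard library provides the same two quantities directly.
print(statistics.pstdev(data), statistics.stdev(data))
```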

Mean

mean value, population mean, average
Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).
There are several kinds of mean in various branches of mathematics (especially statistics).
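Three of the better-known kinds are the arithmetic, geometric and harmonic means. The sketch below compares them on a two-value example; the values are assumptions chosen for easy arithmetic, and `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

values = [2.0, 8.0]  # illustrative positive values

arithmetic = statistics.mean(values)            # (2 + 8) / 2
geometric = statistics.geometric_mean(values)   # sqrt(2 * 8), up to rounding
harmonic = statistics.harmonic_mean(values)     # 2 / (1/2 + 1/8)

print(arithmetic, geometric, harmonic)
```

For positive values the three are ordered: arithmetic ≥ geometric ≥ harmonic, with equality only when all values are the same.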

Central tendency

Locality, central location, central point
Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other.
In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution.
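Common measures of central tendency include the mean, median and mode. The following sketch, using made-up data with one extreme value, shows how the three can disagree about what counts as "typical".

```python
import statistics

data = [1, 2, 2, 3, 100]  # illustrative values; 100 is a deliberate outlier

mean = statistics.mean(data)      # pulled upward by the outlier
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print("mean:", mean, "median:", median, "mode:", mode)
```

Here the median and mode stay at 2 while the mean is dragged far above most of the observations, which is one reason the choice of measure matters.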

Statistical dispersion

dispersion, variability, spread
Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other.
In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed.
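To make "stretched or squeezed" concrete, the sketch below compares two made-up data sets that share the same center but differ in spread. The helper name `spread_summary` and both data sets are assumptions for illustration.

```python
import statistics

def spread_summary(values):
    """Return a few common dispersion measures for a list of numbers."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),  # population variance
        "stdev": statistics.pstdev(values),        # population standard deviation
    }

narrow = [9, 10, 10, 10, 11]  # squeezed around 10
wide = [0, 5, 10, 15, 20]     # stretched around 10

narrow_spread = spread_summary(narrow)
wide_spread = spread_summary(wide)

print(statistics.mean(narrow), narrow_spread)
print(statistics.mean(wide), wide_spread)
```

Both lists have mean 10, so a measure of central tendency alone cannot tell them apart; the dispersion measures can.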

Observational study

observational studies, observational, observational data
In contrast, an observational study does not involve experimental manipulation. While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference-in-differences estimation and instrumental variables, among many others) that produces consistent estimators.
In fields such as epidemiology, social sciences, psychology and statistics, an observational study draws inferences from a sample to a population where the independent variable is not under the control of the researcher because of ethical concerns or logistical constraints.

Bias (statistics)

bias, biased, statistical bias
Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important.
Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated.
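A standard textbook illustration of this definition is the variance estimator that divides by n rather than n − 1. The sketch below computes its expected value exactly by enumerating every equally likely sample; the three-value population and the sample size of 2 are assumptions chosen to keep the enumeration tiny.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # illustrative population
n = 2                   # sample size, drawn with replacement

def variance(values, ddof):
    """Sum of squared deviations divided by (len(values) - ddof)."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values) / (len(values) - ddof)

# The parameter being estimated: the population variance.
true_var = variance(population, ddof=0)

# All 9 ordered samples of size 2 are equally likely.
samples = list(product(population, repeat=n))

# Expected value of each estimator = average over all possible samples.
expected_biased = sum(variance(s, 0) for s in samples) / len(samples)
expected_unbiased = sum(variance(s, 1) for s in samples) / len(samples)

print("true variance:          ", true_var)
print("E[divide-by-n]:         ", expected_biased)
print("E[divide-by-(n-1)]:     ", expected_unbiased)
print("bias of divide-by-n:    ", expected_biased - true_var)
```

The divide-by-n estimator has an expected value below the true variance, so it is biased; the divide-by-(n − 1) estimator matches the true variance in expectation.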

Missing data

missing values, incomplete data, missing at random
The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems.
In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation.
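The sketch below shows two simple ways of handling missing values, complete-case analysis and mean imputation. These are offered only as familiar examples, not necessarily the techniques the text has in mind, and whether either is appropriate depends on why the values are missing. The data and the use of `None` as the missing marker are assumptions.

```python
import statistics

# Illustrative observations; None marks a value that was not recorded.
observations = [4.0, None, 6.0, 8.0, None, 10.0]

# Complete-case analysis: drop observations with a missing value.
observed = [x for x in observations if x is not None]
mean_observed = statistics.mean(observed)

# Mean imputation: replace each missing value with the observed mean.
imputed = [x if x is not None else mean_observed for x in observations]

print("observed mean:    ", mean_observed)
print("imputed mean:     ", statistics.mean(imputed))
print("observed variance:", statistics.variance(observed))
print("imputed variance: ", statistics.variance(imputed))
```

Mean imputation leaves the mean unchanged but shrinks the sample variance, a small example of how a naive fix can distort other estimates.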

Censoring (statistics)

censoring, censored, censored data
The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems.
In statistics, engineering, economics, and medical research, censoring is a condition in which the value of a measurement or observation is only partially known.
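A minimal sketch of right-censoring, assuming a study that stops observing at a fixed cutoff: any duration longer than the cutoff is recorded only as "at least the cutoff". The true durations, the cutoff and the record layout are all hypothetical, and in a real study the true values beyond the cutoff would not be available.

```python
# Hypothetical true durations (unknowable in practice once censored).
true_times = [3, 7, 12, 15, 5, 20]
cutoff = 10  # assumed end of follow-up

# Each record: (recorded value, flag saying the value is right-censored).
records = [(min(t, cutoff), t > cutoff) for t in true_times]

naive_mean = sum(value for value, _ in records) / len(records)
true_mean = sum(true_times) / len(true_times)
n_censored = sum(1 for _, censored in records if censored)

print("censored records:", n_censored)
print("naive mean of recorded values:", naive_mean)
print("mean of true values:          ", true_mean)
```

Treating the recorded values as if they were exact understates the mean here, which illustrates why censored data call for dedicated methods.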

Estimation theory

parameter estimation, estimation, estimated
These inferences may take the form of: answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation) and modeling relationships within the data (for example, using regression analysis).
Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component.

Mathematics

mathematical, math, mathematician
Statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation.
Perhaps the foremost mathematician of the 19th century was the German mathematician Carl Friedrich Gauss, who made numerous contributions to fields such as algebra, analysis, differential geometry, matrix theory, number theory, and statistics.

Glossary of probability and statistics

See glossary of probability and statistics.
The following is a glossary of terms used in the mathematical sciences of statistics and probability.

Data mining

data-mining, datamining, data mine
Inference can extend to forecasting, prediction and estimation of unobserved values either in or associated with the population being studied; it can include extrapolation and interpolation of time series or spatial data, and can also include data mining.
Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

Time series

time series analysis, time-series, time-series analysis
Inference can extend to forecasting, prediction and estimation of unobserved values either in or associated with the population being studied; it can include extrapolation and interpolation of time series or spatial data, and can also include data mining.
Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and largely in any domain of applied science and engineering which involves temporal measurements.

Survey sampling

surveys; Sample Survey; Methodology for Collecting, Estimating, and Organizing Microeconomic Data
When full census data cannot be collected, statisticians collect sample data by developing specific experiment designs and survey samples.
In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey.

Sampling distribution

distribution, sampling
Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures.
In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given random-sample-based statistic.
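For a very small case the sampling distribution can be written out exactly. The sketch below enumerates every sample of size 2 from a fair six-sided die and tabulates the distribution of the sample mean; the die and the sample size are assumptions chosen so that full enumeration is feasible.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]  # a fair die, used as an illustrative population
n = 2                       # sample size

# Count how often each value of the sample mean occurs
# across all equally likely ordered samples.
counts = Counter(Fraction(sum(s), n) for s in product(faces, repeat=n))
total = len(faces) ** n

# Sampling distribution of the mean: value -> probability.
sampling_distribution = {
    mean: Fraction(count, total) for mean, count in sorted(counts.items())
}

for mean, prob in sampling_distribution.items():
    print(float(mean), prob)

# Expected value of the sample mean under this distribution.
expected_mean = sum(m * p for m, p in sampling_distribution.items())
print("expected sample mean:", expected_mean)
```

For larger samples or continuous populations, enumeration is impractical and the distribution is usually derived analytically or approximated by simulation.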

Prediction

predict, predictions, predictive
Inference can extend to forecasting, prediction and estimation of unobserved values either in or associated with the population being studied; it can include extrapolation and interpolation of time series or spatial data, and can also include data mining.
In statistics, prediction is a part of statistical inference.

Probability theory

theory of probability, probability, probability theorist
Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.
As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of data.

Statistical theory

statistical, statistical theories, mathematical statistics
Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures.
The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics.

Instrumental variables estimation

instrumental variable, instrumental variables, 2SLS
While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference-in-differences estimation and instrumental variables, among many others) that produces consistent estimators.
In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment.
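The following is a simulated sketch of the idea with a single instrument, using the ratio form cov(z, y) / cov(z, x). The data-generating process, the variable names and the true effect of 2.0 are all assumptions invented for the example: an unobserved confounder `u` affects both `x` and `y`, while the instrument `z` affects `y` only through `x`.

```python
import random

def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(0)  # fixed seed for reproducibility
n = 20_000
true_effect = 2.0       # assumed causal effect of x on y

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Ordinary least squares slope: distorted by the confounder.
ols_slope = covariance(x, y) / covariance(x, x)

# Instrumental-variables slope: uses only the variation in x driven by z.
iv_slope = covariance(z, y) / covariance(z, x)

print("OLS slope:", round(ols_slope, 3))
print("IV slope: ", round(iv_slope, 3))
```

In this simulation the IV estimate lands near the assumed effect while the OLS slope is pushed upward. In real applications the key assumptions (a relevant instrument that is unrelated to the confounder) cannot be checked this directly.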

Difference in differences

difference-in-difference, difference-in-differences, Assumptions
While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference-in-differences estimation and instrumental variables, among many others) that produces consistent estimators.
Difference in differences (DID or DD) is a statistical technique used in econometrics and quantitative research in the social sciences that attempts to mimic an experimental research design using observational study data, by studying the differential effect of a treatment on a 'treatment group' versus a 'control group' in a natural experiment.
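In its simplest two-group, two-period form, the estimate is the change in the treatment group minus the change in the control group. The sketch below works through that arithmetic; the group means and the function name `difference_in_differences` are hypothetical values made up for illustration.

```python
# Hypothetical average outcomes (illustrative numbers only).
group_means = {
    ("treatment", "before"): 10.0,
    ("treatment", "after"): 18.0,
    ("control", "before"): 9.0,
    ("control", "after"): 12.0,
}

def difference_in_differences(means):
    """(Change in treatment group) minus (change in control group)."""
    treatment_change = means[("treatment", "after")] - means[("treatment", "before")]
    control_change = means[("control", "after")] - means[("control", "before")]
    return treatment_change - control_change

did_estimate = difference_in_differences(group_means)
print("DID estimate:", did_estimate)  # (18 - 10) - (12 - 9) = 5.0
```

The subtraction removes the change that both groups would have experienced anyway, under the assumption that the two groups would have followed parallel trends without the treatment.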

Decision theory

decision science, statistical decision theory, decision sciences
Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures.
Empirical applications of this rich theory are usually done with the help of statistical and econometric methods, especially via the so-called choice models, such as probit and logit models.

Consistent estimator

consistent, consistency, inconsistent
While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference-in-differences estimation and instrumental variables, among many others) that produces consistent estimators.
In statistics, a consistent estimator or asymptotically consistent estimator is an estimator (a rule for computing estimates of a parameter θ₀) having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to θ₀. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to θ₀ converges to one.
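The "concentration" described above can be seen in a small simulation. The sketch below uses the sample mean as the estimator and estimates, for several sample sizes, how often it misses the true value by more than a tolerance. The true parameter `theta_0`, the tolerance `epsilon`, the normal data model and the sample sizes are all assumptions chosen for illustration.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility
theta_0 = 5.0           # assumed true parameter (the population mean)
epsilon = 0.2           # tolerance defining "close to theta_0"
repetitions = 500       # simulated data sets per sample size

def miss_rate(n):
    """Fraction of simulated samples of size n whose mean is farther than epsilon from theta_0."""
    misses = 0
    for _ in range(repetitions):
        sample_mean = sum(rng.gauss(theta_0, 1.0) for _ in range(n)) / n
        if abs(sample_mean - theta_0) > epsilon:
            misses += 1
    return misses / repetitions

miss_rates = {n: miss_rate(n) for n in (10, 100, 1000)}

for n, rate in miss_rates.items():
    print(f"n = {n:5d}: estimated P(|mean - theta_0| > {epsilon}) = {rate:.3f}")
```

The estimated miss probability shrinks toward zero as n grows, which is the behaviour that convergence in probability describes. A simulation like this illustrates the property for one estimator and one model; it does not prove consistency.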