# Regression analysis

**regression · multiple regression · regression model · multiple regression analysis · regression models · regression analyses · regression equation · regressions · regressing · regression algorithms**

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates', or 'features'). (Wikipedia)

659 Related Articles

### Quantile regression

**quantile · Quantile Regressions**

Less common forms of regression use slightly different procedures to estimate alternative location parameters (e.g., quantile regression or Necessary Condition Analysis) or estimate the conditional expectation across a broader collection of non-linear models (e.g., nonparametric regression).

Quantile regression is a type of regression analysis used in statistics and econometrics.
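As a minimal sketch of the idea (an intercept-only model, chosen for brevity), quantile regression replaces squared error with the check or pinball loss $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$, and minimizing its sum over a constant recovers a sample $\tau$-quantile. The function names are illustrative; a full quantile regression with predictors is usually solved as a linear program, which is not shown here.

```python
def check_loss(u: float, tau: float) -> float:
    """Pinball (check) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def intercept_only_quantile(ys: list[float], tau: float) -> float:
    """Fit the constant model y ~ c by minimizing the total check loss.

    The objective is convex and piecewise linear with kinks at the data
    points, so searching over the observations finds a minimizer.
    """
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))


data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(intercept_only_quantile(data, 0.5))  # 3.0, the median; the outlier has no pull
print(intercept_only_quantile(data, 0.9))  # 100.0
```

Unlike a least-squares fit, the median fit ignores how far the outlier lies from the rest of the data.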

### Nonparametric regression

**non-parametric regression · nonparametric · non-parametric**

Less common forms of regression use slightly different procedures to estimate alternative location parameters (e.g., quantile regression or Necessary Condition Analysis) or estimate the conditional expectation across a broader collection of non-linear models (e.g., nonparametric regression). In recent decades, new methods have been developed for robust regression, regression involving correlated responses such as time series and growth curves, regression in which the predictor (independent variable) or response variables are curves, images, graphs, or other complex data objects, regression methods accommodating various types of missing data, nonparametric regression, Bayesian methods for regression, regression in which the predictor variables are measured with error, regression with more predictor variables than observations, and causal inference with regression.

Nonparametric regression is a category of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data.
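One classical nonparametric estimator is Nadaraya–Watson kernel regression, which predicts at a point by taking a locally weighted average of the responses, so the shape of the fitted curve comes from the data rather than from a fixed formula. The sketch below assumes a Gaussian kernel and a hand-picked `bandwidth`; both are illustrative choices, and in practice the bandwidth is usually tuned, for example by cross-validation.

```python
import math


def nadaraya_watson(xs, ys, x0, bandwidth):
    """Kernel-weighted local average of ys at x0, using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Made-up curve-shaped data; no functional form is assumed for the fit
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.1, 0.6, 0.8, 1.1, 0.9, 0.5, 0.2]
for x0 in (0.0, 1.5, 3.0):
    print(x0, round(nadaraya_watson(xs, ys, x0, bandwidth=0.5), 3))
```

A small bandwidth tracks the data closely; a large one smooths toward the overall mean.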

### Machine learning

**machine-learning · learning · statistical learning**

First, regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning.

Classification algorithms and regression algorithms are types of supervised learning.

### Prediction

**predict · predictions · predictive**

First, regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning.

Statistical techniques used for prediction include regression analysis and its various sub-categories, such as linear regression and generalized linear models (logistic regression, Poisson regression, probit regression). For forecasting, autoregressive moving average models and vector autoregression models can be used.
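The simplest member of the autoregressive family is the AR(1) model $x_t = \phi\,x_{t-1} + \varepsilon_t$, whose coefficient can be estimated by least squares on lagged values and then iterated forward to forecast. This sketch assumes a zero-mean series with no intercept and uses conditional least squares; it is not a full ARMA or vector autoregression fit, and the function names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares AR(1) coefficient for a zero-mean series (no intercept)."""
    lagged = series[:-1]
    current = series[1:]
    num = sum(a * b for a, b in zip(current, lagged))
    den = sum(b * b for b in lagged)
    return num / den


def forecast_ar1(series, steps):
    """Iterate the fitted AR(1) recursion forward from the last observation."""
    phi = fit_ar1(series)
    last = series[-1]
    out = []
    for _ in range(steps):
        last = phi * last
        out.append(last)
    return out


print(forecast_ar1([8.0, 4.0, 2.0, 1.0], 2))  # [0.5, 0.25]
```

For real data with a nonzero level, the mean would be subtracted first (or an intercept added).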

### Robust regression

**robust estimation · Robust · robust linear model**

In recent decades, new methods have been developed for robust regression, regression involving correlated responses such as time series and growth curves, regression in which the predictor (independent variable) or response variables are curves, images, graphs, or other complex data objects, regression methods accommodating various types of missing data, nonparametric regression, Bayesian methods for regression, regression in which the predictor variables are measured with error, regression with more predictor variables than observations, and causal inference with regression.

In robust statistics, robust regression is a form of regression analysis designed to overcome some limitations of traditional parametric and non-parametric methods.
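As one concrete robust technique (chosen here for brevity, not the only option), the Theil–Sen estimator takes the median of all pairwise slopes, so a single wild observation barely moves the fit. The sketch compares it with ordinary least squares on made-up data containing one outlier; the helper names are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Robust line fit: median pairwise slope, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in zip(xs, ys)])
    return intercept, slope


def ols(xs, ys):
    """Ordinary least-squares line fit, for comparison."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 100]  # y = 1 + 2x, except the last point is an outlier
print(theil_sen(xs, ys))   # (1.0, 2.0)
print(ols(xs, ys))         # slope pulled far above 2
```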

### Forecasting

**forecast · forecasts · projection**

First, regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning.

### Least squares

**least-squares · method of least squares · least squares method**

The earliest form of regression was the method of least squares, which was published by Legendre in 1805, and by Gauss in 1809.

The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the residuals made in the results of every single equation.
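For a straight line $y \approx a + b x$ fitted to $n > 2$ points, the system is overdetermined (one equation per point, two unknowns), and minimizing the sum of squared residuals has the familiar closed form $b = S_{xy}/S_{xx}$, $a = \bar y - b\,\bar x$. A minimal sketch in plain Python; the function name is illustrative.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Four equations (points), two unknowns (a, b)
print(fit_simple_ols([0, 1, 2, 3], [1, 3, 2, 4]))  # approximately (1.3, 0.8)
```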

### Time series

**time series analysis · time-series · time-series analysis**

In recent decades, new methods have been developed for robust regression, regression involving correlated responses such as time series and growth curves, regression in which the predictor (independent variable) or response variables are curves, images, graphs, or other complex data objects, regression methods accommodating various types of missing data, nonparametric regression, Bayesian methods for regression, regression in which the predictor variables are measured with error, regression with more predictor variables than observations, and causal inference with regression.

While regression analysis is often employed in such a way as to test theories that the current values of one or more independent time series affect the current value of another time series, this type of analysis of time series is not called "time series analysis", which focuses on comparing values of a single time series or multiple dependent time series at different points in time.

### Regression toward the mean

**regression to the mean · Regression towards the mean · mean regression**

The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average (a phenomenon also known as regression toward the mean).

If a child's parents are each two inches taller than the averages for men and women, then, on average, the child will be shorter than its parents by some factor (which, today, we would call one minus the regression coefficient) times two inches.
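Written out, with $\Delta$ the parents' common deviation above the averages and $b$ the regression coefficient of offspring height on parental height, the expected offspring deviation and the shortfall relative to the parents follow directly. The numbers use $\Delta = 2$ inches and an illustrative coefficient $b = 2/3$, chosen only to make the arithmetic concrete.

$$
\mathbb{E}[\text{offspring deviation}] = b\,\Delta = \tfrac{2}{3}\cdot 2 \approx 1.33\ \text{in},
\qquad
\text{shortfall} = \Delta - b\,\Delta = (1-b)\,\Delta = \tfrac{1}{3}\cdot 2 \approx 0.67\ \text{in}.
$$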

### Causality

**causal · cause and effect · causation**

Second, in some situations regression analysis can be used to infer causal relationships between the independent and dependent variables.

The body of statistical techniques involves substantial use of regression analysis.

### Errors-in-variables models

**Errors-in-variables model · errors-in-variables · errors in variables**

For example, modeling errors-in-variables can lead to reasonable estimates when independent variables are measured with error.

In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables.
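A small simulation makes the main consequence concrete: under classical measurement error (noise independent of the true predictor and of the response error), the ordinary least-squares slope is attenuated toward zero by the reliability ratio $\lambda = \sigma_x^2 / (\sigma_x^2 + \sigma_u^2)$. The sketch assumes the measurement-error variance `noise_var` is known and uses it for a simple method-of-moments correction; all data are simulated, and dedicated errors-in-variables methods (for example Deming regression or instrumental variables) handle other settings.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)   # fixed seed so the simulation is reproducible
n = 20_000
true_slope = 2.0
noise_var = 1.0          # measurement-error variance, assumed known

x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]                # var(x) = 1
y = [true_slope * x + rng.gauss(0.0, 0.5) for x in x_true]
x_obs = [x + rng.gauss(0.0, noise_var ** 0.5) for x in x_true]  # error-prone predictor

naive = ols_slope(x_obs, y)  # attenuated: expected near 2.0 * 1 / (1 + 1) = 1.0

# Method-of-moments correction: estimate the reliability ratio from the data
obs_mean = sum(x_obs) / n
obs_var = sum((x - obs_mean) ** 2 for x in x_obs) / (n - 1)
reliability = (obs_var - noise_var) / obs_var
corrected = naive / reliability  # expected near the true slope 2.0

print(round(naive, 3), round(corrected, 3))
```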

### Newey–West estimator

**Newey–West HAC estimator · Newey-West estimator · Newey–West**

Correlated errors that exist within subsets of the data or follow specific patterns can be handled using clustered standard errors, geographically weighted regression, or Newey–West standard errors, among other techniques.

A Newey–West estimator is used in statistics and econometrics to provide an estimate of the covariance matrix of the parameters of a regression-type model when this model is applied in situations where the standard assumptions of regression analysis do not apply.
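The Newey–West idea can be shown in its simplest setting, an intercept-only regression, where the parameter is the mean and the estimator reduces to a long-run variance: residual autocovariances up to a chosen `max_lag` $L$ are added with linearly declining Bartlett weights $1 - \ell/(L+1)$. This is a teaching sketch under that simplification; for a general regression the same weighting is applied to the products $x_t u_t$ inside a sandwich estimator, and library implementations are preferable in practice.

```python
def newey_west_mean_variance(ys, max_lag):
    """HAC variance of the sample mean (intercept-only regression).

    Uses Newey-West Bartlett weights on residual autocovariances.
    """
    n = len(ys)
    mean = sum(ys) / n
    u = [y - mean for y in ys]  # residuals of the intercept-only model

    def autocov(lag):
        return sum(u[t] * u[t - lag] for t in range(lag, n)) / n

    long_run = autocov(0)
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1)  # Bartlett kernel weight
        long_run += 2.0 * weight * autocov(lag)
    return long_run / n


# Strongly negatively autocorrelated series: HAC variance is smaller
print(newey_west_mean_variance([1.0, -1.0, 1.0, -1.0], max_lag=0))  # 0.25
print(newey_west_mean_variance([1.0, -1.0, 1.0, -1.0], max_lag=1))  # 0.0625
```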

### Karl Pearson

**Pearson · Pearson, Karl · Carl Pearson**

For Galton, regression had only this biological meaning, but his work was later extended by Udny Yule and Karl Pearson to a more general statistical context.

These techniques, which are widely used today for statistical analysis, include the chi-squared test, standard deviation, and correlation and regression coefficients.

### Linear regression

**regression coefficient · multiple linear regression · regression**

The most common form of regression analysis is linear regression, in which a researcher finds the line (or a more complex linear function) that most closely fits the data according to a specific mathematical criterion.

Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.
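With several predictors, the least-squares coefficients solve the normal equations $X^\top X\,\beta = X^\top y$, where $X$ holds a leading column of ones for the intercept. The sketch below forms and solves these equations with Gaussian elimination in plain Python; it is meant to show the linear-algebra structure, and numerical libraries typically use QR or SVD decompositions instead for better stability. Function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting; a is n x n, b has length n."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_ols(rows, ys):
    """rows: predictor vectors; an intercept column of ones is prepended."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]  # exact plane, no noise
print(fit_multiple_ols(rows, ys))  # approximately [1.0, 2.0, 3.0]
```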

### Francis Galton

**Sir Francis Galton · Galton · Galton, Francis**

The term "regression" was coined by Francis Galton in the nineteenth century to describe a biological phenomenon.

He also discovered the properties of the bivariate normal distribution and its relationship to regression analysis.

### Standard error

**SE · standard errors · standard error of the mean**

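For a simple linear regression $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ fitted by least squares, with residual standard deviation estimated as $\hat\sigma_\varepsilon = \sqrt{\text{SSR}/(n-2)}$, the standard errors of the parameter estimates are given by

$$\hat\sigma_{\beta_1} = \hat\sigma_\varepsilon \sqrt{\frac{1}{\sum_i (x_i - \bar{x})^2}}, \qquad \hat\sigma_{\beta_0} = \hat\sigma_\varepsilon \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}.$$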

In regression analysis, the term "standard error" refers either to the square root of the reduced chi-squared statistic or the standard error for a particular regression coefficient (as used in, e.g., confidence intervals).
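For simple linear regression $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, the usual formulas are $\hat\sigma_\varepsilon^2 = \text{SSR}/(n-2)$, $\operatorname{SE}(\hat\beta_1) = \hat\sigma_\varepsilon/\sqrt{S_{xx}}$ and $\operatorname{SE}(\hat\beta_0) = \hat\sigma_\varepsilon\sqrt{1/n + \bar x^2/S_{xx}}$, with $S_{xx} = \sum_i (x_i - \bar x)^2$. A short sketch computing them on made-up data; the function name is hypothetical.

```python
import math


def simple_ols_with_se(xs, ys):
    """Return (b0, b1, se_b0, se_b1) for y = b0 + b1*x fitted by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = ssr / (n - 2)  # two parameters estimated
    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))
    return b0, b1, se_b0, se_b1


print(simple_ols_with_se([0, 1, 2, 3], [1, 3, 2, 4]))
```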

### Probit model

**probit regression · probit · Bayesian probit regression**

Nonlinear models for binary dependent variables include the probit and logit models.

In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married.
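In a probit model, $P(Y = 1 \mid x) = \Phi(\beta_0 + \beta_1 x)$, where $\Phi$ is the standard normal CDF. General probit fitting needs iterative maximum likelihood, but with a single binary (0/1) predictor the model is saturated, so the maximum-likelihood fit reproduces the observed group proportions and the coefficients follow from the inverse normal CDF. The sketch relies on that special case to stay short; the function name is hypothetical, and it assumes both group proportions lie strictly between 0 and 1.

```python
from statistics import NormalDist


def probit_binary_predictor(xs, ys):
    """Probit MLE for P(y=1|x) = Phi(b0 + b1*x) when x takes only 0 or 1."""
    group0 = [y for x, y in zip(xs, ys) if x == 0]
    group1 = [y for x, y in zip(xs, ys) if x == 1]
    p0 = sum(group0) / len(group0)
    p1 = sum(group1) / len(group1)
    inv = NormalDist().inv_cdf  # requires 0 < p < 1
    b0 = inv(p0)                # Phi(b0) = p0
    b1 = inv(p1) - b0           # Phi(b0 + b1) = p1
    return b0, b1


xs = [0, 0, 0, 1, 1, 1]
ys = [0, 0, 1, 0, 1, 1]  # success rates 1/3 and 2/3
print(probit_binary_predictor(xs, ys))
```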

### Logistic regression

**logit model · logistic · logistic model**

Nonlinear models for binary dependent variables include the probit and logit models.

In regression analysis, logistic regression (or logit regression) estimates the parameters of a logistic model (a form of binary regression).
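Logistic regression is usually fitted by maximum likelihood, which has no closed form in general. The sketch below uses Newton-Raphson (equivalently, iteratively reweighted least squares) for one predictor, with the 2 by 2 information matrix inverted by hand; the fixed iteration count and the lack of safeguards for perfectly separable data are simplifying assumptions, and the function name is hypothetical.

```python
import math


def fit_logistic_newton(xs, ys, iterations=25):
    """Maximum-likelihood logistic regression with one predictor (Newton-Raphson)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


xs = [0, 0, 0, 1, 1, 1]
ys = [0, 0, 1, 0, 1, 1]
print(fit_logistic_newton(xs, ys))
```

With a single binary predictor the fit reproduces the group proportions 1/3 and 2/3, so the estimates should be close to $\hat\beta_0 = -\ln 2$ and $\hat\beta_1 = 2\ln 2$.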

### Covariance matrix

**variance-covariance matrix · covariance matrices · covariance**

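For random vectors $X$ and $Y$ whose joint covariance matrix $\Sigma$ is partitioned into blocks $\Sigma_{XX}$, $\Sigma_{XY}$, $\Sigma_{YX}$ and $\Sigma_{YY}$, let $\Sigma_{Y|X} = \Sigma_{YY} - \Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}$. The matrix $\Sigma_{YX}\Sigma_{XX}^{-1}$ is known as the matrix of regression coefficients, while in linear algebra $\Sigma_{Y|X}$ is the Schur complement of $\Sigma_{XX}$ in $\Sigma$.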
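With a single response and a single predictor, every block is a scalar, and the two quantities reduce to the least-squares slope $\sigma_{YX}/\sigma_{XX}$ and the residual variance $\sigma_{YY} - \sigma_{YX}^2/\sigma_{XX}$. The sketch checks this numerically on made-up data, using covariances with divisor $n$ so that the Schur complement matches the mean squared residual exactly; the helper name is hypothetical.

```python
def population_cov(a, b):
    """Covariance with divisor n."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b)) / n


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]

s_xx = population_cov(xs, xs)
s_yx = population_cov(ys, xs)
s_yy = population_cov(ys, ys)

regression_coef = s_yx / s_xx      # Sigma_YX Sigma_XX^{-1}
schur = s_yy - s_yx * s_yx / s_xx  # Sigma_YY - Sigma_YX Sigma_XX^{-1} Sigma_XY

# Cross-check: the Schur complement equals the mean squared residual of the fitted line
intercept = sum(ys) / 4 - regression_coef * sum(xs) / 4
resid_var = sum((y - intercept - regression_coef * x) ** 2 for x, y in zip(xs, ys)) / 4
print(regression_coef, schur, resid_var)
```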

### Censored regression model

**Censored · censored regression · Censoring**

Censored regression models may be used when the dependent variable is only sometimes observed, and Heckman correction type models may be used when the sample is not randomly selected from the population of interest.

Censored regression models are a class of models in which the dependent variable is censored above or below a certain threshold.
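The best-known censored regression model is the Tobit model, in which a latent normal response $y^* = x^\top\beta + \varepsilon$ is observed only when it exceeds a threshold, and the threshold value is recorded otherwise. A sketch of its log-likelihood for left-censoring at `lower` (default 0, an assumption) follows; estimation would maximize this over $\beta$ and $\sigma$ with a numerical optimizer, which is not shown, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tobit_loglik(ys, xb, sigma, lower=0.0):
    """Log-likelihood of a Tobit (type I) model left-censored at `lower`.

    ys: observed responses (values at or below `lower` are treated as censored)
    xb: linear predictor x_i' beta for each observation
    """
    total = 0.0
    for y, mu in zip(ys, xb):
        if y <= lower:
            # Censored: we only know the latent y* is at or below `lower`
            total += math.log(STD_NORMAL.cdf((lower - mu) / sigma))
        else:
            # Uncensored: normal density of the latent variable at y
            total += math.log(STD_NORMAL.pdf((y - mu) / sigma) / sigma)
    return total


print(tobit_loglik([0.0, 1.5, 0.0, 2.0], [0.2, 1.0, -0.5, 1.8], sigma=1.0))
```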

### Confidence interval

**confidence intervals · confidence level · confidence**

Under the further assumption that the population error term is normally distributed, the researcher can use these estimated standard errors to create confidence intervals and conduct hypothesis tests about the population parameters.
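For the slope of a simple linear regression, the interval is $\hat\beta_1 \pm t_{1-\alpha/2,\,n-2}\,\operatorname{SE}(\hat\beta_1)$. Python's standard library has no Student t quantile function, so the sketch takes the critical value `t_crit` as an argument; 4.303 is the tabulated two-sided 95% value for 2 degrees of freedom, which fits the four-point data used here. The function name is hypothetical.

```python
import math


def slope_ci(xs, ys, t_crit):
    """Two-sided confidence interval for the least-squares slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ssr / (n - 2) / sxx)  # standard error of the slope
    half = t_crit * se
    return slope - half, slope + half


xs = [0, 1, 2, 3]
ys = [1, 3, 2, 4]
lo, hi = slope_ci(xs, ys, t_crit=4.303)  # 95%, n - 2 = 2 degrees of freedom
print(round(lo, 3), round(hi, 3))
```

With only four points the interval is wide and includes zero, so this small sample cannot rule out a zero slope at the 5% level.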

Confidence and prediction bands are often used as part of the graphical presentation of results of a regression analysis.

### Econometrics

**econometric · econometrician · econometric analysis**

The subfield of econometrics is largely focused on developing techniques that let researchers draw reasonable conclusions in real-world settings, where classical assumptions do not hold exactly.

One of the fundamental statistical methods used by econometricians is regression analysis.

### Mean squared error

**mean square error · squared error loss · MSE**

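For a simple linear regression with an intercept, the error variance is estimated as $\hat\sigma^2_\varepsilon = \text{SSR}/(n-2)$, the sum of squared residuals divided by the sample size less the two estimated parameters. This is called the mean square error (MSE) of the regression.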

In regression analysis, plotting the data is a natural way to view its overall trend.

### Normal distribution

**normally distributed · Gaussian distribution · normal**

In the work of Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to be Gaussian.

In regression analysis, lack of normality in residuals simply indicates that the model postulated is inadequate in accounting for the tendency in the data and needs to be augmented; in other words, normality in residuals can always be achieved given a properly constructed model.
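One common way to check residual normality is the Jarque–Bera statistic $\text{JB} = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, built from sample skewness $S$ and kurtosis $K$; under normality it is approximately $\chi^2_2$ for large $n$, so values above about 5.99 are evidence against normality at the 5% level. The sketch shows one illustrative diagnostic, not a procedure the text prescribes, and assumes the residuals are not all identical.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic from sample skewness and kurtosis (divisor n)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


print(jarque_bera([-1.0, 0.0, 1.0]))  # small sample, light tails
print(jarque_bera([-0.3, 0.9, -0.9, 0.3, 0.1, -0.2, 5.0]))  # one large residual
```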

### Residual sum of squares

**sum of squared residuals · sum of squares of residuals · residual sum-of-squares**

This method obtains parameter estimates that minimize the sum of squared residuals, SSR:

$$\text{SSR} = \sum_{i=1}^{n} \hat\varepsilon_i^2, \qquad \hat\varepsilon_i = y_i - \hat y_i.$$

In a standard simple linear regression model, $y_i = a + b x_i + \varepsilon_i$, where $a$ and $b$ are coefficients, $y$ and $x$ are the regressand and the regressor, respectively, and $\varepsilon$ is the error term.
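For the model $y_i = a + b x_i + \varepsilon_i$, the sum of squared residuals is $\sum_i (y_i - a - b x_i)^2$, viewed as a function of the candidate coefficients. The sketch evaluates it on made-up data: the least-squares coefficients (about $a = 1.3$, $b = 0.8$ for these points) give a smaller value than nearby alternatives. The function name is hypothetical.

```python
def rss(xs, ys, a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [0, 1, 2, 3]
ys = [1, 3, 2, 4]
for a, b in [(1.3, 0.8), (1.3, 0.9), (1.3, 0.7), (1.4, 0.8)]:
    print(a, b, round(rss(xs, ys, a, b), 4))  # (1.3, 0.8) gives the smallest value
```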