Linear predictor function

In statistics and in machine learning, a linear predictor function is a linear function (linear combination) of a set of coefficients and explanatory variables (independent variables) whose value is used to predict the outcome of a dependent variable. Functions of this sort arise most often in linear regression, where the coefficients are called regression coefficients. They also occur in various types of linear classifiers (e.g. logistic regression, perceptrons, support vector machines, and linear discriminant analysis), as well as in various other models, such as principal component analysis and factor analysis.
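For an observation with explanatory variables x_1, …, x_p and coefficients β_0, β_1, …, β_p (the symbols here are standard notation, assumed rather than taken from the excerpt above), the linear predictor takes the form

    f(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p = \beta_0 + \boldsymbol{\beta}^\mathsf{T} \mathbf{x}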

Lasso (statistics)

LASSO; Lasso method; Least Absolute Shrinkage and Selection Operator
In statistics and machine learning, lasso (least absolute shrinkage and selection operator; also Lasso or LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. It was originally introduced in the geophysics literature in 1986 and later independently rediscovered and popularized in 1996 by Robert Tibshirani, who coined the term and provided further insights into the observed performance.
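In its penalized form, with response y, design matrix X, and regularization parameter λ ≥ 0 (standard notation, assumed here), the lasso estimate solves

    \hat{\beta} = \arg\min_{\beta} \left\{ \frac{1}{2N} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \right\}

The \ell_1 penalty is what drives some coefficients exactly to zero, giving the variable selection described above.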

Educational data mining

education
Educational data mining draws on methods from learning analytics, machine learning, and statistics.

Selection bias

selection effect; selection bias
There is, for example, a potential bias in the impact record of Earth. Astronomical existential risks might similarly be underestimated due to selection bias, and an anthropic correction has to be introduced. In the general case, selection biases cannot be overcome with statistical analysis of existing data alone, though the Heckman correction may be used in special cases. An assessment of the degree of selection bias can be made by examining correlations between exogenous (background) variables and a treatment indicator.
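A minimal sketch of that last diagnostic, assuming a pandas DataFrame with hypothetical column names (age, income, treated); this is only the correlation check, not a bias correction:

    import pandas as pd

    # Hypothetical data: exogenous (background) covariates plus a 0/1 treatment flag.
    df = pd.DataFrame({
        "age": [23, 45, 31, 52, 38, 27],
        "income": [32000, 81000, 45000, 98000, 60000, 39000],
        "treated": [0, 1, 0, 1, 1, 0],
    })

    # Correlation of each background variable with the treatment indicator;
    # large absolute values suggest selection into treatment was not random.
    print(df[["age", "income"]].corrwith(df["treated"]))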

Stephen Fienberg

Fienberg; Fienberg, Stephen; S. E. Fienberg
Stephen Elliott Fienberg (27 November 1942 – 14 December 2016) was Professor Emeritus (formerly the Maurice Falk University Professor of Statistics and Social Science) in the Department of Statistics, the Machine Learning Department, Heinz College, and CyLab at Carnegie Mellon University. Born in Toronto, Ontario, Fienberg earned a Bachelor of Science degree in Mathematics and Statistics from the University of Toronto in 1964, followed by a Master of Arts degree in Statistics in 1965 and a Ph.D. in Statistics in 1968, both from Harvard University, where his doctoral research was supervised by Frederick Mosteller.

ENSAE ParisTech

ENSAE; École Nationale de la Statistique et de l'Administration Économique
ENSAE Paris is known as the branch school of École Polytechnique for statistics, data science and machine learning. It is one of France's top schools of economics, statistics, data science and machine learning and is directly attached to France's Institut national de la statistique et des études économiques (INSEE) and the French Ministry of Economy and Finance. Students receive rigorous training in both economics and statistics and can specialize in macroeconomics, microeconomics, statistics or finance. ENSAE also prepares its students for the French actuarial qualification (Institut des Actuaires).

Statistical learning theory

Learning theory (statistics); statistical machine learning
Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. It deals with the problem of finding a predictive function based on data, and it has led to successful applications in fields such as computer vision, speech recognition, bioinformatics, and baseball. The goals of learning are understanding and prediction. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood.
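In the supervised setting, with a loss function V and an unknown joint distribution P(x, y) (standard notation, assumed here), the target is the function f in a hypothesis space \mathcal{H} minimizing the expected risk, approximated in practice by the empirical risk over n samples:

    R(f) = \int V(f(x), y) \, dP(x, y), \qquad \hat{f} = \arg\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i)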

Decision tree learning

decision trees; decision tree; classification and regression tree
In statistics, decision tree learning uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees.
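A minimal sketch of a classification tree using scikit-learn (one implementation among many; the dataset and depth limit are illustrative choices, not from the text above):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Fit a shallow classification tree on the iris data.
    X, y = load_iris(return_X_y=True)
    clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # Branches encode feature tests; leaves carry the predicted class labels.
    print(export_text(clf, feature_names=["sepal len", "sepal wid", "petal len", "petal wid"]))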

Danielle Belgrave

She develops statistical machine learning models to study disease progression, in an effort to design new management strategies and understand heterogeneity. Statistical learning methods can inform the management of medical conditions by providing a framework for endotype discovery using probabilistic modelling. She uses statistical models to identify the underlying endotypes of a condition from a set of phenotypes. She has studied whether the atopic march, the proposed progression of allergic diseases beginning in early life, adequately describes atopic conditions such as eczema.

Probability

probabilistic; probabilities; chance
These concepts have been given an axiomatic mathematical formalization in probability theory, which is used widely in such areas of study as mathematics, statistics, finance, gambling, science (in particular physics), artificial intelligence/machine learning, computer science, game theory, and philosophy to, for example, draw inferences about the expected frequency of events. Probability theory is also used to describe the underlying mechanics and regularities of complex systems.
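The axiomatization referred to here is Kolmogorov's: a probability measure P on a sample space \Omega satisfies

    P(E) \ge 0, \qquad P(\Omega) = 1, \qquad P\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} P(E_i) \ \text{ for pairwise disjoint events } E_i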

Boosting (machine learning)

boosting; Boosting (meta-algorithm); boosted
Yoav Freund and Robert E. Schapire (1997); A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting, Journal of Computer and System Sciences, 55(1):119-139. Robert E. Schapire and Yoram Singer (1999); Improved Boosting Algorithms Using Confidence-Rated Predictions, Machine Learning, 37(3):297-336. Robert E. Schapire (2003); The Boosting Approach to Machine Learning: An Overview, MSRI (Mathematical Sciences Research Institute) Workshop on Nonlinear Estimation and Classification. Zhi-Hua Zhou (2014); Boosting 25 Years, CCL 2014 Keynote.

Online machine learning

online learning; on-line learning; online
In computer science, online machine learning is a method of machine learning in which data becomes available in sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques, which generate the best predictor by learning on the entire training data set at once. Online learning is commonly used in areas of machine learning where it is computationally infeasible to train over the entire dataset, necessitating out-of-core algorithms.
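A minimal sketch of the sequential-update idea, using stochastic gradient descent on least-squares regression (the synthetic data and step size are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    w_true = np.array([2.0, -1.0])  # unknown target weights
    w = np.zeros(2)                 # current best predictor
    eta = 0.01                      # learning rate

    # Examples arrive one at a time; each updates the predictor immediately,
    # so the full dataset never needs to be held in memory.
    for _ in range(10_000):
        x = rng.normal(size=2)
        y = w_true @ x + 0.1 * rng.normal()
        w -= eta * (w @ x - y) * x  # gradient step on this example's squared error

    print(w)  # approaches w_true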

Hierarchical Dirichlet process

Hierarchical Dirichlet Process Mixture Model
In statistics and machine learning, the hierarchical Dirichlet process (HDP) is a nonparametric Bayesian approach to clustering grouped data. It uses a Dirichlet process for each group of data, with the Dirichlet processes for all groups sharing a base distribution which is itself drawn from a Dirichlet process. This method allows groups to share statistical strength via sharing of clusters across groups. The base distribution being drawn from a Dirichlet process is important, because draws from a Dirichlet process are atomic probability measures, and the atoms will appear in all group-level Dirichlet processes.
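Concretely, with global base measure H and concentration parameters \gamma and \alpha (standard notation from the HDP literature, assumed here), the shared base distribution and the per-group distributions are drawn as

    G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H), \qquad G_j \mid \alpha, G_0 \sim \mathrm{DP}(\alpha, G_0) \ \text{ for each group } j

Because G_0 is atomic, the groups' draws G_j reuse the same atoms, which is what lets clusters be shared across groups.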

Discretization of continuous features

Discretization; discretize; Fayyad & Irani's MDL method
In statistics and machine learning, discretization refers to the process of converting or partitioning continuous attributes, features, or variables into discretized or nominal attributes/features/variables/intervals. This can be useful when creating probability mass functions – formally, in density estimation – and is a form of binning, as in making a histogram. Whenever continuous data is discretized, there is always some amount of discretization error. The goal is to reduce that amount to a level considered negligible for the modeling purposes at hand.
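A minimal sketch of equal-width binning with NumPy (the simplest unsupervised scheme; Fayyad & Irani's MDL method is a supervised alternative not shown here):

    import numpy as np

    values = np.array([0.2, 1.7, 3.1, 4.8, 2.5, 0.9])
    edges = np.linspace(values.min(), values.max(), num=4)  # 3 equal-width bins

    # Map each continuous value to a nominal bin index (0, 1, or 2);
    # the rounding to bin labels is the source of discretization error.
    bins = np.digitize(values, edges[1:-1])
    print(bins)  # [0 0 1 2 1 0]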

Master in Data Science

As an area of expertise and field, data science is defined as a "concept to unify statistics, data analysis and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization. The degree is relatively new, with graduate schools, business schools, and data science centers often housing the programs.

Smoothed analysis

smoothed
In theoretical computer science, smoothed analysis is a way of measuring the complexity of an algorithm. Since its introduction in 2001, smoothed analysis has been used as a basis for considerable research on problems ranging from mathematical programming and numerical analysis to machine learning and data mining. It can give a more realistic analysis of the practical performance of an algorithm, such as its running time, than worst-case or average-case analysis. Smoothed analysis is a hybrid of worst-case and average-case analyses that inherits advantages of both: it measures the expected performance of algorithms under slight random perturbations of worst-case inputs.
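Schematically, if T(x) is the cost of the algorithm on input x and r is a random perturbation of magnitude \sigma (a common formulation, assumed here rather than taken from the text above), the smoothed complexity at input size n is

    C(n, \sigma) = \max_{x : |x| = n} \ \mathbb{E}_{r}\big[ T(x + \sigma r) \big]

The outer maximum is the worst-case ingredient; the inner expectation is the average-case one.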

Learning rate

adaptive learning rate; step size
In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. Since it influences to what extent newly acquired information overrides old information, it metaphorically represents the speed at which a machine learning model "learns." The learning rate is often denoted by the character η or α. In setting a learning rate, there is a trade-off between the rate of convergence and overshooting.
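In plain gradient descent, for example, the learning rate \eta scales each step against the gradient of the loss L (standard notation, assumed here):

    \theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)

A large \eta converges in fewer steps but risks overshooting the minimum; a small \eta is safer but slow, which is the trade-off described above.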

Social media mining

social media marketing
It is an interdisciplinary field encompassing techniques from computer science, data mining, machine learning, social network analysis, network science, sociology, ethnography, statistics, optimization, and mathematics. Social media mining faces grand challenges such as the big data paradox, obtaining sufficient samples, the noise removal fallacy, and the evaluation dilemma. It represents the virtual world of social media in a computable way, measures it, and designs models that can help us understand its interactions.

Naive Bayes classifier

Naive Bayes; naive Bayes classification; Naïve Bayes
Naive Bayes classifiers are available in many general-purpose machine learning and NLP packages, including Apache Mahout, Mallet, NLTK, Orange, scikit-learn, and Weka. Other implementations include: the IMSL Numerical Libraries (collections of mathematical and statistical algorithms for C/C++, Fortran, Java, and C#/.NET), whose data mining routines include a naive Bayes classifier; an interactive Microsoft Excel spreadsheet implementation using VBA (requires enabled macros) with viewable source code; jBNC, a Bayesian network classifier toolbox; the Statistical Pattern Recognition Toolbox for Matlab; and ifile, the first freely available (naive) Bayesian mail/spam filter.
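A minimal usage sketch with the scikit-learn implementation named above (the toy data is an illustrative assumption):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # Two well-separated classes in two features.
    X = np.array([[1.0, 2.1], [1.2, 1.9], [7.8, 8.2], [8.1, 7.9]])
    y = np.array([0, 0, 1, 1])

    # Fit the Gaussian naive Bayes model and classify new points.
    clf = GaussianNB().fit(X, y)
    print(clf.predict([[1.1, 2.0], [8.0, 8.0]]))  # -> [0 1]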

Andrew Ng

Ng; Dr. Andrew Ng
Machine Learning (CS 229) lecture videos: https://www.youtube.com/watch?v=QfhraNhghjw and http://cs229.stanford.edu/ (CS 229 is perhaps the most popular introductory machine learning course in the world).

Mathematical Reviews

MCQ; MathSciNet; Current Mathematical Publications
Mathematical Reviews is a journal published by the American Mathematical Society (AMS) that contains brief synopses, and in some cases evaluations, of many articles in mathematics, statistics, and theoretical computer science. The AMS also publishes an associated online bibliographic database called MathSciNet, which contains an electronic version of Mathematical Reviews and, additionally, citation information for over 3.5 million items as of 2018. Mathematical Reviews was founded by Otto E. Neugebauer in 1940.

Kernel embedding of distributions

Methods based on the kernel embedding of distributions sidestep these problems and possess further advantages: learning via the kernel embedding of distributions offers a principled drop-in replacement for information-theoretic approaches, and it is a framework that not only subsumes many popular methods in machine learning and statistics as special cases, but can also lead to entirely new learning algorithms. Let X denote a random variable with codomain \Omega and distribution P(X).
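Under this notation, the embedding of P(X) into a reproducing kernel Hilbert space with kernel k is the kernel mean

    \mu_X = \mathbb{E}_X\big[ k(X, \cdot) \big] = \int_{\Omega} k(x, \cdot) \, dP(x)

which, for a characteristic kernel, represents the distribution without loss of information.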

Categorical distribution

categorical; categorical probability distribution; categorical variable
In some fields, such as machine learning and natural language processing, the categorical and multinomial distributions are conflated, and it is common to speak of a "multinomial distribution" when a "categorical distribution" would be more precise. This imprecise usage stems from the fact that it is sometimes convenient to express the outcome of a categorical distribution as a "1-of-K" vector (a vector with one element containing a 1 and all other elements containing a 0) rather than as an integer in the range 1 to K; in this form, a categorical distribution is equivalent to a multinomial distribution for a single observation (see below).
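With category probabilities p_1, …, p_K summing to 1 (standard notation, assumed here), the two notations give the same mass function:

    P(x = i) = p_i, \qquad P(\mathbf{x}) = \prod_{i=1}^{K} p_i^{x_i} \ \text{ for a 1-of-K vector } \mathbf{x}

The right-hand form is the multinomial likelihood with a single trial, which is why the two distributions coincide for one observation.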

Geoffrey McLachlan

McLachlan, G.J.
Geoffrey John McLachlan FAA (born 1946) is an Australian researcher in computational statistics, machine learning and pattern recognition. McLachlan is best known for his work in classification and finite mixture models. He is the joint author of five influential books on the topics of mixtures and classification, as well as their applications. McLachlan is currently a Professor of Statistics in the School of Mathematics and Physics at the University of Queensland. He is a prolific author in the fields of computational statistics, pattern recognition, machine learning, and neural networks, with over 280 research articles.

Vladimir Vapnik

Vapnik; V. Vapnik; Vapnik, Vladimir
The Nature of Statistical Learning Theory (1995). Statistical Learning Theory (1998), Wiley-Interscience, ISBN 0-471-03003-1. Estimation of Dependences Based on Empirical Data, reprinted 2006 (Springer); the reprint also contains a philosophical essay on empirical inference science. A brief biography of Vapnik is available from the Computer Learning Research Centre, Royal Holloway.