Google Inc.; Google, Inc.; Googling
DeepMind describes itself as combining the best techniques from machine learning and systems neuroscience to build general-purpose learning algorithms. DeepMind's first commercial applications were in simulations, e-commerce and games. At the time of the acquisition, DeepMind was reported to have roughly 75 employees. The technology news website Recode reported that the company was purchased for $400 million, though the source of that figure was not disclosed. A Google spokesman would not comment on the price. The purchase of DeepMind supports Google's recent growth in the artificial intelligence and robotics community.

Statistical classification

List of datasets for machine learning research. Machine learning. Recommender system.

Unsupervised learning

unsupervised; unsupervised classification; unsupervised machine learning
Unsupervised learning is a branch of machine learning that learns from data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data. Alternatives include supervised learning and reinforcement learning. A central application of unsupervised learning is in the field of density estimation in statistics, though unsupervised learning encompasses many other domains involving summarizing and explaining data features.
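As an illustration of identifying commonalities without labels, here is a minimal sketch of k-means clustering in one dimension; the function name and the toy data are invented for the example:

```python
import random

def kmeans_1d(xs, k=2, iters=20, seed=0):
    """Minimal 1-D k-means: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        # assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]  # two obvious groups around 1 and 9
print(kmeans_1d(data))                 # centers near 1.0 and 9.0
```

No labels are given; the algorithm discovers the two groups purely from the presence of commonalities (proximity) in the data.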

Dimensionality reduction

dimension reduction; reduce the dimensionality; dimensional reduction
Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. Generalized discriminant analysis (GDA) deals with nonlinear discriminant analysis using a kernel function operator. The underlying theory is close to that of support vector machines (SVMs) insofar as the GDA method provides a mapping of the input vectors into a high-dimensional feature space.

Reinforcement learning

reward function; reinforcement; approximate dynamic programming
Most reinforcement learning papers are published at the major machine learning and AI conferences (ICML, NIPS, AAAI, IJCAI, UAI, AI and Statistics) and in journals (JAIR, JMLR, the Machine Learning journal, IEEE T-CIAIG). Some theory papers are published at COLT and ALT. However, many papers appear at robotics conferences (IROS, ICRA) and the agents conference AAMAS. Operations researchers publish their papers at the INFORMS conference and, for example, in the journals Operations Research and Mathematics of Operations Research.

Loss function

objective function; cost function; risk function
In mathematical optimization, statistics, econometrics, decision theory, machine learning and computational neuroscience, a loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative (in specific domains, variously called a reward function, a profit function, a utility function, a fitness function, etc.), in which case it is to be maximized.
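The minimisation view above can be made concrete with a small sketch: squared-error loss minimised by gradient descent. The targets and step size are arbitrary illustrative choices; the minimiser of average squared loss over a set of targets is their mean:

```python
def squared_loss(prediction, target):
    """Maps a (prediction, outcome) pair onto a real-valued cost."""
    return (prediction - target) ** 2

# Minimise the average squared loss over the observed targets
# by gradient descent; the minimiser is the sample mean.
targets = [2.0, 4.0, 6.0]
theta, lr = 0.0, 0.1
for _ in range(200):
    grad = sum(2 * (theta - t) for t in targets) / len(targets)
    theta -= lr * grad
print(round(theta, 3))  # converges to 4.0, the mean of the targets
```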

Bias–variance tradeoff

bias-variance dilemma; bias-variance tradeoff; variance
In statistics and machine learning, the bias–variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa.
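One way to see the tradeoff is a small seeded simulation, assuming a Gaussian population: shrinking the sample mean toward zero introduces bias but reduces the variance of the estimates across samples. The shrinkage factor 0.5 and all numbers are arbitrary illustrative choices:

```python
import random
import statistics

random.seed(0)
mu, n, trials = 1.0, 5, 4000
plain, shrunk = [], []
for _ in range(trials):
    sample = [random.gauss(mu, 1.0) for _ in range(n)]
    m = statistics.mean(sample)
    plain.append(m)         # unbiased estimator, higher variance
    shrunk.append(0.5 * m)  # biased toward 0, but lower variance

bias_plain = statistics.mean(plain) - mu
bias_shrunk = statistics.mean(shrunk) - mu
var_plain = statistics.variance(plain)
var_shrunk = statistics.variance(shrunk)
print(bias_plain, bias_shrunk)  # near 0 vs. near -0.5
print(var_plain, var_shrunk)    # shrunk variance is about a quarter as large
```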

Michael I. Jordan

Jordan; Michael Jordan; Jordan, Michael I.
Jordan is currently a full professor at the University of California, Berkeley, where his appointment is split across the Department of Statistics and the Department of EECS. He was a professor in the Department of Brain and Cognitive Sciences at MIT from 1988 to 1998. In the 1980s Jordan started developing recurrent neural networks as a cognitive model. In recent years, his work has been driven less from a cognitive perspective and more from the background of traditional statistics. Jordan popularised Bayesian networks in the machine learning community and is known for pointing out links between machine learning and statistics.

Missing data

missing values; incomplete data; missing at random
MAR is an assumption that is impossible to verify statistically; we must rely on its substantive reasonableness. An example is that males are less likely to fill in a depression survey, but this has nothing to do with their level of depression after accounting for maleness. Depending on the analysis method, these data can still induce parameter bias in analyses due to the contingent emptiness of cells (the cell for males with very high depression may have zero entries). However, if the parameter is estimated with full information maximum likelihood, MAR will provide asymptotically unbiased estimates.
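The MAR scenario above can be illustrated with a toy simulation in which response depends only on sex, not on the depression score itself. As one alternative to FIML, inverse-probability weighting (with response rates assumed known here) removes the complete-case bias; every number below is invented for the example:

```python
import random
import statistics

random.seed(0)
rows = []
for _ in range(20_000):
    male = random.random() < 0.5
    dep = random.gauss(12.0 if male else 8.0, 2.0)  # depression score
    p_respond = 0.3 if male else 0.8                # depends on sex only: MAR
    rows.append((male, dep, random.random() < p_respond))

true_mean = statistics.mean(d for _, d, _ in rows)
# Complete-case analysis is biased: males (higher scores) respond less often
complete_case = statistics.mean(d for _, d, seen in rows if seen)
# Weighting responders by 1 / P(respond | sex) restores the population mean
num = sum(d / (0.3 if m else 0.8) for m, d, seen in rows if seen)
den = sum(1 / (0.3 if m else 0.8) for m, _, seen in rows if seen)
ipw_mean = num / den
print(true_mean, complete_case, ipw_mean)
```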

Observational study

observational studies; observational; observational data
A major challenge in conducting observational studies is to draw inferences that are acceptably free from influences by overt biases, as well as to assess the influence of potential hidden biases. An observer of an uncontrolled experiment (or process) records potential factors and the data output: the goal is to determine the effects of the factors. Sometimes the recorded factors may not be directly causing the differences in the output. There may be more important factors which were not recorded but are, in fact, causal. Also, recorded or unrecorded factors may be correlated, which may yield incorrect conclusions.


Statistical techniques used for prediction include regression analysis and its various sub-categories such as linear regression, generalized linear models (logistic regression, Poisson regression, probit regression), etc. In the case of forecasting, autoregressive moving average models and vector autoregression models can be used. When these and related generalized regression or machine learning methods are deployed in commercial use, the field is known as predictive analytics. In many applications, such as time series analysis, it is possible to estimate the models that generate the observations.
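As a minimal concrete instance of regression-based prediction, here is a closed-form ordinary least squares fit of a line, with toy data chosen so the fit is exact (the function name is invented for the example):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, in closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx  # intercept passes through the mean point
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]        # exactly y = 1 + 2x
a, b = fit_line(xs, ys)
print(a, b)              # 1.0 2.0
```

A prediction for a new x is then simply `a + b * x`.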

Survey sampling

surveys; Sample Survey; Methodology for Collecting, Estimating, and Organizing Microeconomic Data
This selection bias would be corrected by applying a survey weight equal to [1/(# of phone numbers)] to each household. Self-selection bias: A type of bias in which individuals voluntarily select themselves into a group, thereby potentially biasing the response of that group. Participation bias: Bias that arises due to the characteristics of those who choose to participate in a survey or poll. Coverage bias: Coverage bias can occur when population members do not appear in the sample frame (undercoverage). Coverage bias occurs when the observed value deviates from the population parameter due to differences between covered and non-covered units.
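The 1/(# of phone numbers) weight can be checked with a two-household toy example: sampling via phone numbers over-represents the multi-phone household, and the weight restores the population mean. The household data are invented for illustration:

```python
# Each household enters the sample frame once per phone number, so a
# household with 3 phones is 3x as likely to be drawn; the survey
# weight 1/(# of phone numbers) corrects this selection bias.
households = [
    {"id": "A", "phones": 1, "income": 30},
    {"id": "B", "phones": 3, "income": 90},
]
# The phone-number sampling frame: household B appears three times
frame = [h for h in households for _ in range(h["phones"])]

unweighted = sum(h["income"] for h in frame) / len(frame)
weights = [1 / h["phones"] for h in frame]
weighted = (sum(w * h["income"] for w, h in zip(weights, frame))
            / sum(weights))
print(unweighted, weighted)  # 75.0 (biased) vs. the true mean 60.0
```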

Action selection

These egocentric sorts of actions may in turn modify the agent's basic behavioural capacities, particularly in that updating memory implies some form of machine learning is possible. Ideally, action selection itself should also be able to learn and adapt, but there are many problems of combinatorial complexity and computational tractability that may require restricting the search space for learning. In AI, an ASM is also sometimes either referred to as an agent architecture or thought of as a substantial part of one.

Similarity learning

learned from data; metric learning
Some well-known approaches for metric learning include large margin nearest neighbor (LMNN) and information-theoretic metric learning (ITML). In statistics, the covariance matrix of the data is sometimes used to define a distance metric called the Mahalanobis distance. Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Also, many machine learning approaches rely on some metric; this includes unsupervised learning such as clustering, which groups together close or similar objects.
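A sketch of the Mahalanobis distance mentioned above, with the 2x2 sample covariance matrix inverted in closed form; the function names and data are a toy example:

```python
def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def mahalanobis_2d(point, data):
    """Distance of `point` from the data mean, scaled by the inverse
    of the 2x2 sample covariance matrix (inverted by hand)."""
    xs = [p[0] for p in data]
    ys = [p[1] for p in data]
    mx, my = mean(xs), mean(ys)
    sxx, syy, sxy = cov(xs, xs), cov(ys, ys), cov(xs, ys)
    det = sxx * syy - sxy * sxy
    # closed-form inverse of [[sxx, sxy], [sxy, syy]]
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dx, dy = point[0] - mx, point[1] - my
    d2 = dx * (ixx * dx + ixy * dy) + dy * (ixy * dx + iyy * dy)
    return d2 ** 0.5

data = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(mahalanobis_2d((3, 1), data))  # about 1.732, i.e. sqrt(3)
```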

Exploratory data analysis

explorative data analysis; exploratory; data analysis
Orange, an open-source data mining and machine learning software suite. Python, an open-source programming language widely used in data mining and machine learning. R, an open-source programming language for statistical computing and graphics; together with Python, one of the most popular languages for data science. SOCR provides a large number of free online tools. TinkerPlots, EDA software for upper elementary and middle school students. Weka, an open-source data mining package that includes visualization and EDA tools such as targeted projection pursuit. Anscombe's quartet, on the importance of exploration. Data dredging. Predictive analytics. Structured data analysis (statistics).


IBM

International Business Machines; IBM Corporation; International Business Machines Corporation
IT outsourcing also represents a major service provided by IBM, with more than 40 data centers worldwide. alphaWorks is IBM's source for emerging software technologies, and SPSS is a software package used for statistical analysis. IBM's Kenexa suite provides employment and retention solutions, and includes BrassRing, an applicant tracking system used by thousands of companies for recruiting. IBM also owns The Weather Company, which provides weather forecasting and includes and Weather Underground.

Computational anatomy

Anatomy; Computational anatomy (CA); Diffeomorphometry
It involves the development and application of mathematical, statistical and data-analytical methods for modelling and simulation of biological structures. The field is broadly defined and includes foundations in anatomy, applied mathematics and pure mathematics, machine learning, computational mechanics, computational science, biological imaging, neuroscience, physics, probability, and statistics; it also has strong connections with fluid mechanics and geometric mechanics.


Multivariate statistics. Naturalistic observation. Observational techniques. Opinion polling. Organizational learning. Outcome mapping. Outcomes theory. Participant observation. Participatory impact pathways analysis. Policy analysis. Post occupancy evaluation. Process improvement. Project management. Qualitative research. Quality audit. Quality circle. Quality control. Quality management. Quality management system. Quantitative research. Questionnaire. Questionnaire construction. Root cause analysis. Rubrics. Sampling. Self-assessment. Six Sigma. Standardized testing. Statistical process control. Statistical survey. Statistics. Strategic planning. Structured interviewing. Systems theory.

Sampling (statistics)

sampling; random sample; sample
A theoretical formulation for sampling Twitter data has been developed. In manufacturing, different types of sensor data such as acoustic, vibration, pressure, current, voltage and controller data are available at short time intervals. To predict downtime it may not be necessary to look at all the data; a sample may be sufficient. Survey results are typically subject to some error. Total errors can be classified into sampling errors and non-sampling errors. The term "error" here includes systematic biases as well as random errors. Sampling errors and biases are induced by the sample design.
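A small, seeded simulation of sampling error: the average gap between a sample mean and the population mean shrinks as the sample grows. The population parameters and sample sizes are arbitrary illustrative choices:

```python
import random
import statistics

random.seed(1)
population = [random.gauss(50, 10) for _ in range(50_000)]
true_mean = statistics.mean(population)

def avg_sampling_error(n, repeats=200):
    """Average |sample mean - population mean| over repeated random draws."""
    total = 0.0
    for _ in range(repeats):
        sample = random.sample(population, n)
        total += abs(statistics.mean(sample) - true_mean)
    return total / repeats

small, large = avg_sampling_error(10), avg_sampling_error(1000)
print(small, large)  # the larger sample's average error is much smaller
```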

Leo Breiman

Breiman, Leo; Breiman
A video record of one of Leo Breiman's lectures about his machine learning techniques. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author).

Linear discriminant analysis

discriminant analysis; discriminant function analysis; Fisher's Discriminant Analysis
Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification. LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements.
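A minimal sketch of Fisher's linear discriminant for two classes in two dimensions: the direction w is proportional to Sw^-1 (m_a - m_b), where Sw is the within-class scatter matrix, inverted here in closed form. The function names and the two toy classes are invented for the example:

```python
def mean2(pts):
    """Mean of a list of 2-D points."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def fisher_direction(class_a, class_b):
    """Fisher's discriminant direction w, proportional to
    Sw^-1 (m_a - m_b), with the 2x2 scatter matrix Sw inverted by hand."""
    ma, mb = mean2(class_a), mean2(class_b)
    sxx = syy = sxy = 0.0
    for pts, m in ((class_a, ma), (class_b, mb)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            syy += dy * dy
            sxy += dx * dy
    det = sxx * syy - sxy * sxy
    dmx, dmy = ma[0] - mb[0], ma[1] - mb[1]
    return ((syy * dmx - sxy * dmy) / det,
            (sxx * dmy - sxy * dmx) / det)

a = [(1, 1), (2, 1), (1, 2)]   # toy class A
b = [(6, 6), (7, 6), (6, 7)]   # toy class B
w = fisher_direction(a, b)
project = lambda p: w[0] * p[0] + w[1] * p[1]
print([project(p) for p in a], [project(p) for p in b])
```

Projecting every point onto w reduces the data to one dimension in which the two classes are separated by a single threshold, which is the dimensionality-reduction use mentioned above.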