### Markov blanket

In statistics and machine learning, the Markov blanket for a node in a graphical model contains all the variables that shield the node from the rest of the network. This means that the Markov blanket of a node is the only knowledge needed to predict the behavior of that node and its children. The term was coined by Judea Pearl in 1988. In a Bayesian network, the values of the parents and children of a node evidently give information about that node. However, its children's parents also have to be included, because they can be used to explain away the node in question. In a Markov random field, the Markov blanket for a node is simply its adjacent (or neighboring) nodes.
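The Bayesian-network case described above (parents, children, and the children's other parents) can be sketched in a few lines. This is a minimal illustration, assuming the network is stored as a dictionary that maps each node to the set of its parents; the function name `markov_blanket` and the example graph are hypothetical and not taken from any particular library.

```python
def markov_blanket(parents, node):
    """Markov blanket of `node` in a DAG given as {child: set_of_parents}.

    Assumed representation: nodes without parents may be omitted from the dict.
    """
    node_parents = set(parents.get(node, ()))
    # Children are the nodes that list `node` among their parents.
    children = {c for c, ps in parents.items() if node in ps}
    # Co-parents: the other parents of each child ("explaining away").
    co_parents = set()
    for child in children:
        co_parents |= set(parents[child])
    return (node_parents | children | co_parents) - {node}


# Hypothetical network: A -> C <- B, C -> D <- E, F -> E
example = {"C": {"A", "B"}, "D": {"C", "E"}, "E": {"F"}}
print(sorted(markov_blanket(example, "C")))  # ['A', 'B', 'D', 'E']
```

In the example, `E` enters the blanket of `C` only because it is a co-parent of the shared child `D`, while `F` stays outside.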

### Pattern language (formal languages)

In theoretical computer science, a pattern language is a formal language that can be defined as the set of all particular instances of a string of constants and variables. Pattern languages were introduced by Dana Angluin in the context of machine learning. Given a finite set Σ of constant symbols and a countable set X of variable symbols disjoint from Σ, a pattern is a finite non-empty string of symbols from Σ∪X. The length of a pattern p, denoted by |p|, is the number of its symbols. The set of all patterns containing exactly n distinct variables (each of which may occur several times) is denoted by $P_n$, and the set of all patterns by $P_*$.
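To make "instance of a pattern" concrete, the following sketch tests whether a word belongs to the language of a pattern by backtracking over possible variable substitutions. It assumes that every symbol is a single character and that variables are replaced by non-empty strings, with each variable replaced consistently throughout the pattern; the excerpt above does not spell out the substitution rule, so treat that as an assumption. The function name `matches` is hypothetical.

```python
def matches(pattern, variables, word, binding=None):
    """True if `word` arises from `pattern` by substituting each variable
    consistently with a non-empty string (assumed substitution rule)."""
    binding = {} if binding is None else binding
    if not pattern:
        return word == ""
    head, rest = pattern[0], pattern[1:]
    if head not in variables:
        # Constant symbol: it must appear literally at this position.
        return word.startswith(head) and matches(rest, variables, word[1:], binding)
    if head in binding:
        # Variable seen before: it must repeat the same substitution.
        value = binding[head]
        return word.startswith(value) and matches(rest, variables, word[len(value):], binding)
    # Unbound variable: try every non-empty prefix as its value.
    for end in range(1, len(word) + 1):
        binding[head] = word[:end]
        if matches(rest, variables, word[end:], binding):
            return True
        del binding[head]
    return False


# Pattern x0x over constants {0, 1} with the single variable x.
print(matches("x0x", {"x"}, "10010"))  # x = "10"
print(matches("x0x", {"x"}, "100"))
```

The search is exponential in the worst case, which is adequate for short illustrative patterns only.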

### Star count

The effects of our point of view in the galaxy, the obscuring clouds of gas and dust in the galaxy, and especially the extreme range of inherent brightness, create a biased view of stars. Knowing that these effects create bias, astronomers analyzing star counts attempt to find how much bias each effect has caused and then compensate for it as well as they can. The greatest problem biasing star counts is the extreme difference in inherent brightness between stars of different sizes. Heavy, bright stars (both giants and blue dwarfs) are the most common stars listed in general star catalogs, even though on average they are rare in space.

### Astroinformatics

There are many research areas involved with astroinformatics, such as data mining, machine learning, statistics, visualization, scientific data management, and semantic science. Data mining and machine learning play significant roles in astroinformatics as a scientific research discipline because of their focus on "knowledge discovery from data" (KDD) and "learning from data". The amount of data collected from astronomical sky surveys has grown from gigabytes to terabytes over the past decade and is predicted to grow in the next decade into hundreds of petabytes with the Large Synoptic Survey Telescope and into the exabytes with the Square Kilometre Array.

### Binomial regression

The primary difference is in the theoretical motivation. In machine learning, binomial regression is considered a special case of probabilistic classification, and thus a generalization of binary classification. In one published example of an application of binomial regression, the details were as follows. The observed outcome variable was whether or not a fault occurred in an industrial process. There were two explanatory variables: the first was a simple two-case factor representing whether or not a modified version of the process was used and the second was an ordinary quantitative variable measuring the purity of the material being supplied for the process.

### Matrix calculus

A single convention can be somewhat standard throughout a single field that commonly uses matrix calculus (e.g. econometrics, statistics, estimation theory and machine learning). However, even within a given field different authors can be found using competing conventions. Authors of both groups often write as though their specific convention were standard. Serious mistakes can result when combining results from different authors without carefully verifying that compatible notations have been used. Definitions of these two conventions and comparisons between them are collected in the layout conventions section.
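As an illustration of how two conventions can disagree, consider the derivative of a scalar $y$ with respect to a column vector $\mathbf{x}$ with $n$ entries. The sketch below assumes the two conventions in question are the ones usually called numerator layout and denominator layout; the excerpt above does not name them, so the labels are an assumption.

```latex
% Numerator layout: the result is a 1 x n row vector.
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix}
  \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n}
\end{bmatrix}

% Denominator layout: the result is the transpose, an n x 1 column vector.
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix}
  \dfrac{\partial y}{\partial x_1} \\[4pt]
  \dfrac{\partial y}{\partial x_2} \\
  \vdots \\
  \dfrac{\partial y}{\partial x_n}
\end{bmatrix}
```

The two results are transposes of each other, so a formula copied from a source that uses one layout may need transposing before it is combined with a formula from a source that uses the other.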

### Science, technology, engineering, and mathematics

- STEMLE (Science, Technology, Engineering, Mathematics, Law and Economics): identifies subjects focused on fields such as applied social sciences and anthropology, regulation, cybernetics, machine learning, social systems, computational economics and computational social sciences.
- MEd Curriculum Studies: STEMS² (Science, Technology, Engineering, Mathematics, Social Sciences and Sense of Place): integrates STEM with social sciences and sense of place.
- METALS (STEAM + Logic): introduced by Su Su at Teachers College, Columbia University.
- STREM (Science, Technology, Robotics, Engineering, and Mathematics): adds robotics as a field.

### Similarity measure

In machine learning, common kernel functions such as the RBF kernel can be viewed as similarity functions. In spectral clustering, a similarity, or affinity, measure is used to transform data to overcome difficulties related to lack of convexity in the shape of the data distribution. The measure gives rise to an n × n similarity matrix for a set of n points, where the entry (i, j) in the matrix can be simply the (negative of the) Euclidean distance between points i and j, or it can be a more complex measure of distance such as the Gaussian kernel. Further modifying this result with network analysis techniques is also common. Similarity matrices are also used in sequence alignment.
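The two choices of matrix entry mentioned above can be sketched as follows. This is a minimal pure-Python illustration; the function names and the bandwidth parameter `sigma` are assumptions made for the example, with the Gaussian similarity taken to be exp(-d² / (2σ²)) for Euclidean distance d.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def similarity_matrix(points, sigma=None):
    """n x n similarity matrix for a list of n points.

    sigma=None  -> entry (i, j) is the negative Euclidean distance.
    sigma=float -> entry (i, j) is the Gaussian (RBF) similarity.
    """
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = euclidean(points[i], points[j])
            if sigma is None:
                matrix[i][j] = -d
            else:
                matrix[i][j] = math.exp(-d * d / (2.0 * sigma ** 2))
    return matrix


points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
for row in similarity_matrix(points, sigma=2.0):
    print([round(value, 3) for value in row])
```

With the Gaussian choice every entry lies in (0, 1] and the diagonal is exactly 1, which is often convenient for the affinity matrices used in spectral clustering.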

### Content analysis

By systematically labeling the content of a set of texts, researchers can analyse patterns of content quantitatively using statistical methods, or use qualitative methods to analyse meanings of content within texts. Computers are increasingly used in content analysis to automate the labeling (or coding) of documents. Simple computational techniques can provide descriptive data such as word frequencies and document lengths. Machine learning classifiers can greatly increase the number of texts that can be labeled, but the scientific utility of doing so is a matter of debate. Content analysis is best understood as a broad family of techniques.

### Grace Wahba

Schoenberg-Hilldale Professor of Statistics at the University of Wisconsin–Madison. She is a pioneer in methods for smoothing noisy data. Best known for the development of generalized cross-validation and "Wahba's problem", she has developed methods with applications in demographic studies, machine learning, DNA microarrays, risk modeling, medical imaging, and climate prediction. She was educated at Cornell (B.A. 1956), the University of Maryland, College Park (M.A. 1962) and Stanford (Ph.D. 1966), and worked in industry for several years before receiving her doctorate and settling in Madison in 1967. She is the author of Spline Models for Observational Data.

### Recursive Bayesian estimation

In probability theory, statistics, and machine learning, recursive Bayesian estimation, also known as a Bayes filter, is a general probabilistic approach for estimating an unknown probability density function (PDF) recursively over time using incoming measurements and a mathematical process model. The process relies heavily on mathematical concepts and models from Bayesian statistics, the study of prior and posterior probabilities. A Bayes filter is an algorithm used in computer science for calculating the probabilities of multiple beliefs to allow a robot to infer its position and orientation.
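The recursion alternates between a prediction step that uses the process model and an update step that uses the incoming measurement. The sketch below shows one cycle for a discrete state space (a histogram filter), assuming a robot in a hypothetical three-cell circular corridor; the transition matrix and sensor likelihoods are made-up illustrative numbers, not values from any real system.

```python
def predict(belief, transition):
    """Prediction step: push the belief through the process model.

    transition[i][j] = P(next state = j | current state = i).
    """
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def update(belief, likelihood):
    """Measurement step: weight the prior by P(measurement | state), then normalize."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical model: the robot moves one cell to the right with probability 0.8
# and stays put with probability 0.2 (the corridor wraps around).
transition = [
    [0.2, 0.8, 0.0],
    [0.0, 0.2, 0.8],
    [0.8, 0.0, 0.2],
]
# Hypothetical sensor reading that is most likely when the robot is in cell 0.
likelihood = [0.9, 0.1, 0.1]

belief = [1 / 3, 1 / 3, 1 / 3]          # uniform prior
belief = predict(belief, transition)    # prior for the next time step
belief = update(belief, likelihood)     # posterior after the measurement
print([round(b, 3) for b in belief])
```

Repeating the two calls for every new measurement gives the recursive estimate; continuous-state variants such as the Kalman filter follow the same predict-then-update structure.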

### Stochastic gradient descent

While the basic idea behind stochastic approximation can be traced back to the Robbins–Monro algorithm of the 1950s, stochastic gradient descent has become an important optimization method in machine learning. Both statistical estimation and machine learning consider the problem of minimizing an objective function that has the form of a sum:

$$Q(w) = \frac{1}{n} \sum_{i=1}^{n} Q_i(w),$$

where the parameter $w$ that minimizes $Q(w)$ is to be estimated. Each summand function $Q_i$ is typically associated with the $i$-th observation in the data set (used for training). In classical statistics, sum-minimization problems arise in least squares and in maximum-likelihood estimation (for independent observations).
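For the least-squares case, each summand can be taken as the squared residual of one observation, and a stochastic gradient step then updates the parameter using a single summand at a time. The following is a minimal sketch under that assumption; the function name, learning rate, epoch count and toy data are all hypothetical choices for illustration.

```python
import random


def sgd_least_squares(xs, ys, lr=0.05, epochs=200, seed=0):
    """Minimize the average of Q_i(w) = (w . x_i - y_i)^2 by stochastic gradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observations in a fresh random order
        for i in order:
            residual = sum(wj * xj for wj, xj in zip(w, xs[i])) - ys[i]
            # Gradient of Q_i with respect to w is 2 * residual * x_i.
            w = [wj - lr * 2.0 * residual * xj for wj, xj in zip(w, xs[i])]
    return w


# Toy data generated from y = 2x + 1; the constant feature 1.0 carries the intercept.
xs = [(x, 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
ys = [2.0 * x + 1.0 for x, _ in xs]
print(sgd_least_squares(xs, ys))
```

Because the toy data are noise-free, the iterates settle on the generating slope and intercept; with noisy data a decaying learning rate is usually needed for convergence.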

### Mean absolute percentage error

The mean absolute percentage error (MAPE), also known as mean absolute percentage deviation (MAPD), is a measure of the prediction accuracy of a forecasting method in statistics, for example in trend estimation. It is also used as a loss function for regression problems in machine learning. It usually expresses accuracy as a percentage, and is defined by the formula:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|,$$

where $A_t$ is the actual value and $F_t$ is the forecast value. The difference between $A_t$ and $F_t$ is divided by the actual value $A_t$. The absolute value of this ratio is summed over every forecasted point in time and divided by the number of fitted points $n$. Multiplying by 100% makes it a percentage error.
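The calculation described above translates directly into code. This is a minimal sketch; the function name `mape` and the sample values are illustrative assumptions.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Raises ZeroDivisionError if any actual value is zero, since the
    ratio |A_t - F_t| / |A_t| is undefined there.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Each forecast is off by 10% of the actual value.
print(mape([100.0, 200.0], [110.0, 180.0]))
```

The division by the actual value means the measure cannot be used when any actual value is zero, and it weights errors on small actual values heavily.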

### Metascience

"The use of p values for nearly a century [since 1925] to determine statistical significance of experimental results has contributed to an illusion of certainty and [to] reproducibility crises in many scientific fields. There is growing determination to reform statistical analysis... Some [researchers] suggest changing statistical methods, whereas others would do away with a threshold for defining "significant" results." (p. 63). Minerva: A Journal of Science, Learning and Policy. Research Integrity and Peer Review. Research Policy. Science and Public Policy.

### Latent variable

In statistics, latent variables (from Latin: present participle of lateo, "lie hidden"), as opposed to observable variables, are variables that are not directly observed but are instead inferred (through a mathematical model) from other variables that are observed (directly measured). Mathematical models that aim to explain observed variables in terms of latent variables are called latent variable models. Latent variable models are used in many disciplines, including psychology, demography, economics, engineering, medicine, physics, machine learning/artificial intelligence, bioinformatics, chemometrics, natural language processing, econometrics, management and the social sciences.

### Consensus forecast

Also known as combining forecasts, forecast averaging or model averaging (in econometrics and statistics) and committee machines, ensemble averaging or expert aggregation (in machine learning). Applications can range from forecasting the weather to predicting the annual Gross Domestic Product of a country or the number of cars a company or an individual dealer is likely to sell in a year. While forecasts are often made for future values of a time series, they can also be for one-off events such as the outcome of a presidential election or a football match. Forecasting plays a key role in any organisation's planning process as it provides insight into uncertainty.

### Business analytics
- Predictive analytics: employs predictive modelling using statistical and machine learning techniques.
- Prescriptive analytics: recommends decisions using optimization, simulation, etc.
- Behavioral analytics
- Cohort analysis
- Collections analytics
- Contextual data modeling: supports the human reasoning that occurs after viewing "executive dashboards" or any other visual analytics.
- Cyber analytics
- Enterprise optimization
- Financial services analytics
- Fraud analytics
- Health care analytics
- Marketing analytics
- Pricing analytics
- Retail sales analytics
- Risk & Credit analytics
- Supply chain analytics
- Talent analytics
- Telecommunications
- Transportation analytics
- Customer Journey Analytics

### Radford M. Neal

Neal is a professor at the Department of Statistics and Department of Computer Science at the University of Toronto, where he holds a research chair in statistics and machine learning. He studied computer science at the University of Calgary (B.Sc. 1977, M.Sc. 1980) and at the University of Toronto (Ph.D. 1995). He has made great contributions in the area of machine learning and statistics, where he is particularly well known for his work on Markov chain Monte Carlo, error-correcting codes and Bayesian learning for neural networks. He is also known for his blog and as the developer of pqR, a new version of the R interpreter.

### Fuzzy concept

Cybernetics research, artificial intelligence, virtual intelligence, machine learning, database design and soft computing research. "Fuzzy risk scores" are used by project managers and portfolio managers to express financial risk assessments. Fuzzy logic has been applied to the problem of predicting cement strength. The main international body is the International Fuzzy Systems Association (IFSA). The Computational Intelligence Society of the Institute of Electrical and Electronics Engineers, Inc. (IEEE) has an international membership and deals with fuzzy logic, neural networks and evolutionary computing.

### Information engineering (field)

Machine learning is the field that involves the use of statistical and probabilistic methods to let computers "learn" from data without being explicitly programmed. Data science involves the application of machine learning to extract knowledge from data. Subfields of machine learning include deep learning, supervised learning, unsupervised learning, reinforcement learning, semi-supervised learning, and active learning. Causal inference is another related component of information engineering. Control theory refers to the control of (continuous) dynamical systems, with the aim being to avoid delays, overshoots, or instability.

### Ordinal regression

In machine learning, alternatives to the latent-variable models of ordinal regression have been proposed. An early result was PRank, a variant of the perceptron algorithm that finds multiple parallel hyperplanes separating the various ranks; its output is a weight vector $w$ and a sorted vector of $K-1$ thresholds $\theta$, as in the ordered logit/probit models. The prediction rule for this model is to output the smallest rank $k$ such that $w \cdot x < \theta_k$. Other methods rely on the principle of large-margin learning that also underlies support vector machines.
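The prediction rule can be written out as a short function. This sketch covers prediction only, not the PRank training updates, and assumes that the top rank K is returned when the score is not below any of the K−1 thresholds (equivalently, that an implicit final threshold is infinite). The function name and the example weights are hypothetical.

```python
def prank_predict(w, thresholds, x):
    """Return the smallest rank k (1-based) with w . x < thresholds[k - 1].

    `thresholds` holds the K - 1 sorted thresholds; rank K is returned
    when the score is not below any of them (assumed convention).
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    for k, theta in enumerate(thresholds, start=1):
        if score < theta:
            return k
    return len(thresholds) + 1


# Hypothetical model with K = 4 ranks.
w = [1.0, 0.5]
thresholds = [-1.0, 0.0, 2.0]
print(prank_predict(w, thresholds, [0.0, 0.0]))   # score 0.0 falls between 0.0 and 2.0
print(prank_predict(w, thresholds, [4.0, 0.0]))   # score 4.0 exceeds every threshold
```

Because all ranks share the same weight vector, the thresholds cut a single score axis into K consecutive intervals, which is what the parallel hyperplanes express geometrically.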

### Information visualization

.), statistics (hypothesis test, regression, PCA, etc.), data mining (association mining, etc.), and machine learning methods (clustering, classification, decision trees, etc.). Among these approaches, information visualization, or visual data analysis, is the most reliant on the cognitive skills of human analysts, and allows the discovery of unstructured actionable insights that are limited only by human imagination and creativity. The analyst does not have to learn any sophisticated methods to be able to interpret the visualizations of the data.

### Computational psychometrics

Pursuing a computational approach to psychometrics often involves scientists working in multidisciplinary teams with expertise in artificial intelligence, machine learning, deep learning and neural network modeling, natural language processing, mathematics and statistics, developmental and cognitive psychology, computer science, data science, learning sciences, virtual and augmented reality, and traditional psychometrics. Computational psychometrics incorporates both theoretical and applied components ranging from item response theory, classical test theory, and Bayesian approaches to modeling knowledge acquisition and discovery of network psychometric models.

### Stochastic block model

The stochastic block model is important in statistics, machine learning, and network science, where it serves as a useful benchmark for the task of recovering community structure in graph data. The stochastic block model takes the following parameters: the number $n$ of vertices; a partition of the vertex set $\{1, \ldots, n\}$ into disjoint subsets $C_1, \ldots, C_r$, called communities; and a symmetric $r \times r$ matrix $P$ of edge probabilities. The edge set is then sampled at random as follows: any two vertices $u \in C_i$ and $v \in C_j$ are connected by an edge with probability $P_{ij}$. An example problem is: given a graph with $n$ vertices, where the edges are sampled as described, recover the groups. If the probability matrix is a constant, in the sense that $P_{ij} = p$ for all $i, j$, then the result is the Erdős–Rényi model $G(n,p)$.
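The sampling procedure described above can be sketched as follows, assuming the groups are supplied as a list of vertex lists and the edge probabilities as a nested list indexed by group. The function name `sample_sbm` and the example parameters are hypothetical.

```python
import random


def sample_sbm(communities, P, seed=None):
    """Sample an undirected graph from a stochastic block model.

    communities: list of disjoint lists of vertices (the groups C_i).
    P: symmetric matrix with P[i][j] = edge probability between C_i and C_j.
    Returns a set of edges (u, v) with u < v.
    """
    rng = random.Random(seed)
    label = {v: i for i, block in enumerate(communities) for v in block}
    vertices = sorted(label)
    edges = set()
    for a in range(len(vertices)):
        for b in range(a + 1, len(vertices)):
            u, v = vertices[a], vertices[b]
            # Each pair is connected independently with probability P_ij.
            if rng.random() < P[label[u]][label[v]]:
                edges.add((u, v))
    return edges


# Two groups: dense inside each group, sparse between them.
communities = [[0, 1, 2, 3], [4, 5, 6, 7]]
P = [[0.9, 0.1],
     [0.1, 0.9]]
print(sorted(sample_sbm(communities, P, seed=1)))
```

Setting every entry of `P` to the same value `p` makes the group labels irrelevant, which reproduces the Erdős–Rényi special case mentioned above.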