outlier detection | anomalies | detecting outliers | detect anomalies
In data mining, anomaly detection (also outlier detection) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.
data-mining | datamining | knowledge discovery in databases
The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining).
Instead, a cluster analysis algorithm may be able to detect the micro clusters formed by these patterns.
However, it has recently been discussed whether this is adequate for real data, or only on synthetic data sets with a factual ground truth, since classes can contain internal structure, the attributes present may not allow separation of clusters or the classes may contain anomalies.
supervised | supervised machine learning | supervised classification
In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.
In practice, there are several approaches to alleviate noise in the output values such as early stopping to prevent overfitting as well as detecting and removing the noisy training examples prior to training the supervised learning algorithm.
LOF | Local Outlier Factor (LOF)
In anomaly detection, the local outlier factor (LOF) is an algorithm proposed by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander in 2000 for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours.
Isolation forest is an unsupervised learning algorithm for anomaly detection that works on the principle of isolating anomalies, instead of the most common techniques of profiling normal points.
ensembles of classifiers | ensemble | Bayesian model averaging
By analogy, ensemble techniques have been used also in unsupervised learning scenarios, for example in consensus clustering or in anomaly detection.
k-nearest neighbor algorithm | k-nearest neighbor | k-nearest neighbors
The distance to the kth nearest neighbor can also be seen as a local density estimate and thus is also a popular outlier score in anomaly detection.
intrusion detection | intrusion prevention system | Network intrusion detection system
Anomaly detection is applicable in a variety of domains, such as intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and detecting ecosystem disturbances. The counterpart of anomaly detection in intrusion detection is misuse detection. Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.
Her model used statistics for anomaly detection, and resulted in an early IDS at SRI International named the Intrusion Detection Expert System (IDES), which ran on Sun workstations and could consider both user and network level data.
Change point detection | change-point detection
More generally, change detection also includes the detection of anomalous behavior: anomaly detection.
Environment for DeveLoping KDD-Applications Supported by Index-Structures | ELKI data mining framework | ELKI framework
ELKI is an open-source Java data mining toolkit that contains several anomaly detection algorithms, as well as index acceleration for them.
Version 0.1 (July 2008) contained several algorithms from cluster analysis and anomaly detection, as well as some index structures such as the R*-tree.
The counterpart of anomaly detection in intrusion detection is misuse detection.
It stands against the anomaly detection approach which utilizes the reverse: defining normal system behaviour first and defining all other behaviour as abnormal.
Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or errors in a text.
outliers | statistical outliers | conservative estimate
Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.
Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a classifier (the key difference to many other statistical classification problems is the inherent unbalanced nature of outlier detection).
support vector machine | support vector machines | SVM
LSTM | Long Short-term Memory (LSTM) | long short term memory
Bayesian networks | hierarchical Bayes model | hierarchical Bayesian model
hidden Markov models | HMM | Poisson hidden Markov model
association rules | association rule | association rule mining
Dorothy Denning | Denning | Denning, Dorothy E.
Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.
Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning.
University of Munich | Munich | Munich University