Anomaly detection

In data mining, anomaly detection (also outlier detection) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.

Data mining

In data mining, anomaly detection (also outlier detection) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.
The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining).

Cluster analysis

Instead, a cluster analysis algorithm may be able to detect the micro clusters formed by these patterns.
However, it has recently been discussed whether this is adequate for real data, or only on synthetic data sets with a factual ground truth, since classes can contain internal structure, the attributes present may not allow separation of clusters or the classes may contain anomalies.

Supervised learning

In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.
In practice, there are several approaches to alleviate noise in the output values such as early stopping to prevent overfitting as well as detecting and removing the noisy training examples prior to training the supervised learning algorithm.

Local outlier factor

In anomaly detection, the local outlier factor (LOF) is an algorithm proposed by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander in 2000 for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours.

Isolation forest

Isolation forest is an unsupervised learning algorithm for anomaly detection that works on the principle of isolating anomalies, instead of the most common techniques of profiling normal points.

Ensemble learning

By analogy, ensemble techniques have also been used in unsupervised learning scenarios, for example in consensus clustering or in anomaly detection.

K-nearest neighbors algorithm

The distance to the kth nearest neighbor can also be seen as a local density estimate and thus is also a popular outlier score in anomaly detection.

Intrusion detection system

Anomaly detection is applicable in a variety of domains, such as intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and detecting ecosystem disturbances. The counterpart of anomaly detection in intrusion detection is misuse detection. Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.
Her model used statistics for anomaly detection, and resulted in an early IDS at SRI International named the Intrusion Detection Expert System (IDES), which ran on Sun workstations and could consider both user and network level data.

Change detection

More generally, change detection also includes the detection of anomalous behavior: anomaly detection.

ELKI

ELKI is an open-source Java data mining toolkit that contains several anomaly detection algorithms, as well as index acceleration for them.
Version 0.1 (July 2008) contained several algorithms from cluster analysis and anomaly detection, as well as some index structures such as the R*-tree.

Misuse detection

The counterpart of anomaly detection in intrusion detection is misuse detection.
It stands against the anomaly detection approach, which uses the reverse: defining normal system behaviour first and defining all other behaviour as abnormal.

Bank fraud

Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or errors in a text.

Outlier

Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.

Statistical classification

Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a classifier (the key difference to many other statistical classification problems is the inherent unbalanced nature of outlier detection).

Fraud

Anomaly detection is applicable in a variety of domains, such as intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and detecting ecosystem disturbances.

Bayesian network


Dorothy E. Denning

Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.

Soft computing

Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning.