Information extraction

extraction; extraction of information; extract information; extract useful information; extracting information; Information extraction, or IE; relation extraction; text analysis; Topic Extraction
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents.

Message Understanding Conference

MUC; MUC-6 evaluation campaign; MUC-7
Beginning in 1987, IE was spurred by a series of Message Understanding Conferences.
The Message Understanding Conferences (MUC) were initiated and financed by DARPA to encourage the development of new and better methods of information extraction.

Named-entity recognition

named entity recognition; entity extraction; named entities
Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, by employing existing knowledge of the domain or information extracted from other sentences. Typically the recognition task involves assigning a unique identifier to the extracted entity. A simpler task is named entity detection, which aims at detecting entities without any existing knowledge about the entity instances. For example, in processing the sentence "M. Smith likes fishing", named entity detection would mean detecting that the phrase "M. Smith" refers to a person, without necessarily having (or using) any knowledge about the particular M. Smith who is (or might be) the person the sentence is talking about.
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages.
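
The difference between detection and full recognition can be illustrated with a small sketch. It only detects person-like mentions with a hand-written pattern and makes no attempt to link them to a known individual. The pattern, the `PERSON` label and the function name `detect_entities` are assumptions chosen for illustration, not part of any particular NER system.

```python
import re

# Hypothetical pattern: an honorific or a single initial, followed by a
# capitalized surname. Real systems use far richer features or learned models.
PERSON_PATTERN = re.compile(r"\b(?:(?:Mr|Mrs|Ms|Dr)\.|[A-Z]\.)\s+[A-Z][a-z]+")


def detect_entities(text):
    """Return (mention, label, start, end) tuples for person-like phrases.

    This is detection only: no identifier is assigned, so the function
    does not know which specific person a mention refers to.
    """
    return [
        (match.group(0), "PERSON", match.start(), match.end())
        for match in PERSON_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    print(detect_entities("M. Smith likes fishing"))
```

Assigning a unique identifier to "M. Smith" would require an extra lookup step against existing domain knowledge, which this sketch deliberately leaves out.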

Terminology extraction

Term Extraction; extracting terms; Terminology
Terminology extraction: finding the relevant terms for a given corpus.
Terminology extraction (also known as term extraction, glossary extraction, term recognition, or terminology mining) is a subtask of information extraction.
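
As a rough illustration of finding relevant terms for a corpus, the sketch below ranks single words and adjacent word pairs by raw frequency after dropping stopwords. The stopword list, the function name `extract_terms` and the frequency-only scoring are simplifying assumptions; practical term extractors usually add linguistic filters and statistical termhood measures.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "is", "and", "to", "in", "for", "from"}


def extract_terms(corpus, top_n=3):
    """Rank candidate terms (words and word pairs) by frequency in the corpus."""
    counts = Counter()
    for document in corpus:
        tokens = re.findall(r"[a-z]+", document.lower())
        for i, token in enumerate(tokens):
            if token in STOPWORDS:
                continue
            counts[token] += 1
            # Also count the two-word candidate if the next token is not a stopword.
            if i + 1 < len(tokens) and tokens[i + 1] not in STOPWORDS:
                counts[f"{token} {tokens[i + 1]}"] += 1
    return [term for term, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    corpus = [
        "Information extraction is a subtask of text mining.",
        "Terminology extraction is a subtask of information extraction.",
    ]
    print(extract_terms(corpus, top_n=4))
```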

Relationship extraction

relation extraction
Relationship extraction: identification of relations between entities, such as:
The task is very similar to that of information extraction (IE), but IE additionally requires the removal of repeated relations (disambiguation) and generally refers to the extraction of many different relationships.
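
A minimal sketch of pattern-based relation extraction follows, including the removal of repeated relations mentioned above. The "works for" pattern, the `works_for` label, the sample names and the function name `extract_relations` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical surface pattern for one relation type: "<Person> works for <Org>".
WORKS_FOR = re.compile(r"(?P<person>[A-Z][a-z]+) works for (?P<org>[A-Z][A-Za-z]+)")


def extract_relations(text):
    """Return unique (person, relation, organization) triples in order of first mention."""
    triples = [
        (match.group("person"), "works_for", match.group("org"))
        for match in WORKS_FOR.finditer(text)
    ]
    # dict.fromkeys keeps the first occurrence of each triple and drops repeats.
    return list(dict.fromkeys(triples))


if __name__ == "__main__":
    sample = "Alice works for Acme. Bob works for Initech. Alice works for Acme."
    print(extract_relations(sample))
```

A single fixed pattern covers only one relation; handling many different relationships would need one pattern or learned classifier per relation type.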

Natural language processing

NLP; natural language; natural-language processing
In most cases this activity concerns processing human language texts by means of natural language processing (NLP).

Information retrieval

The discipline of information retrieval (IR) has developed automatic methods, typically of a statistical flavor, for indexing large document collections and classifying documents.

General Architecture for Text Engineering

General Architecture for Text Engineering (GATE) is bundled with a free information extraction system.
General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.

Knowledge extraction

knowledge discovery; derivation of knowledge; discovery
Although it is methodologically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema.

Mallet (software project)

Mallet; Machine Learning for Language Toolkit (Mallet)
Machine Learning for Language Toolkit (Mallet) is a Java-based package for a variety of natural language processing tasks, including information extraction.
MALLET is an integrated collection of Java code useful for statistical natural language processing, document classification, cluster analysis, information extraction, topic modeling and other machine learning applications to text.

Applications of artificial intelligence

AI; AI applications; application of artificial intelligence
Information extraction, a part of artificial intelligence, is used to extract information from live news feeds and to assist with investment decisions.

Text mining

text analytics; text-mining; text
Text analysis involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics.

Maximum-entropy Markov model

maximum entropy Markov model; MaxEnt Markov models
Conditional Markov model (CMM) / Maximum-entropy Markov model (MEMM)
MEMMs find applications in natural language processing, specifically in part-of-speech tagging and information extraction.

Supervised learning

supervised; supervised classification; supervised machine learning
Machine learning techniques, either supervised or unsupervised, have been used to induce such rules automatically.

Ontology learning

automatically generated
Ontology extraction
In the subsequent step, similarly to coreference resolution in information extraction, the OL system determines synonyms, because they share the same meaning and therefore correspond to the same concept.

Outline of artificial intelligence

Artificial intelligence; outline
Information extraction


TIPSTER Text program
It supported research to improve information retrieval and extraction software and worked to deploy these improved technologies to government users.


DBpedia Spotlight is an open source tool in Java/Scala (and free web service) that can be used for named entity recognition and name resolution.
It can also be used for other information extraction tasks.

Open information extraction

In computer science, transforming OIE extractions into ontological facts is known as relation extraction.

Enterprise search

enterprise search engine; Enterprise Track; enterprise-search
Enterprise search
Information extraction

Faceted search

faceted navigation; faceted browser; faceted browsing
Faceted search
Information extraction

Data extraction

extraction; Data is extracted; dumped
Data extraction
Information extraction

Unstructured data

unstructured; unstructured text; (unstructured)
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents.

Machine-readable data

machine-readable; machine readable; machine-readable format
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents.