Named-entity recognition

named entity recognition, entity extraction, named entities, Named Entity Extraction, entities, entity, entity detection, entity recognition, named entities recognition, Named Entity Classification
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

Information extraction

extraction, extraction of information, extract information
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, by employing existing knowledge of the domain or information extracted from other sentences. Typically the recognition task involves assigning a unique identifier to the extracted entity. A simpler task is named entity detection, which aims at detecting entities without having any existing knowledge about the entity instances. For example, in processing the sentence "M. Smith likes fishing", named entity detection would denote detecting that the phrase "M. Smith" does refer to a person, but without necessarily having (or using) any knowledge about a certain M. Smith who is (or, "might be") the specific person whom that sentence is talking about.

General Architecture for Text Engineering

GATE
GATE supports NER across many languages and domains out of the box, usable via a graphical interface and a Java API.
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System) which is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part of speech tagger, a named entities transducer and a coreference tagger.

SpaCy

SpaCy features fast statistical NER as well as an open-source named-entity visualizer.
The library is published under the MIT license and currently offers statistical neural network models for English, German, Spanish, Portuguese, French, Italian, Dutch and multi-language NER, as well as tokenization for various other languages.

Apache OpenNLP

OpenNLP
OpenNLP includes rule-based and statistical named-entity recognition.
It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing and coreference resolution.

F1 score

F-Measure, F-score, F1
For example, the best system entering MUC-7 achieved an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%.
The F-score has been widely used in the natural language processing literature, such as the evaluation of named entity recognition and word segmentation.
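
As an illustration of how an F-score might be computed for NER output, here is a minimal sketch in Python. It assumes exact-match scoring, where a predicted entity counts as correct only if its span and its type both match a gold entity; the function name `entity_f1`, the `(start, end, label)` tuple layout and the sample data are hypothetical and do not come from any particular evaluation campaign.

```python
def entity_f1(gold, predicted):
    """Exact-match entity-level precision, recall and F1.

    Each entity is a (start, end, label) tuple. A prediction is a true
    positive only if all three fields match some gold entity.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: the second prediction has the right span but
# the wrong type, and the fourth prediction is spurious.
gold = [(0, 2, "PER"), (5, 6, "ORG"), (9, 10, "LOC")]
pred = [(0, 2, "PER"), (5, 6, "LOC"), (9, 10, "LOC"), (12, 13, "ORG")]

precision, recall, f1 = entity_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # P=0.50 R=0.67 F1=0.57
```

Under this strict criterion a correct span with the wrong type is penalized twice, once as a false positive and once as a missed gold entity; more lenient scoring schemes exist but are not shown here.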

Message Understanding Conference

MUC, MUC-6 evaluation campaign, MUC-7
For example, the best system entering MUC-7 achieved an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%.
At the sixth conference (MUC-6) the task of recognition of named entities and coreference was added.

Named entity

named entities
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
There is also a general agreement in the Named Entity Recognition community to consider as named entities temporal and numerical expressions such as amounts of money and other types of units, which may violate the rigid designator perspective.

Shallow parsing

chunking, chunker, chunks
This segmentation problem is formally similar to chunking.
Named entity recognition
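
One way to see the similarity between NER segmentation and chunking is that both can be encoded as one tag per token. The sketch below assumes the BIO tagging convention (B- begins a segment, I- continues it, O is outside any segment), which is not specified in the text above; the helper name `bio_to_spans` and the example sentence are illustrative only.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end, label) spans.

    `end` is exclusive. A stray I- tag with no open span of the same
    label is treated as the start of a new span.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


tokens = ["Henry", "Ford", "founded", "Ford", "Motor", "Company", "in", "Detroit"]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "O", "B-LOC"]

for start, end, label in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
# PER Henry Ford
# ORG Ford Motor Company
# LOC Detroit
```

The same decoding would work unchanged for chunk tags such as B-NP and I-NP, which is the sense in which the two problems share a form.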

Entity linking

Named entity disambiguation, cross-linking them to Wikipedia, named entity linking
A recently emerging task of identifying "important expressions" in text and cross-linking them to Wikipedia can be seen as an instance of extremely fine-grained named entity recognition, where the types are the actual Wikipedia pages describing the (potentially ambiguous) concepts.
NED is different from named entity recognition (NER) in that NER identifies the occurrence or mention of a named entity in text but it does not identify which specific entity it is.
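
The difference can be sketched with a toy example: NER would hand over the mention "Ford", and a disambiguation step would then decide which entity that mention denotes. The code below is only an illustration under strong assumptions; the two-entry knowledge base `KB`, its context words and the overlap-counting heuristic in `link` are all hypothetical and far simpler than real entity linkers.

```python
# Hypothetical miniature knowledge base: each entity lists the surface
# forms it may appear under and a few words typical of its context.
KB = {
    "Ford Motor Company": {
        "aliases": {"Ford", "Ford Motor Company"},
        "context": {"car", "automotive", "company", "factory"},
    },
    "Henry Ford": {
        "aliases": {"Ford", "Henry Ford"},
        "context": {"founder", "industrialist", "born"},
    },
}


def link(mention, sentence):
    """Pick the KB entity whose context words overlap the sentence most.

    Returns None when no entity lists the mention as an alias.
    """
    words = set(sentence.lower().replace(".", "").split())
    candidates = [name for name, entry in KB.items() if mention in entry["aliases"]]
    if not candidates:
        return None
    return max(candidates, key=lambda name: len(KB[name]["context"] & words))


# NER alone would report the same mention, "Ford", in both sentences.
print(link("Ford", "Ford opened a new car factory."))
print(link("Ford", "Ford was born in 1863 and became an industrialist."))
```

With this toy data the first call resolves to "Ford Motor Company" and the second to "Henry Ford", even though the mention text is identical.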

Conditional random field

conditional random fields, CRF
Many different classifier types have been used to perform machine-learned NER, with conditional random fields being a typical choice.
Named entity recognition

Knowledge extraction

knowledge discovery, derivation of knowledge, discovery
Knowledge extraction
DBpedia Spotlight, OpenCalais, Dandelion dataTXT, the Zemanta API, Extractiv and PoolParty Extractor analyze free text via named-entity recognition, then disambiguate candidates via name resolution and link the found entities to the DBpedia knowledge repository (Dandelion dataTXT demo, DBpedia Spotlight web demo or PoolParty Extractor demo).

Natural language processing

NLP, natural language, natural-language processing
Since about 1998, there has been a great deal of interest in entity identification in the molecular biology, bioinformatics, and medical natural language processing communities.
Named entity recognition (NER): Given a stream of text, determine which items in the text map to proper names, such as people or places, and what the type of each such name is (e.g. person, location, organization). Note that, although capitalization can aid in recognizing named entities in languages such as English, this information cannot aid in determining the type of named entity, and in any case is often inaccurate or insufficient. For example, the first letter of a sentence is also capitalized, and named entities often span several words, only some of which are capitalized. Furthermore, many other languages in non-Western scripts (e.g. Chinese or Arabic) do not have any capitalization at all, and even languages with capitalization may not consistently use it to distinguish names. For example, German capitalizes all nouns, regardless of whether they are names, and French and Spanish do not capitalize names that serve as adjectives.
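
The limits of capitalization as a cue can be demonstrated with a deliberately naive heuristic. The following sketch treats every run of consecutive capitalized words as a candidate name; the function `capitalized_runs` and the sample sentences are made up for illustration and are not a recommended NER method.

```python
import re


def capitalized_runs(sentence):
    """Return maximal runs of consecutive capitalized words."""
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", sentence)


english = "Yesterday the Bank of England met Henry Ford in London."
print(capitalized_runs(english))
# ['Yesterday', 'Bank', 'England', 'Henry Ford', 'London']

german = "Der Hund sieht die Katze in Berlin."
print(capitalized_runs(german))
# ['Der Hund', 'Katze', 'Berlin']
```

The English output includes the sentence-initial word "Yesterday" and splits "Bank of England" at the lowercase "of"; the German output picks up the common nouns "Hund" and "Katze". In neither case does the heuristic say anything about the type of each candidate.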

Onomastics

onomastic, onomastician, onomatologist
Onomastics
Onomastics can be helpful in data mining, with applications such as named-entity recognition, or recognition of the origin of names.

Crowdsourcing

crowdsourced, crowd-sourced, crowdsource
In recent years, many projects have turned to crowdsourcing, a promising way to obtain high-quality aggregate human judgments for supervised and semi-supervised machine learning approaches to NER.
Crowdsourcing has been used extensively to collect high-quality gold-standard data for building automatic systems in natural language processing (e.g. named entity recognition, entity linking).

Controlled vocabulary

controlled vocabularies, Controlled-vocabulary, vocabulary
Controlled vocabulary
Named-entity recognition

Record linkage

identity resolution, entity resolution, object identification/entity resolution/record linkage
Record linkage
Named-entity recognition

Smart tag (Microsoft)

smart tags, Smart tag, Microsoft Smart Tags
Smart tag (Microsoft)
Named entity recognition

Unstructured data

unstructured, unstructured text, (unstructured)
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

Medical classification

statistical classification, medical coding, classification
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

Graphical user interface

GUI, graphical, graphical interface
GATE supports NER across many languages and domains out of the box, usable via a graphical interface and a Java API.

Java (programming language)

Java, Java programming language, Java language
GATE supports NER across many languages and domains out of the box, usable via a graphical interface and a Java API.

Rigid designator

rigid designators, rigid designation, rigidity
This is closely related to rigid designators, as defined by Kripke, although in practice NER deals with many names and referents that are not philosophically "rigid".

Saul Kripke

Kripke; Kripke, Saul; Kripkean
This is closely related to rigid designators, as defined by Kripke, although in practice NER deals with many names and referents that are not philosophically "rigid".

Ford (disambiguation)

Ford
For instance, the automotive company created by Henry Ford in 1903 can be referred to as Ford or Ford Motor Company, although "Ford" can refer to many other entities as well (see Ford).

De dicto and de re

de re, de dicto, de dicto and de re
Rigid designators include proper names as well as terms for certain biological species and substances, but exclude pronouns (such as "it"; see coreference resolution), descriptions that pick out a referent by its properties (see also De dicto and de re), and names for kinds of things as opposed to individuals (for example "Bank").