General Architecture for Text Engineering

GATE
General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.
70 Related Articles

Named-entity recognition

named entity recognition, entity extraction, named entities
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System) which is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part of speech tagger, a named entities transducer and a coreference tagger.
GATE supports NER across many languages and domains out of the box, usable via a graphical interface and a Java API.
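
The pipeline idea behind ANNIE, in which each module reads the document and adds typed annotations for later modules to use, can be illustrated with a small self-contained sketch. This is a toy under stated assumptions, not the GATE or ANNIE API: the class `ToyPipeline`, its `Annotation` type, and the one-entry gazetteer are all hypothetical names invented for illustration, and only two of the stages (tokenizer and gazetteer lookup) are shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: this is NOT the GATE or ANNIE API.
class ToyPipeline {
    /** A typed span over the document text, identified by character offsets. */
    static final class Annotation {
        final String type;
        final int start;
        final int end;
        final String text;

        Annotation(String type, int start, int end, String text) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + "] " + text;
        }
    }

    // Hypothetical one-word gazetteer: surface form -> entity type.
    static final Map<String, String> GAZETTEER = Map.of("Sheffield", "Location");

    /** Stage 1: add one Token annotation per run of letters or digits. */
    static void tokenize(String doc, List<Annotation> anns) {
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(doc);
        while (m.find()) {
            anns.add(new Annotation("Token", m.start(), m.end(), m.group()));
        }
    }

    /** Stage 2: add an entity annotation for every Token listed in the gazetteer. */
    static void lookup(List<Annotation> anns) {
        List<Annotation> found = new ArrayList<>();
        for (Annotation a : anns) {
            String entityType = a.type.equals("Token") ? GAZETTEER.get(a.text) : null;
            if (entityType != null) {
                found.add(new Annotation(entityType, a.start, a.end, a.text));
            }
        }
        anns.addAll(found);
    }

    /** Runs the stages in order over one shared annotation list. */
    static List<Annotation> run(String doc) {
        List<Annotation> anns = new ArrayList<>();
        tokenize(doc, anns);
        lookup(anns);
        return anns;
    }

    public static void main(String[] args) {
        for (Annotation a : run("GATE was developed in Sheffield.")) {
            System.out.println(a);
        }
    }
}
```

Keeping every stage's output in one shared annotation list is the design point being illustrated: later stages build on the annotations earlier stages produced rather than re-reading raw text.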

LIBSVM

Plugins are included for machine learning with Weka, RASP, MAXENT and SVM Light, along with a LIBSVM integration and an in-house perceptron implementation; for managing ontologies like WordNet; for querying search engines like Google or Yahoo; for part-of-speech tagging with Brill or TreeTagger; and many more.
The SVM learning code from both libraries (LIBSVM and LIBLINEAR) is often reused in other open source machine learning toolkits, including GATE, KNIME, Orange and scikit-learn.

JAPE (linguistics)

JAPE
JAPE transducers are used within GATE to manipulate annotations on text.
In computational linguistics, JAPE is the Java Annotation Patterns Engine, a component of the open-source General Architecture for Text Engineering (GATE) platform.
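
As a rough illustration of what "manipulating annotations" with a pattern rule means, the sketch below matches a pattern over existing annotations and, on a match, adds a new one. It is written in plain Java rather than in JAPE's own grammar syntax, and everything in it is an assumption made for the example: the class `ToyTransducer`, the `Ann` type, the annotation type names `Title`, `Token` and `Person`, and the adjacency test are hypothetical and are not part of GATE.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: this is NOT JAPE syntax or the GATE API.
class ToyTransducer {
    /** A typed span identified by character offsets into the document. */
    static final class Ann {
        final String type;
        final int start;
        final int end;

        Ann(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * One hypothetical rule.
     * Pattern: a Title annotation directly followed by a Token
     *          (at most one character, such as a space, between them).
     * Action:  add a Person annotation spanning both.
     * Assumes the input is sorted by start offset and spans do not overlap.
     */
    static List<Ann> apply(List<Ann> sorted) {
        List<Ann> out = new ArrayList<>(sorted);
        for (int i = 0; i + 1 < sorted.size(); i++) {
            Ann a = sorted.get(i);
            Ann b = sorted.get(i + 1);
            int gap = b.start - a.end;
            if (a.type.equals("Title") && b.type.equals("Token") && gap >= 0 && gap <= 1) {
                out.add(new Ann("Person", a.start, b.end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Annotations for the text "Dr Smith spoke".
        List<Ann> input = new ArrayList<>();
        input.add(new Ann("Title", 0, 2));  // "Dr"
        input.add(new Ann("Token", 3, 8));  // "Smith"
        input.add(new Ann("Token", 9, 14)); // "spoke"
        for (Ann a : apply(input)) {
            System.out.println(a.type + " " + a.start + "-" + a.end);
        }
    }
}
```

The original annotations are kept and the new one is appended, so the rule only adds information; it never rewrites the text itself.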

Pheme (project)

Pheme
Pheme is a major EU project, managed by the GATE group, on early detection of false information in social media.
The project is a partnership between the University of Sheffield as part of GATE, the University of Warwick, King's College London, Saarland University in Germany and MODUL University Vienna.

Information extraction

extraction, extraction of information, extract information
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System) which is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part of speech tagger, a named entities transducer and a coreference tagger. General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.
General Architecture for Text Engineering (GATE) is bundled with a free Information Extraction system.

UIMA

Apache Uima, UIMA Unstructured Information Management Architecture framework, Unstructured Information Management Architecture
Unstructured Information Management Architecture (UIMA)
General Architecture for Text Engineering (GATE)

Java (programming language)

Java, Java programming language, Java language
General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.

University of Sheffield

Sheffield, Sheffield University, The University of Sheffield
General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.

Natural language processing

NLP, natural language, natural-language processing
General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.

R (programming language)

R, R programming language, CRAN
GATE has been compared to NLTK, R and RapidMiner.

RapidMiner

GATE has been compared to NLTK, R and RapidMiner.

James Burke (science historian)

James Burke; Burke, James; Burke
The GATE community and its research have been involved in several European research projects, including TAO, SEKT, NeOn, Media-Campaign, Musing, Service-Finder, LIRICS and KnowledgeWeb, as well as many other projects.

SourceForge

sourceforge.net, January 2011 attacks on SourceForge.net, sf.net
As of May 28, 2011, 881 people were on the gate-users mailing list at SourceForge.net, and 111,932 downloads from SourceForge had been recorded since the project moved to SourceForge in 2005.

Lexical analysis

token, lexical analyzer, tokens
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System) which is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part of speech tagger, a named entities transducer and a coreference tagger.

Gazetteer

gazetteers, geographical dictionary
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System) which is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part of speech tagger, a named entities transducer and a coreference tagger.

Sentence boundary disambiguation

sentence segmentation, sentence splitter
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System) which is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part of speech tagger, a named entities transducer and a coreference tagger.

Part-of-speech tagging

part of speech tagger, part-of-speech, part-of-speech tag
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System) which is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part of speech tagger, a named entities transducer and a coreference tagger.

Coreference

Coreference resolution, coreferential, co-referential
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System) which is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part of speech tagger, a named entities transducer and a coreference tagger.

English language

English, English-language, en
Languages currently handled in GATE include English, Chinese, Arabic, Bulgarian, French, German, Hindi, Italian, Cebuano, Romanian, Russian, Danish.

Standard Chinese

Mandarin, Chinese, Mandarin Chinese
Languages currently handled in GATE include English, Chinese, Arabic, Bulgarian, French, German, Hindi, Italian, Cebuano, Romanian, Russian, Danish.

Arabic

Arabic-language, Arab, Arabic language
Languages currently handled in GATE include English, Chinese, Arabic, Bulgarian, French, German, Hindi, Italian, Cebuano, Romanian, Russian, Danish.

Bulgarian language

Bulgarian, Bulgarian:, Bulgarian Cyrillic
Languages currently handled in GATE include English, Chinese, Arabic, Bulgarian, French, German, Hindi, Italian, Cebuano, Romanian, Russian, Danish.

French language

French, francophone, French-language
Languages currently handled in GATE include English, Chinese, Arabic, Bulgarian, French, German, Hindi, Italian, Cebuano, Romanian, Russian, Danish.