Speech recognition
Speech recognition is the interdisciplinary subfield of computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers.

Speaker recognition

The term voice recognition or speaker identification refers to identifying the speaker rather than what they are saying. Speaker recognition (recognizing who is speaking) is therefore distinct from speech recognition (recognizing what is being said).

Deep learning

Most recently, the field has benefited from advances in deep learning and big data.
Deep learning architectures such as deep neural networks, deep belief networks and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design and board game programs, where they have produced results comparable to and in some cases superior to human experts.

SoundHound

These speech industry players include Google, Microsoft, IBM, Baidu, Apple, Amazon, Nuance, SoundHound, and iFLYTEK, many of which have publicized the core technology in their speech recognition systems as being based on deep learning.
SoundHound Inc., founded in 2005, is an audio and speech recognition company.

Frederick Jelinek

Under Fred Jelinek's leadership, IBM created a voice-activated typewriter called Tangora, which could handle a 20,000-word vocabulary by the mid-1980s.
Frederick Jelinek (18 November 1932 – 14 September 2010) was a Czech-American researcher in information theory, automatic speech recognition, and natural language processing.

Windows Speech Recognition

Huang went on to found the speech recognition group at Microsoft in 1993.
Windows Speech Recognition (WSR) is a speech recognition component developed by Microsoft for Windows Vista that enables the use of voice commands to control the desktop user interface; dictate text in electronic documents, forms and email; navigate websites; perform keyboard shortcuts; operate the mouse cursor; and create macros to perform additional tasks.

CMU Sphinx

Raj Reddy's former student, Xuedong Huang, developed the Sphinx-II system at CMU.
CMU Sphinx, also known simply as Sphinx, is the general term for a group of speech recognition systems developed at Carnegie Mellon University.

Lernout & Hauspie

Lernout & Hauspie, a Belgium-based speech recognition company, acquired several other companies, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000.
Lernout & Hauspie Speech Products, or L&H, was a leading Belgium-based speech recognition technology company, founded by Jo Lernout and Pol Hauspie, that went bankrupt in 2001 because of a fraud engineered by management.

Kai-Fu Lee

Raj Reddy's student Kai-Fu Lee joined Apple where, in 1992, he helped develop a speech interface prototype for the Apple computer known as Casper.
Lee developed the world's first speaker-independent, continuous speech recognition system as his Ph.D. thesis at Carnegie Mellon.

Nuance Communications

Leading software vendors in this field are: Google, Microsoft Corporation (Microsoft Voice Command), Digital Syphon (Sonic Extractor), LumenVox, Nuance Communications (Nuance Voice Control), Voci Technologies, VoiceBox Technology, Speech Technology Center, Vito Technologies (VITO Voice2Go), Speereo Software (Speereo Voice Translator), Verbyx VRX and SVOX.
This permitted speaker-independent natural-language speech recognition (abbreviated SI-NLSR, or just NLSR) to be used for call automation.

Hidden Markov model

A decade later, at CMU, Raj Reddy's students James Baker and Janet M. Baker began using the Hidden Markov Model (HMM) for speech recognition.
Hidden Markov models are especially known for their application in reinforcement learning and temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics.
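To make the HMM idea concrete, the classic Viterbi algorithm recovers the most likely hidden state sequence (for speech, e.g. phonemes) behind a sequence of observations. The sketch below is a minimal pure-Python version; the two states, the observation labels, and all probabilities are invented for illustration.

```python
# Viterbi decoding for a toy HMM: recover the most likely hidden
# state sequence behind a sequence of observations.
def viterbi(obs, states, start_p, trans_p, emit_p):
    # best[t][s] = probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            best[t][s] = prob
            back[t][s] = prev
    # Trace the best path backwards from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Two invented "phoneme" states emitting two invented acoustic labels:
states = ("/s/", "/t/")
start = {"/s/": 0.6, "/t/": 0.4}
trans = {"/s/": {"/s/": 0.7, "/t/": 0.3},
         "/t/": {"/s/": 0.4, "/t/": 0.6}}
emit = {"/s/": {"hiss": 0.8, "burst": 0.2},
        "/t/": {"hiss": 0.1, "burst": 0.9}}
print(viterbi(("hiss", "burst", "burst"), states, start, trans, emit))
# prints ['/s/', '/t/', '/t/']
```

Real recognizers use thousands of context-dependent states and continuous acoustic features, but the dynamic-programming recursion is the same.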

Artificial neural network

In the early 2000s, speech recognition was still dominated by traditional approaches such as Hidden Markov Models combined with feedforward artificial neural networks.
Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis.

Recurrent neural network

Long short-term memory (LSTM) is a recurrent neural network architecture published by Sepp Hochreiter and Jürgen Schmidhuber in 1997. Because recurrent networks carry state from one time step to the next, they are applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.
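To illustrate the recurrence itself, here is a single-unit vanilla RNN step in pure Python (LSTM adds gating on top of this basic idea to preserve information over long time lags); the weight values are arbitrary and chosen only for this sketch.

```python
import math

def rnn_step(x, h_prev, w_xh=1.2, w_hh=0.9, b=0.0):
    # The new hidden state mixes the current input with the previous
    # hidden state, so earlier frames influence later decisions.
    return math.tanh(w_xh * x + w_hh * h_prev + b)

h = 0.0
for x in [0.5, -0.3, 0.8]:  # e.g. one acoustic feature per audio frame
    h = rnn_step(x, h)
print(h)
```

Feeding the same inputs in a different order yields a different final state, which is exactly the order sensitivity that makes recurrent networks suitable for sequential signals like speech.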

Dynamic time warping

Also around this time, Soviet researchers invented the dynamic time warping (DTW) algorithm and used it to create a recognizer capable of operating on a 200-word vocabulary.
A well known application has been automatic speech recognition, to cope with different speaking speeds.
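A minimal pure-Python sketch of the DTW recursion shows how it copes with different speaking speeds: stretching a sequence in time adds no alignment cost. The numeric sequences below are invented stand-ins for per-frame acoustic features.

```python
# Dynamic time warping (DTW) distance between two sequences of numbers.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# The same "word" spoken slowly and quickly aligns at zero cost:
slow = [1, 1, 2, 2, 3, 3]
fast = [1, 2, 3]
print(dtw_distance(slow, fast))  # prints 0.0
```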

Siri

Apple originally licensed software from Nuance Communications to provide speech recognition capability to its digital assistant Siri, which uses advanced machine learning technologies to function.

IFlytek

iFlytek creates voice recognition software and more than ten voice-based internet/mobile products covering the education, communication, music, and intelligent-toy industries.

GOOG-411

The first product was GOOG-411, a telephone-based directory service.
GOOG-411 (or Google Voice Local Search) was a telephone service launched by Google in 2007 that provided speech-recognition-based business directory search and placed a call to the resulting number in the United States or Canada.

Acoustic model

The use of deep feedforward (non-recurrent) networks for acoustic modeling was introduced during the later part of 2009 by Geoffrey Hinton and his students at the University of Toronto, and by Li Deng and colleagues at Microsoft Research. The work began as a collaboration between Microsoft and the University of Toronto and was subsequently expanded to include IBM and Google (hence "The shared views of four research groups" subtitle of their 2012 review paper).
An acoustic model is used in automatic speech recognition to represent the relationship between an audio signal and the phonemes or other linguistic units that make up speech.

Lawrence Rabiner

The technology was developed by Lawrence Rabiner and others at Bell Labs.
Lawrence R. Rabiner (born 28 September 1943) is an electrical engineer working in the fields of digital signal processing and speech processing; in particular in digital signal processing for automatic speech recognition.

Language model

Both acoustic modeling and language modeling are important parts of modern statistically-based speech recognition algorithms.
Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, Optical Character Recognition, handwriting recognition, information retrieval and other applications.
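As a sketch of statistical language modeling, a maximum-likelihood bigram model estimates P(word | previous word) from counts and can be used to rank competing transcription hypotheses; the tiny corpus below is invented.

```python
from collections import Counter

# Invented toy corpus; "<s>" marks sentence boundaries.
corpus = "recognize speech <s> recognize speech <s> wreck a nice beach".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])  # counts of bigram left-hand contexts

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

# The model prefers the hypothesis it has seen before:
print(bigram_prob("recognize", "speech"))  # prints 1.0
print(bigram_prob("recognize", "beach"))   # prints 0.0
```

In practice such models are smoothed so that unseen bigrams do not receive zero probability, and the language model score is combined with the acoustic model score to choose among transcriptions.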

DARPA Global autonomous language exploitation program

In the 2000s DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) in 2002 and Global Autonomous Language Exploitation (GALE).
The program encompassed three main challenges: automatic speech recognition, machine translation, and information retrieval.

Google Voice

In 2015, Google's speech recognition reportedly experienced a dramatic performance jump of 49% through CTC-trained LSTM, which is now available through Google Voice to all smartphone users.
Many other Google Voice services, such as voicemail, free text messaging, call history, conference calling, call screening, blocking of unwanted calls, and transcription of voicemail messages to text, are also available.

Babel program

Some government research programs focused on intelligence applications of speech recognition, e.g. DARPA's EARS program and IARPA's Babel program.
The IARPA Babel program developed speech recognition technology for noisy telephone conversations.

RIPAC (microprocessor)

In the mid-1980s, new speech recognition microprocessors were released: for example RIPAC, a chip for speaker-independent recognition of continuous speech tailored to telephone services, was presented in the Netherlands in 1986.
RIPAC was designed to provide efficient real-time speech recognition services for the Italian telephone system operated by SIP.

Microphone

The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot, in general, does not wear a facemask, which would reduce acoustic noise in the microphone.
Microphones are used in many applications such as telephones, hearing aids, public address systems for concert halls and public events, motion picture production, live and recorded audio engineering, sound recording, two-way radios, megaphones, radio and television broadcasting, and in computers for recording voice, speech recognition, VoIP, and for non-acoustic purposes such as ultrasonic sensors or knock sensors.

LumenVox

LumenVox is a privately held speech recognition software company, based in San Diego, California.