Speech recognition

voice recognition, automatic speech recognition, voice command, speech-to-text, voice commands, speech, voice dialing, spoken commands, voice recognition software, recognition
Speech recognition is the inter-disciplinary sub-field of computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers.
791 Related Articles

Speaker recognition

voice recognition, speaker verification, voice-activated
The term voice recognition or speaker identification refers to identifying the speaker, rather than what they are saying.
The term voice recognition can refer to speaker recognition or speech recognition. Speaker verification (also called speaker authentication) contrasts with identification, and speaker recognition differs from speaker diarisation (recognizing when the same speaker is speaking).

Deep learning

deep neural networks, deep neural network, deep-learning
Most recently, the field has benefited from advances in deep learning and big data. Today, however, many aspects of speech recognition have been taken over by a deep learning method called Long short-term memory (LSTM), a recurrent neural network published by Sepp Hochreiter & Jürgen Schmidhuber in 1997.
Deep learning architectures such as deep neural networks, deep belief networks and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases superior to human experts.

SoundHound

These speech industry players include Google, Microsoft, IBM, Baidu, Apple, Amazon, Nuance, GoVivace Inc., SoundHound, and iFLYTEK, many of which have publicized the core technology in their speech recognition systems as being based on deep learning.
SoundHound Inc., founded in 2005, is an audio and speech recognition company.

Frederick Jelinek

Fred Jelinek, Jelinek
* By the mid-1980s, IBM's Fred Jelinek and his team created a voice-activated typewriter called Tangora, which could handle a 20,000-word vocabulary. Jelinek's statistical approach put less emphasis on emulating the way the human brain processes and understands speech, in favor of statistical modeling techniques such as HMMs.
Frederick Jelinek (18 November 1932 – 14 September 2010) was a Czech-American researcher in information theory, automatic speech recognition, and natural language processing.

Windows Speech Recognition

Speech Recognition, speech recognition group at Microsoft, Windows Speech Recognition Macros
Huang went on to found the speech recognition group at Microsoft in 1993.
Windows Speech Recognition (WSR) is a speech recognition component developed by Microsoft for the Windows Vista operating system that enables the use of voice commands to control the desktop user interface; dictate text in electronic documents and email; navigate websites; perform keyboard shortcuts; and to operate the mouse cursor.

CMU Sphinx

Sphinx, Sphinx-II, Pocket Sphinx
Raj Reddy's former student, Xuedong Huang, developed the Sphinx-II system at CMU.
CMU Sphinx, often shortened to Sphinx, is the general term for a group of speech recognition systems developed at Carnegie Mellon University.

Nuance Communications

Nuance, ScanSoft, Caere Corporation
These speech industry players include Google, Microsoft, IBM, Baidu, Apple, Amazon, Nuance, GoVivace Inc., SoundHound, and iFLYTEK, many of which have publicized the core technology in their speech recognition systems as being based on deep learning. Leading software vendors in this field are: Google, Microsoft Corporation (Microsoft Voice Command), Digital Syphon (Sonic Extractor), GoVivace Inc., LumenVox, Nuance Communications (Nuance Voice Control), Voci Technologies, VoiceBox Technology, Speech Technology Center, Vito Technologies (VITO Voice2Go), Speereo Software (Speereo Voice Translator), Verbyx VRX and SVOX.
This permitted the use of speaker-independent natural-language speech recognition (abbreviated as SI-NLSR, or simply NLSR) for call automation.

Lernout & Hauspie

L&HLernout & Hauspie Speech Product
Lernout & Hauspie, a Belgium-based speech recognition company, acquired several other companies, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000.
Lernout & Hauspie Speech Products, or L&H, was a leading Belgium-based speech recognition technology company, founded by Jo Lernout and Pol Hauspie, that went bankrupt in 2001 because of a fraud engineered by management.

Kai-Fu Lee

Lee Kaifu
Raj Reddy's student Kai-Fu Lee joined Apple where, in 1992, he helped develop a speech interface prototype for the Apple computer known as Casper.
Lee developed the world's first speaker-independent, continuous speech recognition system as his Ph.D. thesis at Carnegie Mellon.

Recurrent neural network

recurrent neural networks, recurrent, Elman networks
Today, however, many aspects of speech recognition have been taken over by a deep learning method called Long short-term memory (LSTM), a recurrent neural network published by Sepp Hochreiter & Jürgen Schmidhuber in 1997.
This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.

Dynamic time warping

dynamic time warping (DTW), time warping
Around this time Soviet researchers invented the dynamic time warping (DTW) algorithm and used it to create a recognizer capable of operating on a 200-word vocabulary.
A well known application has been automatic speech recognition, to cope with different speaking speeds.
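The alignment idea behind DTW can be sketched in a few lines of Python. This is a minimal illustration: the scalar sequences stand in for the acoustic feature vectors (e.g. MFCCs) a real recognizer would compare.

```python
# Minimal dynamic time warping (DTW) sketch in plain Python.
# The two 1-D sequences are illustrative stand-ins for frames of
# acoustic features; real systems compare feature vectors.

def dtw_distance(a, b):
    """Return the minimal cumulative alignment cost between a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]

# The "same word" spoken slowly and quickly still aligns cheaply:
slow = [1, 1, 2, 3, 3, 4]
fast = [1, 2, 3, 4]
print(dtw_distance(slow, fast))  # 0.0 — stretching in time costs nothing
```

Because the warping path may repeat frames of either sequence, two utterances of the same word at different speaking rates receive a low distance, which is exactly how the early template-matching recognizers coped with speed variation.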

IFlytek

These speech industry players include Google, Microsoft, IBM, Baidu, Apple, Amazon, Nuance, GoVivace Inc., SoundHound, and iFLYTEK, many of which have publicized the core technology in their speech recognition systems as being based on deep learning.
It creates voice recognition software and more than ten voice-based internet/mobile products covering the education, communication, music, and intelligent-toy industries.

GOOG-411

The first product was GOOG-411, a telephone-based directory service.
GOOG-411 (or Google Voice Local Search) was a telephone service launched by Google in 2007 that provided a speech-recognition-based business directory search and placed a call to the resulting number in the United States or Canada.

Acoustic model

acoustic modeling
The use of deep feedforward (non-recurrent) networks for acoustic modeling was introduced during the later part of 2009 by Geoffrey Hinton and his students at the University of Toronto and by Li Deng and colleagues at Microsoft Research, initially in the collaborative work between Microsoft and the University of Toronto, which was subsequently expanded to include IBM and Google (hence "The shared views of four research groups" subtitle in their 2012 review paper). Both acoustic modeling and language modeling are important parts of modern statistically-based speech recognition algorithms.
An acoustic model is used in automatic speech recognition to represent the relationship between an audio signal and the phonemes or other linguistic units that make up speech.

Lawrence Rabiner

L. Rabiner, Larry Rabiner, Lawrence R. Rabiner
In 1990, Dragon Dictate was released as a consumer product. AT&T deployed the Voice Recognition Call Processing service in 1992 to route telephone calls without the use of a human operator. The technology was developed by Lawrence Rabiner and others at Bell Labs.
Lawrence R. Rabiner (born 28 September 1943) is an electrical engineer working in the fields of digital signal processing and speech processing; in particular in digital signal processing for automatic speech recognition.

Language model

language modeling, statistical language models, language
Both acoustic modeling and language modeling are important parts of modern statistically-based speech recognition algorithms.
Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, Optical Character Recognition, handwriting recognition, information retrieval and other applications.
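The statistical language models mentioned above reduce, in their simplest form, to conditional word probabilities estimated from counts. A toy bigram model can be sketched as follows; the corpus is invented for illustration, and real systems add smoothing for unseen word pairs.

```python
# Toy bigram language model sketch: estimate P(word | previous word)
# from raw counts. The corpus is illustrative; real models use huge
# corpora and smoothing for unseen bigrams.
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent pairs
unigrams = Counter(corpus[:-1])              # counts of context words

def prob(prev, word):
    """Maximum-likelihood P(word | prev); 0.0 for unseen pairs."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

print(prob("the", "cat"))  # 2/3 — two of three "the" contexts precede "cat"
```

In a recognizer, such probabilities let the decoder prefer "recognize speech" over the acoustically similar "wreck a nice beach".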

DARPA Global autonomous language exploitation program

DARPA GALE program, GALE, Global Autonomous Language Exploitation
In the 2000s DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) in 2002 and Global Autonomous Language Exploitation (GALE).
The program encompassed three main challenges: automatic speech recognition, machine translation, and information retrieval.

Hidden Markov model

hidden Markov models, HMM, hidden Markov models (HMMs)
A decade later, at CMU, Raj Reddy's students James Baker and Janet M. Baker began using the Hidden Markov Model (HMM) for speech recognition. In the early 2000s, speech recognition was still dominated by traditional approaches such as Hidden Markov Models combined with feedforward artificial neural networks.
Hidden Markov models are especially known for their application in reinforcement learning and temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics.
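The decoding step at the heart of HMM-based recognition — recovering the most likely hidden state sequence from observations — is the Viterbi algorithm. A minimal sketch follows; the two-state "silence/speech" model and all its probabilities are invented for illustration.

```python
# Viterbi decoding sketch for a tiny, invented two-state HMM.
# In a real recognizer the states would model phonemes and the
# observations would be acoustic feature frames.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path for obs."""
    # best[t][s] = (prob of best path ending in state s at time t, backpointer)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        row = {}
        for s in states:
            p, prev = max(
                (best[t - 1][r][0] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states)
            row[s] = (p, prev)
        best.append(row)
    # backtrack from the best final state
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]

states = ("silence", "speech")
start_p = {"silence": 0.6, "speech": 0.4}
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},
           "speech": {"silence": 0.2, "speech": 0.8}}
emit_p = {"silence": {"quiet": 0.9, "loud": 0.1},
          "speech": {"quiet": 0.2, "loud": 0.8}}

print(viterbi(["quiet", "loud", "loud"], states, start_p, trans_p, emit_p))
# ['silence', 'speech', 'speech']
```

The dynamic-programming table keeps only the best path into each state at each time step, which is what makes decoding tractable over long utterances.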

Artificial neural network

artificial neural networks, neural networks, neural network
In the early 2000s, speech recognition was still dominated by traditional approaches such as Hidden Markov Models combined with feedforward artificial neural networks.
Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis.

Google Voice

Grand Central
In 2015, Google's speech recognition reportedly experienced a dramatic performance jump of 49% through CTC-trained LSTM, which is now available through Google Voice to all smartphone users.
Many other Google Voice services—such as voicemail, free text messaging, call history, conference calling, call screening, blocking of unwanted calls, and voice transcription to text of voicemail messages—are also available.

Keyword spotting

In the United States, the National Security Agency has made use of a type of speech recognition for keyword spotting since at least 2006.
Since speech recognition technology forms the core of keyword spotting, the solution can also be used to build content based indexes of audio archives for intelligence and business applications.
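The content-based indexing described above amounts to building an inverted index over recognizer output. A sketch, assuming hypothetical transcripts (in practice these would come from a speech recognition system run over the audio archive):

```python
# Sketch of keyword spotting over an audio archive via a content-based
# index. The filenames and transcripts are hypothetical; a real system
# would produce the transcripts with a speech recognizer.
from collections import defaultdict

transcripts = {
    "call_001.wav": "please reset my account password",
    "call_002.wav": "the shipment arrives on monday",
    "call_003.wav": "i forgot my password again",
}

# inverted index: word -> set of recordings containing it
index = defaultdict(set)
for recording, text in transcripts.items():
    for word in text.split():
        index[word].add(recording)

def spot(keyword):
    """Return the recordings whose transcript contains the keyword."""
    return sorted(index[keyword.lower()])

print(spot("password"))  # ['call_001.wav', 'call_003.wav']
```

Production keyword spotters typically search word lattices with confidence scores rather than a single best transcript, but the index structure is the same idea.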

Babel program

Babel
Some government research programs focused on intelligence applications of speech recognition, e.g. DARPA's EARS program and IARPA's Babel program.
The IARPA Babel program developed speech recognition technology for noisy telephone conversations.

RIPAC (microprocessor)

RIPAC
RIPAC was designed to provide efficient real-time speech recognition services for the Italian telephone network operated by SIP.

Microphone

microphones, condenser microphone, dynamic microphone
The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot, in general, does not wear a facemask, which would reduce acoustic noise in the microphone.
Microphones are used in many applications such as telephones, hearing aids, public address systems for concert halls and public events, motion picture production, live and recorded audio engineering, sound recording, two-way radios, megaphones, radio and television broadcasting, and in computers for recording voice, speech recognition, VoIP, and for non-acoustic purposes such as ultrasonic sensors or knock sensors.

LumenVox

LumenVox Speech Engine
Leading software vendors in this field are: Google, Microsoft Corporation (Microsoft Voice Command), Digital Syphon (Sonic Extractor), GoVivace Inc., LumenVox, Nuance Communications (Nuance Voice Control), Voci Technologies, VoiceBox Technology, Speech Technology Center, Vito Technologies (VITO Voice2Go), Speereo Software (Speereo Voice Translator), Verbyx VRX and SVOX.
LumenVox is a privately held speech recognition software company, based in San Diego, California.