Speech coding

speech encoding, speech codec, Speech, voice codec, speech coder, speech compression, voice compression, Analysis by Synthesis, speech codecs, Analysis-by-Synthesis (AbS)
Speech coding is an application of data compression of digital audio signals containing speech.
147 Related Articles

Vocoder

vocoded, vocoders, vocoding
A vocoder (a portmanteau of voice and encoder) is a category of voice codec that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption or voice transformation.

Linear predictive coding

LPC, linear prediction coefficients, Block Independent LPC
The most widely used speech coding technique in mobile telephony is linear predictive coding (LPC), while the most widely used in VoIP applications are the LPC and modified discrete cosine transform (MDCT) techniques. In CELP, linear prediction coefficients (LPC) are computed and quantized, usually as line spectral pairs (LSPs).
LPC is the most widely used method in speech coding and speech synthesis.

Digital audio

digital music, digital, audio
Speech coding is an application of data compression of digital audio signals containing speech.
Perceptual coding was first used for speech compression, with linear predictive coding (LPC).

Modified discrete cosine transform

MDCT, Modulated Lapped Transform, time-domain aliasing cancellation
The most widely used speech coding technique in mobile telephony is linear predictive coding (LPC), while the most widely used in VoIP applications are the LPC and modified discrete cosine transform (MDCT) techniques. The modified discrete cosine transform (MDCT), a type of discrete cosine transform (DCT) algorithm, was adapted into a speech coding algorithm called LD-MDCT, used for the AAC-LD format introduced in 1999.
It is employed in most modern audio coding standards, including MP3, Dolby Digital (AC-3), Vorbis (Ogg), Windows Media Audio (WMA), ATRAC, Cook, Advanced Audio Coding (AAC), LDAC, Dolby AC-4, MPEG-H 3D Audio, as well as speech coding standards such as AAC-LD (LD-MDCT), G.722.1, G.729.1, CELT, and Opus.
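As a rough illustration of the time-domain aliasing cancellation that the MDCT relies on, the sketch below transforms 50%-overlapping blocks of 2N samples into N coefficients each and reconstructs the input by overlap-add. It is a direct O(N^2) evaluation for readability, not the fast algorithm a real codec would use; the function names, the sine window, the even block size N and the 2/N inverse scaling are assumptions made for this example and are not taken from any of the standards listed above.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def _kernel(n, k, N):
    # Cosine kernel shared by the forward and inverse transforms.
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct(block):
    """Map 2N windowed samples to N coefficients (N assumed even)."""
    N = len(block) // 2
    return [sum(block[n] * _kernel(n, k, N) for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples."""
    N = len(coeffs)
    return [(2.0 / N) * sum(coeffs[k] * _kernel(n, k, N) for k in range(N))
            for n in range(2 * N)]

def roundtrip(signal, N):
    """Analyse and resynthesise a signal with hop size N and overlap-add."""
    w = sine_window(N)
    # Zero-pad so every input sample is covered by exactly two blocks.
    tail = N + (-len(signal)) % N
    padded = [0.0] * N + list(signal) + [0.0] * tail
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block))
        for n in range(2 * N):
            # Windowing again and adding cancels the aliasing between blocks.
            out[start + n] += y[n] * w[n]
    return out[N:N + len(signal)]
```

Each block on its own is not invertible, since 2N samples are reduced to N coefficients; the input is only recovered once neighbouring blocks are added together. A codec would quantize the coefficients between `mdct` and `imdct`, which this sketch leaves out.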

Audio coding format

audio coding, audio coding standard, audio compression format
The techniques employed in speech coding are similar to those used in audio data compression and audio coding where knowledge in psychoacoustics is used to transmit only data that is relevant to the human auditory system.
Perceptual coding was first used for speech compression, with linear predictive coding (LPC).

Code-excited linear prediction

CELP, Code Excited Linear Prediction, Code-excited linear prediction (CELP)
In particular, the most common speech coding scheme is the LPC-based Code Excited Linear Prediction (CELP) coding, which is used for example in the GSM standard.
Code-excited linear prediction (CELP) is a linear predictive speech coding algorithm originally proposed by Manfred R. Schroeder and Bishnu S. Atal in 1985.

Line spectral pairs

Line spectral frequencies, Line spectral pair, Line Spectrum Pair
In CELP, linear prediction coefficients (LPC) are computed and quantized, usually as line spectral pairs (LSPs).
For this reason, LSPs are very useful in speech coding.

Voice over IP

VoIP, Voice over Internet Protocol, voice-over-IP
The two most important applications of speech coding are mobile telephony and voice over IP (VoIP).
Various codecs exist that optimize the media stream based on application requirements and network bandwidth; some implementations rely on narrowband and compressed speech, while others support high-fidelity stereo codecs.

Discrete cosine transform

DCT, iDCT, inverse discrete cosine transform
The modified discrete cosine transform (MDCT), a type of discrete cosine transform (DCT) algorithm, was adapted into a speech coding algorithm called LD-MDCT, used for the AAC-LD format introduced in 1999.
It is used in most digital media, including digital images (such as JPEG and HEIF, where small high-frequency components can be discarded), digital video (such as MPEG and H.26x), digital audio (such as Dolby Digital, MP3 and AAC), digital television (such as SDTV, HDTV and VOD), digital radio (such as AAC+ and DAB+), and speech coding (such as AAC-LD, Siren and Opus).

G.711

G711, G.711.1, ITU G.711
From this point of view, the A-law and μ-law algorithms (G.711) used in traditional PCM digital telephony can be seen as an early precursor of speech encoding, requiring only 8 bits per sample but giving effectively 12 bits of resolution.
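A minimal sketch of why companding stretches the useful resolution of an 8-bit code: samples are passed through a logarithmic curve before uniform quantization, so quiet samples get much finer steps than loud ones. The code below uses the continuous μ-law formula with μ = 255, which is an assumption made for illustration; G.711 itself specifies a piecewise-linear approximation with its own bit layout, so this is not a bit-exact G.711 codec, and the function names are hypothetical.

```python
import math

MU = 255.0  # companding parameter (assumed; the usual mu-law value)

def mu_compress(x):
    """Logarithmic compression of x in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode(x):
    """Quantize a sample in [-1, 1] to an 8-bit code 0..255."""
    y = mu_compress(max(-1.0, min(1.0, x)))
    return min(255, int((y + 1.0) / 2.0 * 256.0))

def decode(code):
    """Reconstruct the sample at the centre of the code's interval."""
    y = (code + 0.5) / 256.0 * 2.0 - 1.0
    return mu_expand(y)

# Step size between neighbouring codes near zero and near full scale.
small_step = decode(129) - decode(128)
large_step = decode(255) - decode(254)
```

In this sketch `small_step` comes out smaller than the step of a uniform 12-bit quantizer over the same range (2/4096), while `large_step` is coarser than that of a uniform 8-bit quantizer (2/256). That is the trade the companding curve makes.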

Opus (audio format)

Opus, Opus audio format, .opus
Opus is a free software speech coder, unencumbered by patent restrictions.
Opus is a lossy audio coding format developed by the Xiph.Org Foundation and standardized by the Internet Engineering Task Force, designed to efficiently code speech and general audio in a single format, while remaining low-latency enough for real-time interactive communication and low-complexity enough for low-end embedded processors.

Codec 2

Codec2
Codec2 is another free software speech coder, unencumbered by patent restrictions, which manages to achieve very good compression, as low as 700 bit/s.
Codec 2 is a low-bitrate speech audio codec (speech coding) that is patent free and open source.

Telephony

digital telephony, telephone, digital
From this point of view, the A-law and μ-law algorithms (G.711) used in traditional PCM digital telephony can be seen as an early precursor of speech encoding, requiring only 8 bits per sample but giving effectively 12 bits of resolution.
A solution to this issue was linear predictive coding (LPC), a speech coding data compression algorithm that was first proposed by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone (NTT) in 1966.

Secure voice

voice encryption, Ciphony, secure telephone
Much of the later work in speech compression was motivated by military research into digital communications for secure military radios, where very low data rates were required to allow effective operation in a hostile radio environment.
Secure voice's robustness greatly benefits from having the voice data compressed to very low bit rates by a special component variously called speech coding, voice compression or a voice coder (also known as a vocoder).

Adaptive Multi-Rate Wideband

AMR-WB, G.722.2, WB
Adaptive Multi-Rate Wideband (AMR-WB) is a patented wideband speech audio coding standard based on Adaptive Multi-Rate encoding, using a methodology similar to algebraic code-excited linear prediction (ACELP).

Speex

SPX, .spx, libspeex
Speex is a free software audio compression codec specifically tuned for the reproduction of human speech; it may be used in VoIP applications and podcasts.

Data compression

compression, video compression, compressed
Speech coding is an application of data compression of digital audio signals containing speech. The techniques employed in speech coding are similar to those used in audio data compression and audio coding where knowledge in psychoacoustics is used to transmit only data that is relevant to the human auditory system.
Compression of human speech is often performed with even more specialized techniques; speech coding is distinguished as a separate discipline from general-purpose audio compression.

UMTS

WCDMA, W-CDMA, TD-SCDMA
UMTS combines three different terrestrial air interfaces, GSM's Mobile Application Part (MAP) core, and the GSM family of speech codecs.

Adaptive differential pulse-code modulation

ADPCM, Adaptive DPCM, Adaptive Differential Pulse Code Modulation
ADPCM was developed for speech coding by P. Cummiskey, Nikil S. Jayant and James L. Flanagan at Bell Labs in 1973.

Full Rate

GSM, GSM 06.10, GSM Full Rate
Full Rate (FR or GSM-FR or GSM 06.10 or sometimes simply GSM) was the first digital speech coding standard used in the GSM digital mobile phone system.

Linear prediction

linearly, Signal prediction, coefficient
In CELP, the modelling is divided into two stages: a linear predictive stage that models the spectral envelope, and a codebook-based model of the residual of the linear predictive model.
In fact, the autocorrelation method is the most common and it is used, for example, for speech coding in the GSM standard.
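The autocorrelation method can be sketched as follows: compute the frame's autocorrelation at lags 0 to p, then solve the resulting normal equations with the Levinson-Durbin recursion to obtain the predictor coefficients. This is a minimal pure-Python illustration, not the procedure of the GSM standard; the function names are hypothetical, and the windowing, bandwidth expansion and coefficient quantization that a real codec would apply are omitted.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a frame, treating samples outside it as zero."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for the predictor coefficients.

    Returns (a, err) where the prediction is
        x_hat[n] = a[0]*x[n-1] + a[1]*x[n-2] + ... + a[order-1]*x[n-order]
    and err is the remaining prediction-error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            raise ValueError("autocorrelation sequence is not positive definite")
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc(frame, order):
    """Predictor coefficients for one frame via the autocorrelation method."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

For example, `lpc([0.9 ** n for n in range(200)], 1)` yields a single coefficient very close to 0.9, since each sample of that decaying exponential is 0.9 times the previous one. In a CELP-style coder, coefficients like these would form the first (spectral envelope) stage, with the residual left for the codebook stage.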

Adaptive Multi-Rate audio codec

AMR, AMR-NB, Adaptive Multi-Rate
The Adaptive Multi-Rate (AMR, AMR-NB or GSM-AMR) audio codec is an audio compression format optimized for speech coding.

Enhanced full rate

EFR, GSM-EFR, Enhanced Full-Rate (EFR)
Enhanced Full Rate (EFR, GSM-EFR or GSM 06.60) is a speech coding standard that was developed to improve on the rather poor quality of the GSM Full Rate (FR) codec.

Selectable Mode Vocoder

SMV
Selectable Mode Vocoder (SMV) is a variable bitrate speech coding standard used in CDMA2000 networks.

Half Rate

GSM-HR, half rate channel (HR), Half-Rate (HR)
Half Rate (HR or GSM-HR or GSM 06.20) is a speech coding system for GSM, developed in the early 1990s.