# Arithmetic coding

**arithmetic coder, arithmetic encoding, arithmetic code, arithmetic compression, arithmetic encoder**

Arithmetic coding is a form of entropy encoding used in lossless data compression.


### Lossless compression

**lossless, lossless data compression, compression**


The primary encoding algorithms used to produce bit sequences are Huffman coding (also used by DEFLATE) and arithmetic coding.

### Asymmetric numeral systems

**Finite State Entropy, rANS, ANS**

Recent family of entropy coders called asymmetric numeral systems allows for faster implementations thanks to directly operating on a single natural number representing the current information.

ANS combines the compression ratio of arithmetic coding (which uses a nearly accurate probability distribution), with a processing cost similar to that of Huffman coding.

### Entropy encoding

**entropy coding, entropy coded, entropy coder**


Two of the most common entropy encoding techniques are Huffman coding and arithmetic coding.

### Huffman coding

**Huffman, Huffman code, Huffman encoding**

Arithmetic coding differs from other forms of entropy encoding, such as Huffman coding, in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, an arbitrary-precision fraction q where 0.0 ≤ q < 1.0.
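
To make the single-fraction idea concrete, here is a minimal sketch (an illustration, not any particular implementation) that narrows an interval once per symbol. The message and probability table are assumed for the example; floating point is used for readability, whereas practical coders use fixed-precision integer arithmetic to avoid rounding error.

```python
def arithmetic_encode(message, probs):
    """Return a fraction q with 0.0 <= q < 1.0 that identifies `message`.

    probs: dict mapping each symbol to its probability (must sum to 1).
    """
    # Assign each symbol a cumulative sub-interval [c, c + p) of [0, 1).
    cum, c = {}, 0.0
    for sym, p in probs.items():
        cum[sym] = (c, c + p)
        c += p

    # Narrow the working interval once per symbol of the message.
    low, high = 0.0, 1.0
    for sym in message:
        lo_frac, hi_frac = cum[sym]
        width = high - low
        high = low + width * hi_frac
        low = low + width * lo_frac

    # Any number in [low, high) decodes back to the same message.
    return (low + high) / 2

q = arithmetic_encode("AAB", {"A": 0.6, "B": 0.4})
```

For "AAB" the interval narrows to [0.216, 0.36), so the midpoint 0.288 is one valid encoding of the whole message.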

Other methods such as arithmetic coding often have better compression capability.

### Entropy (information theory)

**entropy, information entropy, Shannon entropy**

This 8-bit output is larger than the information content, or entropy, of the message.

Entropy effectively bounds the performance of the strongest lossless compression possible, which can be realized in theory by using the typical set or in practice using Huffman, Lempel–Ziv or arithmetic coding.
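
The entropy bound is straightforward to compute from a symbol distribution; a short sketch with an assumed example distribution:

```python
import math

# Shannon entropy H = -sum(p * log2(p)) gives the best achievable average
# bits per symbol for lossless coding of an i.i.d. source.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

H = entropy([0.6, 0.2, 0.1, 0.1])  # roughly 1.571 bits per symbol
```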

### JPEG

**JPG, .jpg, JPG/JPEG**

Also, encoders and decoders of the JPEG file format, which has options for both Huffman encoding and arithmetic coding, typically only support the Huffman encoding option, originally because of patent concerns. The result is that nearly all JPEG images in use today use Huffman encoding, although JPEG's arithmetic coding patents have expired due to the age of the JPEG standard, whose design was essentially complete by 1990.

The JPEG standard also allows, but does not require, decoders to support the use of arithmetic coding, which is mathematically superior to Huffman coding.

### Bzip2

**BZIP, bunzip2, BZ2**

At least one significant compression software program, bzip2, deliberately discontinued the use of arithmetic coding in favor of Huffman coding due to the perceived patent situation at the time.

bzip2's ancestor bzip used arithmetic coding instead of Huffman.

### Data compression

**compression, video compression, compressed**

In a further refinement of the direct use of probabilistic modelling, statistical estimates can be coupled to an algorithm called arithmetic coding.

### Image compression

**compression, image, compressed**

The JPEG image compression format's arithmetic coding algorithm is based on patents that have since expired.

### Context-adaptive binary arithmetic coding

**CABAC, CABAC entropy coding**

Arithmetic coding is finally applied to compress the data.

### ASCII

**US-ASCII, American Standard Code for Information Interchange, ASCII code**

Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code.

### Arbitrary-precision arithmetic

**arbitrary-precision, arbitrary precision, bignum**

Rather than replacing each input symbol with a code, arithmetic coding encodes the entire message into a single number, an arbitrary-precision fraction q where 0.0 ≤ q < 1.0.

### Block code

**linear block code, distance, block encoding**

Simple block encoding would require 2 bits per symbol, which is wasteful: one of the bit variations is never used.

### Ternary numeral system

**ternary, trit, 3**

The next step is to encode this ternary number using a fixed-point binary number of sufficient precision to recover it, such as 0.0010110010₂. This is only 10 bits; 2 bits are saved in comparison with naïve block encoding.
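
The 10-bit figure follows from each ternary digit (trit) carrying log₂ 3 ≈ 1.585 bits of information. A sketch of the calculation, assuming the six-symbol message implied by the 12-bit block-encoding comparison:

```python
import math

# A fraction known to k base-3 digits needs ceil(k * log2(3)) binary digits
# to be recovered unambiguously.
def bits_needed(num_trits):
    return math.ceil(num_trits * math.log2(3))

n = bits_needed(6)  # six trits fit in 10 bits, vs. 12 bits at 2 bits/symbol
```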

### Shannon's source coding theorem

**source coding theorem, Shannon's noiseless coding theorem, noiseless coding**

In general, arithmetic coders can produce near-optimal output for any given set of symbols and probabilities (the optimal value is −log₂ P bits for each symbol of probability P; see the source coding theorem).

### Conceptual model

**model, models, schema**

Compression algorithms that use arithmetic coding start by determining a model of the data – basically a prediction of what patterns will be found in the symbols of the message.
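
A static model can be as simple as a normalized frequency count over the message itself; a sketch (the example message is hypothetical):

```python
from collections import Counter

# Estimate symbol probabilities from observed frequencies: the simplest
# kind of model an arithmetic coder can use.
def build_model(message):
    counts = Counter(message)
    total = len(message)
    return {sym: c / total for sym, c in counts.items()}

model = build_model("ABRACADABRA")  # e.g. "A" gets probability 5/11
```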

### Adaptive coding

**adaptive**

Models can even be adaptive, so that they continually change their prediction of the data based on what the stream actually contains.
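
One simple count-based scheme for such an adaptive model (a sketch of one of many possibilities, not a specific library's API):

```python
# Start with uniform counts and update after every symbol; because the
# decoder performs the same updates, no model needs to be transmitted.
class AdaptiveModel:
    def __init__(self, alphabet):
        self.counts = {s: 1 for s in alphabet}  # Laplace-style initial counts

    def prob(self, sym):
        return self.counts[sym] / sum(self.counts.values())

    def update(self, sym):
        self.counts[sym] += 1

m = AdaptiveModel("AB")
p_before = m.prob("A")  # 1/2 before any observation
m.update("A")
p_after = m.prob("A")   # 2/3 after seeing one "A"
```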

### Prefix code

**prefix-free code, prefix codes, prefix-free**

In particular, it is only necessary to transmit enough digits (in whatever base) of the fraction so that all fractions that begin with those digits fall into the final interval; this will guarantee that the resulting code is a prefix code.

### Bit

**bits, binary digit, binary digits**

This is correct; the information content of a three-digit decimal is log₂ 1000 ≈ 9.966 bits; the same message could have been encoded in the binary fraction 0.10001010 (equivalent to 0.5390625 decimal) at a cost of only 8 bits.
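
The quoted conversion is easy to verify by expanding the binary fraction digit by digit:

```python
# 0.10001010 in binary: each digit d_i contributes d_i / 2**i.
bits = "10001010"
value = sum(int(b) / 2 ** (i + 1) for i, b in enumerate(bits))
# value is 1/2 + 1/32 + 1/128 = 0.5390625
```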

### Significant figures

**precision, significant digits, significant digit**

In particular, they are written as if the encoder first calculated the fractions representing the endpoints of the interval in full, using infinite precision, and only converted the fraction to its final form at the end of encoding.

### Radix

**base, number base, bases**

In general, arithmetic (and range) coding may be interpreted as a generalized change of radix.
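
The change-of-radix view in miniature (alphabet and message are illustrative): a length-k message over an n-symbol alphabet can be read directly as a k-digit base-n integer.

```python
# Interpret a message as a base-n number, where n is the alphabet size.
def to_integer(message, alphabet):
    n = len(alphabet)
    index = {s: i for i, s in enumerate(alphabet)}
    value = 0
    for sym in message:
        value = value * n + index[sym]  # shift left one base-n digit, add
    return value

v = to_integer("CAB", "ABC")  # digits 2,0,1 in base 3: 2*9 + 0*3 + 1 = 19
```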

### Numeral system

**numerals, numeral, numeration**

In a positional numeral system the radix, or base, is numerically equal to a number of different symbols used to express the number.

### Independent and identically distributed random variables

**independent and identically distributed, i.i.d., iid**

Because arithmetic coding doesn't compress one datum at a time, it can get arbitrarily close to entropy when compressing iid strings.
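
A sketch of the per-symbol accounting for an i.i.d. string (probabilities are illustrative): the ideal total length is the sum of −log₂ p over the symbols, so the average cost per symbol matches the source entropy rather than being rounded up to a whole bit per symbol.

```python
import math

# Ideal total length in bits for an i.i.d. string under a known model.
def ideal_total_bits(message, probs):
    return sum(-math.log2(probs[s]) for s in message)

total = ideal_total_bits("AAB", {"A": 0.5, "B": 0.5})  # 1 bit per symbol here
```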

### Run-length encoding

**RLE, run length encoding, run-length encoded**

An alternative is encoding run lengths via Huffman-based Golomb-Rice codes.