JPEG Huffman Coding Tutorial

Why I wrote this tutorial

In attempting to understand the inner workings of JPEG compression, I was unable to find any real details on the net for how Huffman coding is used in the context of JPEG image compression. There are many sites that describe the generic huffman coding scheme, but none that describe how it will appear in a JPEG image, after factoring in the DHT tables, interleaved chroma subsampling components, etc. While it is relatively easy to understand the JPEG marker extraction, the Start of Scan data segment is the least understood and most important part of the image data. Therefore, I decided to create a page to walk through a decompression example. Hopefully others will find this useful!

The relevant sections in the JPEG Standard are quite obscure -- enough so that I set out to analyze several JPEG images to reverse-engineer how the huffman coding was being applied in a JPEG JFIF file.

Latest Update:

[09/22/2009]: Corrected Table 5 (added entry for DC 00 code).
[09/19/2008]: Corrected Table 1 (added entry for codes of length 9 bits).
[12/03/2007]: Corrected typo in text near Table 5 (code 00101). Added JPEGsnoop output (at end of Tutorial).
[01/27/2007]: Added section describing how to expand DHT into bit strings.

The Goal

The goal of this tutorial is to take a simple JPEG image and try to decode the compressed image data by hand, learning how the Huffman compression scheme works in the process.

Simplest JPEG Example

Most digital photos are full-color natural/organic images, which means that all three image components (one luminance and two color channels) will all have both low and high-frequency content. In addition, nearly all digital photos use chroma subsampling, which makes the extraction process a little more complicated. For the purposes of showing the basic huffman extraction, we will start with the simplest of all JPEG images:

Grayscale - no content in the two color channels
Solid color in each MCU - By making all pixels in an 8x8 block the same color, there will be no AC components.
No chroma subsampling - Makes scan data extraction simpler: Y, Cb, Cr, Y, Cb, Cr, etc.
Small Image - Total image size is 16x8 = two MCUs or blocks. This makes the extraction in this tutorial shorter.

Creating the Image

For the purposes of this tutorial, my working image is simply a 16x8 pixel image with two solid color blocks: one black and the other white. Note that each block is 8x8 pixels in size.

Creating the sample image was trivial when working at 1600% view. It is important that the dimensions and any changes in the content fall on 8-pixel boundaries. The overall image dimensions should be a multiple of 8 pixels as well, in both directions. The image below is a magnified version with a grid overlaid.

Once the image was created, it was saved with Photoshop CS2's Save for Web... command. This kept the file size down as it discards other extraneous file information (metadata, etc.) that is not relevant to this tutorial. Some other important points:

Use Save for Web - Reduces total file content to minimal subset.
Use Quality level 51+ - This ensures that there is no chroma subsampling enabled in the JPEG encoding process, according to the way that Photoshop Save for Web operates. I used quality 80 for this example.
Turn Optimized Off - For the purposes of this example, I think it is important to work with realistic huffman tables, not degenerate single-entry tables. Therefore I recommend that JPEG Huffman Table Optimization is left off.
Other settings: Blur off, Progressive off, ICC profile off.

Grayscale Photoshop Images

It should be noted that when you save a JPEG image from within Photoshop it always contains three components (Y, Cb, Cr). If you change the mode to grayscale (via Mode->Grayscale), the three components are still saved, even though the JPEG standard supports an image with only one component (which would be assumed to be grayscale).

What is Huffman Coding / Entropy Coding?

Huffman coding is a method that takes symbols (e.g. bytes, DCT coefficients, etc.) and encodes them with variable-length codes that are assigned according to statistical probabilities. A frequently-used symbol will be encoded with a code that takes up only a couple of bits, while symbols that are rarely used are represented by codes that take more bits.

A JPEG file contains up to 4 huffman tables that define the mapping between these variable-length codes (which take between 1 and 16 bits) and the code values (each of which is an 8-bit byte). Creating these tables generally involves counting how frequently each symbol (DCT code word) appears in an image, and allocating the bit strings accordingly. However, most JPEG encoders simply use the huffman tables presented in the JPEG standard. Some encoders allow one to optimize these tables, which means that an optimal binary tree is built from the symbol statistics of the image itself, yielding a more efficient huffman table.

For a reasonable explanation of how it works, please see this example of Huffman coding an ASCII string and the overview from Wikipedia.

For more details, please see my article on Optimized JPEGs - optimizing the huffman tables, particularly the first introductory sections and the section near the end titled "Standard Huffman Tables".
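
As a rough illustration of the frequency-based assignment described in this section, the following Python sketch computes generic Huffman code lengths for a short string. It is not how a JPEG encoder builds its DHT tables (JPEG additionally limits codes to 16 bits, which this sketch does not enforce), and the function name huffman_code_lengths is only an illustrative choice.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a generic Huffman code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (total count, tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside them
        # moves one level deeper, i.e. its code grows by one bit.
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


# 'a' is the most frequent symbol, so it receives the shortest code.
lengths = huffman_code_lengths("aaaaaaaabbbbccd")
print(lengths)
```

The most frequent symbol ends up with a 1-bit code, while the two rarest symbols share the longest code length.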

Decoding the JPEG Scan Data

Using JPEGsnoop

For those who are trying to understand the complex huffman decoding in a JPEG image, I'm happy to report that JPEGsnoop can now report all of the variable length code decoding for each MCU (use the Detailed Decode option). For the sample output, scroll to the bottom of this tutorial.

Decoding by Hand

The following is the decode method done by hand, which is obviously impractical for most images, but is shown here in detail to help one learn the process involved.

The above hex dump datastream shows the beginning of the Start of Scan (SOS marker 0xFFDA) marked in yellow, followed by some additional details in green and then the actual scan data selected in dark blue. Finally, the image is terminated with an End of Image (EOI marker 0xFFD9). So, the huffman-coded data content is only 9 bytes long.

Comparison of Compression File Sizes

For the sake of comparison, the original image (16 pixels by 8 pixels) contains a total of 128 pixels (2 MCUs). With 8 bits per channel (RGB), this corresponds to an uncompressed image size of 384 bytes (128 pixels x 8 bits/channel x 3 channels x 1 byte/8 bits). Clearly, a lossless format such as GIF (which uses LZW dictionary compression rather than simple run-length encoding) produces a much smaller file overall in examples like this, even though GIF actually takes 22 bytes to code the stream because there are 16 separate runs. JPEG is not really designed for this type of synthetic (non-organic) image.

If one uses optimized JPEG encoding, it is possible to reduce the file size considerably. In the example image, the optimized version has much smaller huffman tables (DHT) and shorter bit strings to represent the same code words. The net effect is that the image content size drops from 9 bytes to 7 bytes, and the total file size from 653 bytes to 304 bytes.

File Format	Total Size	Overhead Size	Image Content Size
BMP (Uncompressed)	440 Bytes	56 Bytes	384 Bytes
JPEG	653 Bytes	644 Bytes	9 Bytes
JPEG (Optimized)	304 Bytes	297 Bytes	7 Bytes
GIF	60 Bytes	38 Bytes	22 Bytes

Scan Data Decode

The scan data is:

FC FF 00 E2 AF EF F3 15 7F

To help resiliency in the case of data corruption, the JPEG standard allows JPEG markers to appear in the huffman-coded scan data segment. Therefore, a JPEG decoder must watch out for any marker (as indicated by the 0xFF byte, followed by a non-zero byte). If the huffman coding scheme needed to write a 0xFF byte, then it writes a 0xFF followed by a 0x00 -- a process known as adding a stuff byte.

For our extraction purposes, we will replace any stuffed byte sequence (0xFF00) with 0xFF:

FC FF E2 AF EF F3 15 7F
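
The same replacement can be sketched in a few lines of Python. This is a minimal illustration that assumes the input holds only entropy-coded data; it does not handle real markers (0xFF followed by a non-zero byte) that may appear inside the scan segment. The function name remove_stuff_bytes is hypothetical.

```python
def remove_stuff_bytes(scan: bytes) -> bytes:
    """Drop the 0x00 that follows every 0xFF data byte in the scan data."""
    out = bytearray()
    i = 0
    while i < len(scan):
        out.append(scan[i])
        # A 0xFF data byte is followed by a stuffed 0x00, which is skipped.
        if scan[i] == 0xFF and i + 1 < len(scan) and scan[i + 1] == 0x00:
            i += 1
        i += 1
    return bytes(out)


raw = bytes.fromhex("FC FF 00 E2 AF EF F3 15 7F")
clean = remove_stuff_bytes(raw)
print(clean.hex(" ").upper())
```

Running this on the 9-byte scan data above yields the 8-byte sequence shown.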

The expectation is that image content is 3 components (Y, Cb, Cr). Within each component, the sequence is always one DC value followed by 63 AC values.

For each MCU, with no chroma subsampling, we would expect the following data to be encoded:

Section	1	2	3	4	5	6
Component	Y		Cb		Cr
AC / DC	DC	AC	DC	AC	DC	AC

Note that some people get the order of the chrominance channels mixed up, and assume that it is YCrCb instead.

The figure below shows what the DCT matrix from a single MCU (8x8 pixel square) in a digital photo typically looks like. These are the entries after quantization, which has caused many of the higher-frequency components (towards the bottom-right corner of the matrix) to become zero. By the distribution of values in the frequency-domain matrix representation, it is possible to determine that the 8x8 pixel square had very little high-frequency content (i.e. it had only a gradual intensity / color change).

The DC component represents the average value of all pixels in the 8x8 MCU. Since we have deliberately created an image where all pixels in the 8x8 block are the same, we expect this value to represent either the black or white "color". The code provided in the DC entry (#0) indicates a huffman-encoded size (e.g. 1-10 bits), which is the number of additional bits needed to represent the average value for the MCU (e.g. -1023...+1023 for a size of 10 bits).

Note that the DC component is encoded as a relative value with respect to the DC component of the previous block. The first block in the JPEG image is assumed to have a previous block value of zero.
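
A small Python sketch of this relative (differential) coding follows. The list of differences is made up purely for illustration and is not taken from the tutorial image; a real decoder would keep one such running value per component (Y, Cb, Cr). The function name absolute_dc_values is hypothetical.

```python
from itertools import accumulate


def absolute_dc_values(dc_differences):
    """Turn decoded DC differences into absolute DC values.

    The predictor starts at zero, so the first absolute value
    equals the first difference.
    """
    return list(accumulate(dc_differences))


# Hypothetical differences for three consecutive blocks of one component.
diffs = [-512, 10, -3]
print(absolute_dc_values(diffs))
```

Each decoded difference is simply added to the running total from the previous block of the same component.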

Following the single DC component entry, one or more entries are used to describe the remaining 63 entries in the MCU. These entries (1..63) represent the low and high-frequency AC coefficients after DCT transformation and quantization. The earlier entries represent low-frequency content, while the later entries represent high-frequency image content. Since the JPEG compression algorithm uses quantization to reduce many of these high-frequency values to zero, one typically has a number of non-zero entries in the earlier coefficients and a long run of zero coefficients to the end of the matrix.

For the purposes of this tutorial, I have deliberately created an image that has constant color across all 8x8 pixels in each of the two MCUs. Because there are no changes in value across each 8x8 pixel region, there is no AC (or higher frequency content) within the block. As a result, all 63 entries in the AC portion are expected to be zero (unlike the figure above). This allows us to focus on the DC component, which we do expect to change from one MCU to the next.

The hex string shown earlier (after removal of padding bytes) can be represented in binary as the following:

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
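
If you would rather not convert the hex by hand, a short Python sketch such as the following produces the same bit string. The helper names are illustrative only.

```python
def to_bit_string(data: bytes) -> str:
    """Concatenate each byte as 8 binary digits, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def group_nibbles(bits: str) -> str:
    """Insert a space every 4 bits, matching the layout used in this tutorial."""
    return " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))


bits = to_bit_string(bytes.fromhex("FC FF E2 AF EF F3 15 7F"))
print(group_nibbles(bits))
```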

Extract Huffman Code Tables

Using a utility such as JPEGsnoop, you can extract the Huffman tables from the JPEG image file. Often, you will find four huffman table entries (tagged with a DHT marker):

DHT Class=0 ID=0 - Used for DC component of Luminance (Y)
DHT Class=1 ID=0 - Used for AC component of Luminance (Y)
DHT Class=0 ID=1 - Used for DC component of Chrominance (Cb & Cr)
DHT Class=1 ID=1 - Used for AC component of Chrominance (Cb & Cr)
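
Inside each DHT segment, the class and ID are packed into a single byte: the upper four bits hold the class (0 = DC, 1 = AC) and the lower four bits hold the table ID. A minimal Python sketch of unpacking that byte is shown here. The function name is hypothetical, and which ID serves luminance or chrominance is decided by the component selectors in the scan header rather than by this byte alone.

```python
def describe_dht_info(info_byte: int) -> str:
    """Split the DHT table-information byte into class and ID."""
    table_class = info_byte >> 4    # upper nibble: 0 = DC, 1 = AC
    table_id = info_byte & 0x0F     # lower nibble: table ID
    kind = "DC" if table_class == 0 else "AC"
    return f"DHT Class={table_class} ID={table_id} ({kind})"


for info in (0x00, 0x10, 0x01, 0x11):
    print(describe_dht_info(info))
```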

The huffman compression tables are encoded in a somewhat confusing manner. Although you can draw out the binary tree by hand, it will be easier if you rely on a tool such as JPEGsnoop to generate all of the binary comparison strings for each huffman code in all four DHT sections.

The following four tables were extracted from the JPEG file that was created by Photoshop for the purposes of this tutorial. Other JPEG images may be reliant on different DHT tables, so it is important to extract them prior to analyzing the file. Note that turning on JPEG Optimization will create vastly different Huffman tables, with far fewer entries. For a point of comparison, the image described in this tutorial would only need optimized huffman tables of one entry each to represent our image content.

NOTE: It is important to realize that in each case the DHT entries in the JPEG file only list the Length and Code values, not the actual Bit String mapping. It is up to you to rebuild the binary tree representation of the DHT table to derive the bit strings! Please see the DHT Expansion section near the end of this tutorial for more details.

Table 1 - Huffman - Luminance (Y) - DC

Length	Bits	Code
3 bits	000 001 010 011 100 101 110	04 05 03 02 06 01 00 (End of Block)
4 bits	1110	07
5 bits	1111 0	08
6 bits	1111 10	09
7 bits	1111 110	0A
8 bits	1111 1110	0B
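
As the NOTE before Table 1 explains, the DHT segment stores only how many codes exist at each length plus the code values in order; the bit strings must be regenerated. The Python sketch below shows one way to do that, handing out codes in increasing order and appending a zero bit whenever the length grows. The counts list here is derived from Table 1 rather than read from the file, and the function name expand_dht is hypothetical.

```python
def expand_dht(counts, symbols):
    """Rebuild {bit string: code value} from DHT counts and code values.

    counts[i] is the number of codes that are i+1 bits long (16 entries);
    symbols lists the code values in the order they appear in the DHT.
    """
    table = {}
    code = 0
    index = 0
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            table[format(code, f"0{length}b")] = symbols[index]
            code += 1
            index += 1
        code <<= 1  # moving to the next length appends a zero bit
    return table


# Counts derived from Table 1: seven 3-bit codes, then one code each
# for lengths 4 through 8, and none for lengths 9 through 16.
counts = [0, 0, 7, 1, 1, 1, 1, 1] + [0] * 8
symbols = [0x04, 0x05, 0x03, 0x02, 0x06, 0x01, 0x00,
           0x07, 0x08, 0x09, 0x0A, 0x0B]
y_dc_table = expand_dht(counts, symbols)
for bit_string, value in y_dc_table.items():
    print(f"{bit_string:<8} {value:02X}")
```

The output reproduces the Bits and Code columns of Table 1.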

Table 2 - Huffman - Luminance (Y) - AC

Length	Bits	Code
2 bits	00 01	01 02
3 bits	100	03
4 bits	1010 1011 1100	11 04 00 (End of Block)
5 bits	1101 0 1101 1 1110 0	05 21 12
6 bits	1110 10 1110 11	31 41
...	...	...
12 bits	... 1111 1111 0011 ...	... F0 (ZRL) ...
...	...	...
16 bits	... 1111 1111 1111 1110	... FA

Table 3 - Huffman - Chrominance (Cb & Cr) - DC

Length	Bits	Code
2 bits	00 01	01 00 (End of Block)
3 bits	100 101	02 03
4 bits	1100 1101 1110	04 05 06
5 bits	1111 0	07
6 bits	1111 10	08
7 bits	1111 110	09
8 bits	1111 1110	0A
9 bits	1111 1111 0	0B

Table 4 - Huffman - Chrominance (Cb & Cr) - AC

Length	Bits	Code
2 bits	00 01	01 00 (End of Block)
3 bits	100 101	02 11
4 bits	1100	03
5 bits	1101 0 1101 1	04 21
6 bits	1110 00 1110 01 1110 10	12 31 41
...	...	...
9 bits	... 1111 1100 0 ...	... F0 (ZRL) ...
...	...	...
16 bits	... 1111 1111 1111 1110	... FA

Table 5 - Huffman DC Value Encoding

The following table shows how the bit fields that follow a DC entry can be converted into their signed decimal equivalent. To use this table, start with the DC code value and then extract "Size" number of bits after the code. These "Additional Bits" will represent a signed "DC Value" which becomes the DC value for that block. Note that this table applies to any JPEG file -- this table is not written anywhere in the JPEG file itself.

For example, let's assume that one was about to decompress a chrominance DC entry. If the previously-decoded "DC Code" was 05, then we must extract 5 bits following the code bits. If the next 5 bits were 00101, then this can be interpreted as decimal -26. The bits 10001 would be +17 and 11110 would be +30.

DC Code	Size	Additional Bits		DC Value
00	0			0
01	1	0	1	-1	1
02	2	00,01	10,11	-3,-2	2,3
03	3	000,001,010,011	100,101,110,111	-7,-6,-5,-4	4,5,6,7
04	4	0000,...,0111	1000,...,1111	-15,...,-8	8,...,15
05	5	0 0000,...	...,1 1111	-31,...,-16	16,...,31
06	6	00 0000,...	...,11 1111	-63,...,-32	32,...,63
07	7	000 0000,...	...,111 1111	-127,...,-64	64,...,127
08	8	0000 0000,...	...,1111 1111	-255,...,-128	128,...,255
09	9	0 0000 0000,...	...,1 1111 1111	-511,...,-256	256,...,511
0A	10	00 0000 0000,...	...,11 1111 1111	-1023,...,-512	512,...,1023
0B	11	000 0000 0000,...	...,111 1111 1111	-2047,...,-1024	1024,...,2047
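
Table 5 follows a simple rule that can be sketched in Python: if the first additional bit is 1, the bits are read as an ordinary positive number; if it is 0, the value is negative and equals the binary number minus (2^Size - 1). The function name decode_dc_value is hypothetical.

```python
def decode_dc_value(additional_bits: str) -> int:
    """Convert the additional bits that follow a DC code into a signed value."""
    size = len(additional_bits)
    if size == 0:
        return 0  # DC code 00: no additional bits, value is zero
    value = int(additional_bits, 2)
    if additional_bits[0] == "1":
        return value                    # positive half of the table
    return value - ((1 << size) - 1)    # negative half of the table


for bit_string in ("00101", "10001", "11110"):
    print(bit_string, decode_dc_value(bit_string))
```

The three bit strings are the ones from the chrominance example above and decode to -26, +17 and +30.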

Block 1 - Luminance

Luminance (Y) - DC

Referring to the Y(DC) table (Table 1), we start with the first few bits of the coded stream (1111 1100 1111...) and recognize that the bit string 1111 110 matches code 0x0A.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 0A

This code implies that hex A (10) additional bits follow to represent the signed value of the DC component. The next ten bits after this code are 0 1111 1111 1. Table 5 above shows the DC values represented by these "additional bits" -- in this case, the bit string corresponds to a value of -512.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Value: -512

Our progress so far:

Bits	1111 1100 1111 1111 1	110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
MCU	1	???
Component	Y	???
AC/DC	DC	???
Value	-512	???
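
The two steps just performed by hand (match a code from Table 1, then read that many additional bits and convert them per Table 5) can be sketched in Python as follows. The table is typed in from Table 1, and the function name read_dc_entry is hypothetical; this is an illustration of the walkthrough, not a full decoder.

```python
# Bit strings and code values copied from Table 1 (Luminance DC).
Y_DC_TABLE = {
    "000": 0x04, "001": 0x05, "010": 0x03, "011": 0x02,
    "100": 0x06, "101": 0x01, "110": 0x00, "1110": 0x07,
    "11110": 0x08, "111110": 0x09, "1111110": 0x0A, "11111110": 0x0B,
}


def read_dc_entry(bits, pos, table):
    """Decode one DC entry starting at bit offset pos.

    Returns (size code, signed value, new bit offset).
    """
    # Huffman codes are prefix-free, so the first match is the right one.
    for length in range(1, 17):
        candidate = bits[pos:pos + length]
        if candidate in table:
            size = table[candidate]
            pos += length
            break
    else:
        raise ValueError("no Huffman code matches at this position")
    extra = bits[pos:pos + size]
    pos += size
    if size == 0:
        value = 0
    elif extra[0] == "1":
        value = int(extra, 2)
    else:
        value = int(extra, 2) - ((1 << size) - 1)
    return size, value, pos


scan = bytes.fromhex("FC FF E2 AF EF F3 15 7F")  # stuff byte already removed
bits = "".join(f"{byte:08b}" for byte in scan)
size, value, pos = read_dc_entry(bits, 0, Y_DC_TABLE)
print(f"Code: {size:02X}  Value: {value}  Bits consumed: {pos}")
```

The sketch reports code 0A and value -512 after consuming 17 bits, which matches the progress table above.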

Luminance (Y) - AC

After the DC component, we begin the 63-entry AC matrix for the Y Luminance. This uses a different Huffman table (Table 2).