课 15, 主题 8

In Progress

← Previous Next→

Compression Huffman Coding Aqa

课 Progress

0% Complete

Exam code:8525

Huffman Coding

What is Huffman coding?

Huffman coding is a method of lossless compression primarily used on text based data (documents)
A Huffman coding tree is used to compress the data whilst keeping all the data so that it can be uncompressed back to its original state

Example (step 1)

A text file contains the following characters

Text file
I		L	O	V	E		C	O	M	P	U	T	E	R	S

The text file contain 16 characters including spaces
A frequency analysis shows there is some repetition in the characters, for example there are 2 x E’s

Frequency
I	space	L	O	V	E	C	M	P	U	T	R	S
1	2	1	2	1	2	1	1	1	1	1	1	1

Huffman Trees

What is a Huffman tree?

A Huffman tree is a type of binary tree used in Huffman coding to achieve lossless compression of text data
Each character is represented as a leaf in the tree
Internal nodes always have two children, while leaves have none
The tree is built based on the frequency of characters in the text, so that more frequent characters are given shorter codes
Binary trees are covered in much more detail at A-Level

Example (step 2)

List the frequency of each character and order them from least to most frequent:

Table with two rows; the first row has the letters "I L V C M P U T R S" and "space O E," and the second row numbers "1 1 1 1 1 1 1 1 1 1 2 2 2".

Combine the two least frequent characters (I + L = 2) into a new node, then reorder:

huffman-coding-image1

Update the frequency data to show the combined characters

Table with letters "V C M P U T R S I L space O E" in the first row and corresponding numbers "1 1 1 1 1 1 1 1 2 2 2 2 2" in the second row.

Repeat until all characters are combined into a single tree

huffman-coding-image2

Assigning bit patterns
- Follow each branch of the tree, writing 0 on left branches and 1 on right branches
- Each character ends up with a unique binary code

huffman-coding-image3

More frequent characters get shorter codes

Character	Bit pattern	Times used	Total bits
space	`000`	2	3*2 = 6
O	`001`	2	3*2 = 6
E	`010`	2	3*2 = 6
I	`0110`	1	4*1 = 4
L	`0111`	1	4*1 = 4
V	`1000`	1	4*1 = 4
C	`1001`	1	<p

Responses