Some useful definitions:
• Code word: Encoding a text that comprises n characters from some alphabet by assigning to each of the text’s characters some sequence of bits. This bits sequence is called code word
• Fixed length encoding: Assigns to each character a bit string of the same length.
• Variable length encoding: Assigns code words of different lengths to different characters.
How can we tell how many bits of an encoded text represent ith character?
We can use prefix free codes
• Prefix free code: In Prefix free code, no codeword is a prefix of a codeword of another character.
• Binary prefix code :
o The characters are associated with the leaves of a binary tree.
o All left edges are labeled 0
o All right edges are labeled 1
o Codeword of a character is obtained by recording the labels on the simple path from the root to the character’s leaf.
o Since, there is no simple path to a leaf that continues to another leaf, no codeword can be a prefix of another codeword
• Constructs binary prefix code tree
• By David A Huffman in 1951.
• Huffman’s algorithm achieves data compression by finding the best variable length binary encoding scheme for the symbols that occur in the file to be compressed. Huffman coding uses frequencies of the symbols in the string to build a variable rate prefix code
o Each symbol is mapped to a binary string
o More frequent symbols have shorter codes
o No code is a prefix of another code
• Huffman Codes for Data Compression achieves 20-90% Compression
Step 1: Initialize n one-node trees and label them with the characters of the alphabet.
Record the frequency of each character in its tree’s root to indicate the tree’s weight. (More generally the weight of a tree will be equal to the sum of the frequencies in the tree’s leaves)
Step 2: Repeat the following operation until a single tree is obtained.
“Find two trees with smallest weight. Make them the left and right sub-tree of a new tree and record the sum of their weights in the root of the new tree as its weight”
Construct a Huffman code for the following data: