Huffman coding deals with data compressions of ASCII characters. This is one of many techniques for compressing data. This method is most commonly used for emails over the internet.
Coding
Let's see an example of the Huffman code with the following unsorted set of data:
STEP 1: Compute the probability of each character.
STEP 2: Sort the set of data in ASCENDING order.
STEP 3: Create a new node where the left child is the lowest in the sorted list and the right child is the second lowest in the sorted list.
STEP 4: Chop-off those two elements in the sorted list as they are now part of one node and add the probabilities. The result is the probability for the new node.
STEP 5: Perform insertion sort on the list.
FINAL RESULT after the repeats:
Tree
Now that there is one node remaining, simply draw the tree:
With the above tree, you place a 0 on each path going to the left and a 1 on each path going to the right. So the code for each letter here in the above will be:
Entropy
The entropy of the code is the average number of bits needed to decode a given pattern. It is calculated as follows:
which means the sum from 1 to n (the number of characters) of the number of bits times the probability of that character. Given the below chart, it shows the entropy:
It is possible that the total entropy is more than 1. It is NOT possible that the total
probability is more than 1 by the basic rules of probability.