cstutoringcenter.com logo
E-mail:Password:

Become a premier member today to gain access to exclusive
member benefits! Just $5.00 to join FOR LIFE!






Huffman coding deals with data compressions of ASCII characters. This is one of many techniques for compressing data. This method is most commonly used for emails over the internet.

Coding

Algorithm for Huffman coding:
1.	Compute the probability of each character.
2.	Sort the set of data in ASCENDING order.
3.	Create a new node where the left child is the lowest 
        in the sorted list and the right is the second lowest 
        in the sorted list.
4.	Chop-off those two elements in the sorted list as they 
        are now part of one node and add the probabilities.  
        The result is the probability for the new node.
5.	Perform insertion sort on the list with the new node.
6.	REPREAT STEPS 3,4,5 UNTIL you only have 1 node left.

Let's see an example of the Huffman code with the following unsorted set of data:

STEP 1: Compute the probability of each character.

A (.4)  B (.05)  C (.1)  D (.1)  E (.2)  F (.02)  G (.08)  H (.05)

STEP 2: Sort the set of data in ASCENDING order.

F (.02)  B (.05)  H (.05)  G (.08)  C (.1)  D (.1)  E (.2)  A (.4)

STEP 3: Create a new node where the left child is the lowest in the sorted list and the right child is the second lowest in the sorted list.

F (.02)  B (.05) | H (.05)  G (.08)  C (.1)  D (.1)  E (.2)  A (.4)
   \        /
    \      /
       FB

STEP 4: Chop-off those two elements in the sorted list as they are now part of one node and add the probabilities. The result is the probability for the new node.

FB (.07) | H (.05)  G (.08)  C (.1)  D (.1)  E (.2)  A (.4)

STEP 5: Perform insertion sort on the list.

H (.05)  FB (.07)  G (.08)  C (.1)  D (.1)  E (.2)  A (.4)

FINAL RESULT after the repeats:

A (.4)  DHFBGCE (.6)

Tree

Now that there is one node remaining, simply draw the tree:

            F   B
            0\ /1
         H   FB  G   C
         0\  /1  0\ /1
       D  HFB     GC    E
       0\  /1     0\  /1
        DHFB       GCE
          0\      /1
A          DHFBGCE
 \          /
  \        /
   0      1
    \    /
     \  /
      \/

With the above tree, you place a 0 on each path going to the left and a 1 on each path going to the right. So the code for each letter here in the above will be:

Letter	Code
A	0
B	10111
C	1101
D	100
E	111
F	10110
G	1100
H	1010

Entropy

The entropy of the code is the average number of bits needed to decode a given pattern. It is calculated as follows:

        n
E =    	∑  bitsi * probi
       	i=1

which means the sum from 1 to n (the number of characters) of the number of bits times the probability of that character. Given the below chart, it shows the entropy:

Char	Code	Bits	Prob	Entropy
A	0	1	.4	.4
B	10111   5	.05	.25
C	1101	4	.1	.4
D	100	3	.1	.3
E	111	3	.2	.6
F	10110	5	.02	.1
G	1100	4	.08	.32
H	1010	4	.05	.2
________________________________________
				2.57 -> Total Entropy

It is possible that the total entropy is more than 1. It is NOT possible that the total probability is more than 1 by the basic rules of probability.