Huffman Coding

Huffman coding is a lossless data compression algorithm. In this algorithm, a variable-length code is assigned to input different characters. The code length is related to how frequently characters are used. Most frequent characters have the smallest codes and longer codes for least frequent characters

We create a frequency hashmap for all the characters in the string and then use data structures such as Min-Heap and Tree to convert these into binary codes. These binary codes are to be stored as bytes hence they are padded with zeros (to avoid default padding snce 1 byte=8 bits). This file is now compressed and has lesser size.

Data Structures used-

Tree
HashMap
Min Heap

Theory

Step-1:

Make a frequency table for all the characters in the string.

Step-2:

Pick the two smallest frequencies and make their nodes in the tree.
Make another nodes with their combined frequencies and add it to the tree as well as to the frequency table.
Connect the two nodes to this node with combined frequencies.

Step-3:

Again pick two smallest frequencies and continue the process till all the frequencies in the table are used.

Step-4:

Add weights to the branches of the Tree. Add weight 0 to all the left branches and 1 to all the right branches.

Step-5:

We can then get a code for each character just by following the path on the resultant tree.
Make a map using this.

Step-6:

We then pad these codes with zeros to avoid default padding as we have to store these as bytes and 1 byte = 8bits.

Step-7:

We now convert the binary codes to bytes using int(byte, 2) and store it in the bytes array.

Decompression of codes:

We remove the padding and decode the binary codes.
We then use the reverse_mapping hashmap to convert the codes back to text.

Output:

When we run the useHuffman.py, two files are created.

sample.bin -> binary file with less size
sample_decompressed.txt -> decompressed text file

Environment Setup and Local Installation:

Drop a ⭐ on the Github Repository.
Make sure to install python on your system- https://www.python.org/downloads/
Download Python IDE or code editor for python code

Clone the Repo by going to your local Git Client and pushing this command:
git clone https://github.com/Pranav016/Huffman-Coding.git
Clone the Repo to a particular folder on your system by going to your local Git Client and pushing this command:
git clone https://github.com/Pranav016/Huffman-Coding.git <folder-name>
Open the project in the Jupyter Notebook/VS code to use it.
Entry file- useHuffman.py

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
demo		demo
.gitignore		.gitignore
HuffmanCoding.py		HuffmanCoding.py
README.md		README.md
sample.txt		sample.txt
useHuffman.py		useHuffman.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Huffman Coding

Data Structures used-

Theory

Step-1:

Step-2:

Step-3:

Step-4:

Step-5:

Step-6:

Step-7:

Decompression of codes:

Output:

Environment Setup and Local Installation:

About

Languages

Pranav016/Huffman-Coding

Folders and files

Latest commit

History

Repository files navigation

Huffman Coding

Data Structures used-

Theory

Step-1:

Step-2:

Step-3:

Step-4:

Step-5:

Step-6:

Step-7:

Decompression of codes:

Output:

Environment Setup and Local Installation:

About

Topics

Resources

Stars

Watchers

Forks

Languages