Skip to content

Reducing the size of a file/ compressing the file using Huffman's string compression algorithm, implemented for a college project.

Notifications You must be signed in to change notification settings

Pranav016/Huffman-Coding

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Huffman Coding

Huffman-Coding

Huffman coding is a lossless data compression algorithm. In this algorithm, a variable-length code is assigned to input different characters. The code length is related to how frequently characters are used. Most frequent characters have the smallest codes and longer codes for least frequent characters

We create a frequency hashmap for all the characters in the string and then use data structures such as Min-Heap and Tree to convert these into binary codes. These binary codes are to be stored as bytes hence they are padded with zeros (to avoid default padding snce 1 byte=8 bits). This file is now compressed and has lesser size.

Data Structures used-

  • Tree
  • HashMap
  • Min Heap

Theory

Step-1:

  • Make a frequency table for all the characters in the string.

Step-2:

  • Pick the two smallest frequencies and make their nodes in the tree.
  • Make another nodes with their combined frequencies and add it to the tree as well as to the frequency table.
  • Connect the two nodes to this node with combined frequencies.

Step-3:

  • Again pick two smallest frequencies and continue the process till all the frequencies in the table are used.

Step-4:

  • Add weights to the branches of the Tree. Add weight 0 to all the left branches and 1 to all the right branches.

Step-5:

  • We can then get a code for each character just by following the path on the resultant tree.
  • Make a map using this.

Step-6:

  • We then pad these codes with zeros to avoid default padding as we have to store these as bytes and 1 byte = 8bits.

Step-7:

  • We now convert the binary codes to bytes using int(byte, 2) and store it in the bytes array.

Decompression of codes:

  • We remove the padding and decode the binary codes.
  • We then use the reverse_mapping hashmap to convert the codes back to text.

Output:

  • When we run the useHuffman.py, two files are created.
  1. sample.bin -> binary file with less size
  2. sample_decompressed.txt -> decompressed text file
sample.txt img sample.bin img sample_decompressed.txt img

Environment Setup and Local Installation:

  1. Drop a ⭐ on the Github Repository.

  2. Make sure to install python on your system- https://www.python.org/downloads/

  3. Download Python IDE or code editor for python code

  1. Clone the Repo by going to your local Git Client and pushing this command:
    git clone https://github.com/Pranav016/Huffman-Coding.git

  2. Clone the Repo to a particular folder on your system by going to your local Git Client and pushing this command:
    git clone https://github.com/Pranav016/Huffman-Coding.git <folder-name>

  3. Open the project in the Jupyter Notebook/VS code to use it.
    Entry file- useHuffman.py

About

Reducing the size of a file/ compressing the file using Huffman's string compression algorithm, implemented for a college project.

Topics

Resources

Stars

Watchers

Forks

Languages