A GPT 4.0 Mini-powered chatbot that processes and summarizes resumes, with an HTML frontend and a Flask / Gunicorn backend.

mixtapeo/ResumeGPT

ResumeGPT

A GPT 4.0 Mini-powered chatbot that processes and summarizes resumes, integrated with WildApricot to pull and manage member data. It is deployed on an AWS EC2 Ubuntu instance with a Flask web server, managed using Gunicorn and Nginx.

Features

  • Summarizes resumes from WildApricot.
  • Deployable on AWS / Azure / Heroku with an iframe embed to WildApricot as custom HTML.
  • Fully managed on AWS EC2 Ubuntu with Nginx and Gunicorn.

Installation

Option I: Local environment.

I. Clone / Download the Repository

Run these commands in the folder where you want the project to live:

git clone https://github.com/mixtapeo/ResumeGPT
cd ResumeGPT
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

II. Create .env File

  1. Create a new file named .env in the root directory.
  2. Add the following environment variables:
wildapiricot_api_key=<YOUR_WILDAPRICOT_API_KEY>
openai_api_key=<YOUR_OPENAI_API_KEY>
account_id=<YOUR_ACCOUNT_ID>
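
A missing or still-placeholder key in .env is easy to overlook. The following is a minimal sketch, not part of the repository, that parses a .env file with the standard library only and reports which of the three keys above are absent or unfilled. The function names `parse_env_file` and `missing_keys` are hypothetical, and how app.py actually loads these variables is not shown in this README.

```python
from pathlib import Path

# Key names exactly as written in the README's .env example.
REQUIRED_KEYS = ("wildapiricot_api_key", "openai_api_key", "account_id")


def parse_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still a <PLACEHOLDER>."""
    return [
        key for key in required
        if not values.get(key) or values[key].startswith("<")
    ]
```

Calling `missing_keys(parse_env_file(".env"))` from the project root returns an empty list when all three values are filled in.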

III. Run the Flask App

Run app.py:

python3 app.py

Then open the address the program is running at in a browser (usually 127.0.0.1; Flask prints the full URL on startup).

Option II: Running on AWS EC2.

Pre-requisites:

  1. Create an instance: Ubuntu, type t3a.medium recommended; select a key pair and allow HTTP / HTTPS traffic.
  2. Once it is created, add an inbound rule for port 5000 from source 0.0.0.0/0 under Security.
  3. Associate an Elastic IP address with the instance to get a static public IP.

Setting Up a New Instance

  1. Update the system and install Python virtual environment:

    sudo apt-get update
    sudo apt-get install python3-venv
  2. Clone the repository and set up the environment:

    cd /home/ubuntu
    git clone https://github.com/mixtapeo/ResumeGPT
    cd ResumeGPT
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    pip install gunicorn
  3. Create .env File:

    cat >> .env
    # Type the 3 environment variables, one per line, then press Ctrl+D to finish
  4. Test Gunicorn:

    gunicorn -b 0.0.0.0:5000 app:app
    # Ctrl+C to exit
  5. Set up Gunicorn as a systemd service:

    sudo vi /etc/systemd/system/app.service

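    Edit the file with the following content. ExecStart points Gunicorn at wsgi:app, whereas the test in step 4 used app:app; use whichever module exists in your checkout: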

    [Unit]
    Description=Gunicorn instance for a resume gpt app
    After=network.target
    [Service]
    User=ubuntu
    Group=www-data
    WorkingDirectory=/home/ubuntu/ResumeGPT
    ExecStart=/home/ubuntu/ResumeGPT/venv/bin/gunicorn -b localhost:5000 wsgi:app
    Restart=always
    [Install]
    WantedBy=multi-user.target
    

    Save and exit by pressing Esc, then typing :wq! and pressing Enter.

  6. Start and enable the service:

    sudo systemctl daemon-reload
    sudo systemctl start app
    sudo systemctl enable app
  7. Check if it's working:

    curl localhost:5000
  8. Install and configure Nginx:

    sudo apt-get install nginx
    sudo systemctl start nginx
    sudo systemctl enable nginx
  9. Edit the Nginx server configuration:

    sudo vi /etc/nginx/sites-available/default

    Modify it to include the following (the upstream block goes above the server block, the location block inside it):

    upstream flaskapp {
        server localhost:5000;
    }

    location / {
        proxy_pass http://flaskapp;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection keep-alive;
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
    

    Save and exit by pressing Esc, then typing :wq! and pressing Enter.

  10. Check Nginx configuration validity:

    sudo nginx -t
  11. Restart Nginx and Gunicorn to apply changes:

    sudo systemctl restart nginx
    sudo systemctl restart app
  12. Allow port 5000 through the firewall:

    sudo ufw allow 5000/tcp

    Your EC2 virtual machine web app should now be accessible and working!
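
The curl check in step 7 can also be scripted. The sketch below is an illustration using only the Python standard library; `check_url` is a hypothetical helper, not something the repository provides. It returns the HTTP status code, or None when nothing is listening.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 502.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None
```

On the instance, `check_url("http://localhost:5000")` tests Gunicorn directly and `check_url("http://localhost")` tests the path through Nginx. A 502 from the second call usually means Nginx is up but cannot reach Gunicorn.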

Setting Up a Cron Job

To maintain routine tasks:

  1. Download resumes, delete invalid/corrupt files, and summarize:

    Make sure the Members.xml file is in the project root directory (/home/ubuntu/ResumeGPT).

    Example command to upload from Windows:

    scp -i newkey.pem Members.xml ubuntu@<YOUR_EC2_PUBLIC_IP>:/home/ubuntu/ResumeGPT/
    Or just use WinSCP (easy, recommended).
  2. Set up the cron job:

    crontab -e

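    Add the following line, which runs routine.py once a day at 06:00. It activates the virtual environment with "." because cron's default shell (sh) does not provide source:

    0 6 * * * cd /home/ubuntu/ResumeGPT && . venv/bin/activate && python3 routine.py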

    Check status with:

    systemctl status cron
    
    To test the entries right away, this runs every command in the crontab once:
    crontab -l | grep -v '^#' | cut -f 6- -d ' ' | while read CMD; do eval $CMD; done
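
A cron schedule has five fields: minute, hour, day of month, month, and day of week. The sketch below is a deliberately simplified matcher (it understands only `*` and plain integers, not ranges, lists, or steps) that illustrates how much the minute field matters: `* 6 * * *` fires on every minute from 06:00 to 06:59, while `0 6 * * *` fires once at 06:00. The function names are hypothetical.

```python
from datetime import datetime


def field_matches(field, value):
    """Match one cron field; this sketch supports only '*' and plain integers."""
    return field == "*" or int(field) == value


def cron_matches(expr, when):
    """Return True if the 5-field cron expression fires at the minute `when`."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day)
        and field_matches(month, when.month)
        and field_matches(dow, cron_dow)
    )


def runs_per_day(expr, day):
    """Count how many minutes of the given day the expression fires."""
    return sum(
        cron_matches(expr, day.replace(hour=h, minute=m))
        for h in range(24)
        for m in range(60)
    )
```

For any date, `runs_per_day("* 6 * * *", datetime(2024, 1, 1))` gives 60 and `runs_per_day("0 6 * * *", datetime(2024, 1, 1))` gives 1.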

Updating the Application

To update the application with the latest code from the repository:

  1. Deactivate the virtual environment:

    deactivate
  2. Remove the existing directory:

    cd ..
    rm -rf ResumeGPT
  3. Clone the repository again:

    git clone https://github.com/mixtapeo/ResumeGPT
    cd ResumeGPT
  4. Set up the environment:

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    pip install gunicorn
  5. Run the application:

    python3 app.py

Debugging

To follow the Nginx logs live:

sudo tail -f /var/log/nginx/error.log
sudo tail -f /var/log/nginx/access.log
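
When the access log gets long, a per-status tally is often quicker to read than the raw tail. This is a small sketch that assumes Nginx's default "combined" log format, where the status code follows the quoted request line; `count_statuses` is a hypothetical helper.

```python
import re
from collections import Counter

# In the combined format the 3-digit status follows the quoted request line,
# e.g. ... "GET / HTTP/1.1" 200 612 "-" "curl/7.81.0"
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')


def count_statuses(lines):
    """Tally HTTP status codes found in access-log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Reading /var/log/nginx/access.log usually needs sudo or membership in the adm group; once the file is open, pass the file object straight to `count_statuses`.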

Learnings / Tech used:

  • nginx: Nginx is used as a reverse proxy to handle client connections, manage static files, and forward dynamic requests to Gunicorn. This improves security, performance, and scalability.
  • gunicorn: Gunicorn serves as the WSGI HTTP server that handles incoming requests to your Flask application. It forks multiple worker processes to manage these requests concurrently, making it a critical component in a production environment.
  • CORS
  • CRON
  • Ubuntu
  • AWS EC2
  • AWS Elastic IP Addresses
  • HTML
  • JavaScript
  • Python
  • ChatGPT API
  • WildApricot API
  • Amazon Machine Images (AMI)
  • SSH
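
Gunicorn's worker processes can be configured in a Python file instead of on the command line. The following is a sketch of a hypothetical gunicorn.conf.py, which the repository is not known to include; Gunicorn reads a file with that name from the working directory, or one passed with -c. The `bind`, `workers`, and `timeout` settings are standard Gunicorn settings, while the values are assumptions to adjust.

```python
# gunicorn.conf.py (hypothetical; the README passes options on the command line)
import multiprocessing

# Listen only on the loopback interface so that Nginx is the public entry point.
bind = "127.0.0.1:5000"

# Starting point suggested in the Gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may stay silent before it is killed and restarted.
# Assumed value, raised from the default of 30 because summarization calls
# to an external API can be slow.
timeout = 120
```

With this file in place, the -b flag in the systemd unit's ExecStart line becomes redundant.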

Future TODOs:

  • Batch Translating: Investigate batch translating, as some members are missing when using ChatGPT completions for summarizing. The average number of tokens sent for a summary is ~220K, so batch processing may be more efficient.
  • HTTPS / iframe embed: (TL;DR: HTTPS setup required) The site cannot be iframe-embedded into WildApricot yet: without an SSL certificate it cannot be served over HTTPS, which WildApricot requires for embedding. Suggestions: buy a domain and install an SSL certificate, or investigate hosting the code on AWS App Runner or the Google equivalent (Google Cloud Run seems to be easier).
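
One way to read the batching idea is to group resumes so that each request stays under a token budget and no member is silently dropped. The sketch below illustrates only that grouping step. It is not the project's method and does not call any API; the 4-characters-per-token estimate is a rough assumption, and both function names are hypothetical.

```python
def estimate_tokens(text):
    """Very rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def make_batches(texts, max_tokens):
    """Group texts greedily so each batch stays within max_tokens.

    Order is preserved and every text lands in exactly one batch. A single
    text larger than max_tokens gets a batch of its own.
    """
    batches, current, used = [], [], 0
    for text in texts:
        cost = estimate_tokens(text)
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Comparing the total number of texts across all batches with the number of members gives a cheap check that nobody went missing before any summaries are requested.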

App Flows

Current Web App Flow.

Look at older flow below if using in local environment.

[old, initial draw up proposal]

III: Future TODOs:

Drawback: Look into batch translating. Some people are missing when using multithreaded GPT chat completions for summarizing, and chat completions may become unreliable in the future. The average number of tokens sent for a summary is ~220K, so batch processing should work better.
Automating resumeCache and resume downloads: this is not done yet; gpt.py has to be run manually.

App Flow:
