A mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence) that aims to predict Oscar winners. Please go through our code in the following order:
- Learning_TMDB_API.ipynb
- Data_Extraction.ipynb
- Exporting_Oscar_Winners.ipynb
- Data_Cleaning.ipynb
- Data_EDA.ipynb
- Data_Anaylsis.ipynb
- Resampling.ipynb
Video Presentation Link
- @notsuspiciousindividual
- @Maarrttiinn
- @lcwlouis
- Which model best predicts whether a movie will win an Oscar?
- Decision Tree
- K-Nearest Neighbour (kNN)
- Random Forest
- RandomOverSampler together with Random Forest classification gave the best overall accuracy but a low True Positive Rate
- ADASYN resampling with kNN gave decent overall accuracy and a True Positive Rate above 70%
- kNN classification consistently gave a high True Positive Rate, above 70%
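
The gap between overall accuracy and True Positive Rate noted above is typical of imbalanced data, where winners are far rarer than non-winners. The following is a minimal sketch, not the code from our notebooks: the helper name `accuracy_and_tpr` and the labels are made up for illustration, with `1` assumed to mean "winner".

```python
def accuracy_and_tpr(y_true, y_pred):
    """Return (accuracy, true positive rate) for binary labels, where 1 = winner."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # winners predicted as winners
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # winners that were missed
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, tpr


# Hypothetical data: 2 winners among 20 films, and a model that always says "no win"
y_true = [1, 1] + [0] * 18
y_pred = [0] * 20

acc, tpr = accuracy_and_tpr(y_true, y_pred)
print(f"accuracy={acc:.2f}, TPR={tpr:.2f}")  # accuracy=0.90, TPR=0.00
```

A model that never predicts a win still scores 90% accuracy here while finding no winners at all, which is why we report the True Positive Rate alongside accuracy.
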
- API Usage
- Collaborating using GitHub
- Handling imbalanced datasets using over-sampling techniques (RandomOverSampler, SMOTE, ADASYN)
- Using SimpleImputer to fill in missing numerical values with the median
- Trying other models for analysis, such as Logistic Regression, Naive Bayes and Support Vector Machine
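
As a rough illustration of two of the points above, the sketch below fills missing values with the column median using scikit-learn's `SimpleImputer`, then balances the classes by duplicating minority rows at random. The feature values and labels are made up, and the over-sampling step is a hand-rolled NumPy stand-in for the idea behind RandomOverSampler, not the imbalanced-learn implementation or the exact code in our notebooks.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric features (two columns); np.nan marks a missing value
X = np.array([
    [100.0, 120.0],
    [np.nan, 95.0],
    [300.0, np.nan],
    [200.0, 110.0],
])
y = np.array([0, 0, 0, 1])  # assumed encoding: 1 = winner (minority class)

# Replace each missing value with the median of its column
imputer = SimpleImputer(strategy="median")
X_filled = imputer.fit_transform(X)

# Naive random over-sampling: resample minority rows with replacement
# until both classes have the same number of rows
rng = np.random.default_rng(0)
minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)
extra = rng.choice(
    minority_idx,
    size=len(majority_idx) - len(minority_idx),
    replace=True,
)
keep = np.concatenate([majority_idx, minority_idx, extra])
X_bal, y_bal = X_filled[keep], y[keep]

print(X_filled)
print(np.bincount(y_bal))
```

In a real workflow, both the imputer and the over-sampling would normally be fitted on the training split only, so that the test set stays untouched.
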
- https://developers.themoviedb.org/3/getting-started/introduction
- https://github.com/nicklimmm/movie-analysis
- https://www.themoviedb.org/talk/621b62abd18572001df182ea
- https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
- https://towardsdatascience.com/7-over-sampling-techniques-to-handle-imbalanced-data-ec51c8db349f