mlb-game-prediction

thesis code

University of Pennsylvania

EAS 499, Senior Capstone Thesis. Andrew Cui. Advisor: Dr. Shane T. Jensen.


We apply classification models to predict the outcomes of Major League Baseball games. Data are extracted from Retrosheet logs, then wrangled, preprocessed, and feature-engineered to identify informative covariates. The target is binary: whether or not the home team wins the game.
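As a rough illustration of the binary target described above, the sketch below derives a home-win label from per-game scores. It is not the thesis code: the column names, team codes, and scores are hypothetical, and real Retrosheet logs contain many more fields and would need their own parsing.

```python
import csv
import io

# Hypothetical, simplified game records (made-up teams and scores).
# Real Retrosheet logs have many more fields; these column names are assumptions.
SAMPLE = """date,visitor,home,visitor_score,home_score
20160403,AAA,BBB,3,4
20160404,CCC,DDD,9,0
20160404,EEE,FFF,2,3
"""


def label_home_wins(csv_text):
    """Return one dict per game with a binary home_win target (1 = home team won)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        home = int(rec["home_score"])
        visitor = int(rec["visitor_score"])
        if home == visitor:
            # Skip games without a winner; how to treat them is a modeling choice.
            continue
        rows.append(
            {
                "date": rec["date"],
                "home": rec["home"],
                "visitor": rec["visitor"],
                "home_win": int(home > visitor),
            }
        )
    return rows


games = label_home_wins(SAMPLE)
print([g["home_win"] for g in games])
```

Any covariates would then be joined to these rows using only information available before each game, so the label never leaks into the features.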

Overall, the elastic-net logistic regression (logit) model achieved an accuracy of 61.77%, outperforming our naive classifiers and many results reported in the literature. This repository contains the code used in the analysis, along with the relevant charts and graphics.
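To make the comparison against a naive classifier concrete, here is a minimal sketch of how accuracy could be computed for a model and for a baseline. The baseline shown (always predict a home win) is an assumption, since this README does not say which naive classifiers were used, and the labels and predictions are made-up toy values rather than results from the thesis.

```python
def accuracy(y_true, y_pred):
    """Fraction of games whose predicted label matches the actual label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one game")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def always_home_baseline(n_games):
    """Hypothetical naive classifier: predict a home win for every game."""
    return [1] * n_games


# Toy labels and predictions (1 = home team won), not data from the thesis.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
model_pred = [1, 0, 1, 0, 0, 1, 1, 1]

baseline_acc = accuracy(y_true, always_home_baseline(len(y_true)))
model_acc = accuracy(y_true, model_pred)
print(f"baseline: {baseline_acc:.3f}, model: {model_acc:.3f}")
```

Both numbers should be computed on the same held-out games, so that a model's reported accuracy is directly comparable to the baseline's.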

Further detail about the analytical approach can be found in the paper itself. Please direct questions to Andrew Cui ([email protected]).