"Heart Attack Analysis with Neural Network"
Description of data attributes:
- Age : Age in years
- Sex: Sex (0 : Female, 1: Male)
- Cp: Chest pain (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic)
- Trtbps: Resting blood pressure (in mm Hg)
- Chol: Serum cholesterol in mg/dl
- Bs: Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
- Restecg: Resting electrocardiographic results (0 = normal; 1 = ST-T wave abnormality; 2 = left ventricular hypertrophy)
- Thalachh: Maximum heart rate achieved
- Exng: Exercise induced angina (1 = yes; 0 = no)
- Oldpeak: ST depression induced by exercise relative to rest
- Slp: The slope of the peak exercise ST segment (1 = upsloping; 2 = flat; 3 = downsloping)
- Caa: The number of major vessels (0-3) colored by fluoroscopy
- Thall: 3 = normal; 6 = fixed defect; 7 = reversible defect
- Output: The predicted attribute, diagnosis of heart disease (0 = less chance of heart attack; 1 = more chance of heart attack)
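As a concrete illustration of this schema, here is a minimal pandas sketch. The column names follow the attribute list above (lowercased); the two rows are invented values for illustration, not taken from the real dataset.

```python
import pandas as pd

# Hypothetical rows illustrating the attribute encoding above;
# the values are made up, not drawn from the actual data.
columns = ["age", "sex", "cp", "trtbps", "chol", "bs", "restecg",
           "thalachh", "exng", "oldpeak", "slp", "caa", "thall", "output"]
rows = [
    [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 3, 0, 6, 1],
    [57, 0, 4, 140, 241, 0, 1, 123, 1, 0.2, 2, 0, 3, 0],
]
df = pd.DataFrame(rows, columns=columns)
print(df.dtypes)
```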
Each attribute is visualized below.
Figure 1.1
A heat map of the attribute correlations is shown in Figure 1.2.
Figure 1.2
We can see clearly that chest pain type has a positive relationship with maximum heart rate.
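That relationship can be checked numerically with a Pearson correlation coefficient, as in this sketch. The tiny frame below is illustrative only; the real heat map would be computed over the full dataset (e.g. with `df.corr()`).

```python
import pandas as pd

# Toy check of the cp <-> thalachh relationship suggested by the heat map.
# These eight rows are invented for illustration.
df = pd.DataFrame({
    "cp":       [0, 1, 2, 3, 0, 2, 3, 1],
    "thalachh": [120, 140, 150, 170, 125, 155, 165, 138],
})
r = df["cp"].corr(df["thalachh"])  # Pearson's r, in [-1, 1]
print(round(r, 3))                 # positive r means they rise together
```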
The output labels are imbalanced: label '1' has more samples than label '0'. This can be seen clearly in Figure 1.3.
Figure 1.3
Because the labels are imbalanced, we use oversampling to solve this problem, so that each label ends up with the same number of samples. You can see this in Figure 1.4.
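The report does not name an oversampling library; one minimal sketch of the idea, using scikit-learn's `resample` to draw minority-class samples with replacement until the counts match:

```python
from collections import Counter
from sklearn.utils import resample

# Toy imbalanced labels: 7 positives, 3 negatives (invented data).
X = [[float(i)] for i in range(10)]
y = [1] * 7 + [0] * 3

# Re-sample the minority class with replacement until both counts match.
minority = [(xi, yi) for xi, yi in zip(X, y) if yi == 0]
extra = resample(minority, replace=True, n_samples=7 - 3, random_state=42)
X_bal = X + [xi for xi, _ in extra]
y_bal = y + [yi for _, yi in extra]
print(Counter(y_bal))  # both classes now have 7 samples
```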
Figure 1.4
After resampling, we applied the z-normalization technique to scale the values, so that each feature has a mean of zero and a standard deviation of one. To test the effectiveness of the machine learning models, we held out part of the data, splitting it into 80% train and 20% test.
We used a sequential neural network model with fully connected layers of sizes 32, 64, 128, 128, and 1, respectively. We compiled the model with the Adam optimizer, a loss function, and the accuracy metric. We used early stopping to halt training when the accuracy did not improve for 20 consecutive epochs, and we reduced the learning rate when the monitored metric did not improve for 5 consecutive epochs. We fitted the model with a batch size of 28 for up to 200 epochs, then evaluated it on the test data. The accuracy is 93.93%, as shown in Figure 1.5.
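The two patience rules can be sketched in pure Python. This is not the report's Keras code (in Keras terms these correspond to `EarlyStopping(patience=20)` and `ReduceLROnPlateau(patience=5)`); the per-epoch accuracies and the decay factor of 0.1 are invented for illustration.

```python
# Sketch of the callback logic: reduce LR after 5 non-improving epochs,
# stop training after 20 non-improving epochs.
def train_loop(accuracies, lr=1e-3, stop_patience=20, lr_patience=5,
               factor=0.1):
    best, since_best = float("-inf"), 0
    for epoch, acc in enumerate(accuracies):
        if acc > best:
            best, since_best = acc, 0
        else:
            since_best += 1
        if since_best and since_best % lr_patience == 0:
            lr *= factor                  # plateau for 5 epochs: shrink LR
        if since_best >= stop_patience:
            return epoch + 1, lr          # plateau for 20 epochs: stop
    return len(accuracies), lr

# Three improving epochs, then a long plateau: training halts early.
epochs_run, final_lr = train_loop([0.5, 0.7, 0.9] + [0.9] * 200)
print(epochs_run, final_lr)
```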
Figure 1.5
The confusion matrix that yields this accuracy is shown in Figure 1.6.
Figure 1.6
Lastly, the performance metrics are in the table below.
Table 1.0
| Accuracy | F1 Score | Precision | Recall |
|---|---|---|---|
| 93.93 | 58.10 | 78.94 | 63.64 |
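All four metrics in Table 1.0 derive from the confusion matrix of Figure 1.6. The sketch below shows the standard formulas; the counts passed in are hypothetical, since the actual matrix is not reproduced here.

```python
# Standard metrics from confusion-matrix counts
# (tp/fp/fn/tn = true/false positives, false/true negatives).
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # a.k.a. sensitivity
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts, for illustration only.
acc, prec, rec, f1 = metrics(tp=15, fp=4, fn=6, tn=75)
print([round(100 * m, 2) for m in (acc, prec, rec, f1)])
```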