
Experimentation


Training, Validation and Testing

A CNN classifier is used for object recognition; more specifically, we use a pre-trained ResNet-34 in order to obtain better results.

For the parameters, we use an SGD optimizer, which implements stochastic gradient descent, with a momentum of 0.9 and a learning rate of 0.001. The learning rate is also adjusted according to the epoch number, here over 10 epochs.
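
For reference, this setup can be expressed in PyTorch roughly as follows. This is a minimal sketch, not the project's exact code: the step size and decay factor of the learning-rate schedule, and the use of cross-entropy here, are assumptions (only the momentum and initial learning rate are given above).

```python
import torch
from torchvision import models

# Pre-trained ResNet-34 as the base classifier
model = models.resnet34(pretrained=True)

# SGD with momentum 0.9 and learning rate 0.001, as described above
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Learning rate adjusted according to the epoch number
# (step_size and gamma are assumptions, not taken from the text)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

# Cross-entropy loss (stated later for the transfer-learning phase; assumed here as well)
criterion = torch.nn.CrossEntropyLoss()
```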

We train for 10 epochs to obtain a better result. Within each epoch, training is done in batches of 32 and includes the forward pass, gradient descent, and the learning rate adjustment.

Then we do validation, also in batches of 32, but with inference only (no weight updates). At the end of these two steps, we compare the models and keep the best one. We continue in the same way for the following epochs, and at the end of these ten epochs we save the best model, that is, the one that generalizes best to new data.
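
A minimal sketch of this train/validate loop, assuming PyTorch DataLoaders (names are illustrative; the batch size of 32 would be set on the loaders, and the best model is selected on validation accuracy as described above):

```python
import copy
import torch

def train(model, loaders, criterion, optimizer, scheduler, device, num_epochs=10):
    """Sketch of the training/validation loop; loaders = {"train": ..., "val": ...}."""
    best_acc, best_weights = 0.0, copy.deepcopy(model.state_dict())
    for epoch in range(num_epochs):
        for phase in ("train", "val"):
            model.train() if phase == "train" else model.eval()
            running_correct, total = 0, 0
            for inputs, labels in loaders[phase]:          # batches of 32
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad()
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    if phase == "train":
                        loss.backward()                    # gradient descent step
                        optimizer.step()
                preds = outputs.argmax(dim=1)
                running_correct += (preds == labels).sum().item()
                total += labels.size(0)
            if phase == "train":
                scheduler.step()                           # learning rate adjustment
            else:
                acc = running_correct / total
                if acc > best_acc:                         # keep the model that generalizes best
                    best_acc, best_weights = acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_weights)
    return model
```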



Preliminary training

Here we have a graph showing the values of the loss function at the end of each of the five epochs (iterations) of the preliminary training phase. The loss on the training data already reaches a satisfactory level at the end of the second epoch, while the loss on the validation data needs three epochs to reach a satisfactory level. The validation loss increases on the last iteration, which looks like the beginning of overfitting, given that the training loss barely varies.



Figure: Loss Function vs Epochs Graph

Here we have a graph showing the accuracy values at the end of the same iterations as in the previous figure. The results mirror the loss function values: the training accuracy is satisfactory from the second epoch onward, and the validation accuracy is suitable for the 3rd and 4th epochs. As noted above, overfitting appears from the fifth epoch, since the validation accuracy decreases while the training accuracy remains roughly constant. This is not really a problem, since, as explained earlier, we keep the parameters (weights and biases) of the model from the iteration with the best validation accuracy.


Figure: Accuracy vs Epochs Graph



We therefore continue the calculations with the model from the 4th epoch, given the results obtained for that iteration.



Results of the preliminary training

To properly evaluate the network's performance, we determine performance indicators on a test set.

As a reminder, we split the dataset into 3 subsets: 75% of the instances for training, 15% for validation and 10% for testing. Given the large size of the dataset, we have enough instances to collect meaningful performance indicators on the test set.

We want to collect three indicators per class: precision, recall and F-Measure. Precision measures exactness: it tells us whether the data labeled as belonging to a class actually come from that class. Recall measures completeness: it tells us whether all the data belonging to a class were correctly labeled by the model. Precision and recall may not agree (maximum precision for minimum recall and vice versa), so we also use the F-Measure, a combination of the two indicators, which lets us optimize precision without neglecting recall, for example. All these indicators are compared with the support, that is, the number of data points actually belonging to each class in the test subset.
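
For reference, these per-class indicators can be computed with scikit-learn. This is only an illustrative sketch; in the project the labels would come from the test set and the model's predictions.

```python
from sklearn.metrics import precision_recall_fscore_support

# Illustrative labels only; in practice y_true/y_pred come from the test DataLoader
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]

# One value per class: precision = TP/(TP+FP), recall = TP/(TP+FN),
# F-Measure = 2*P*R/(P+R), support = number of true examples of the class
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred, average=None)
for c, (p, r, f, s) in enumerate(zip(precision, recall, f1, support)):
    print(f"class {c}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```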


Figure: Graph on precision in training


Here we can see the support in blue and the precision in red. For classes with between 50 and 225 examples, the precision is always close to 1, so there is no problem there: the model does not report too many false positives.


Figure: Graph on recall in training


The observation is the same for recall: there are not too many false negatives in the classification of the German signs.


Figure: Graph on the F-Measure in training


And of course, since we have good results for precision and recall, we also have good results for the F-Measure.



Transfer learning

We now come to the purpose of the project: using transfer learning to obtain a network that can recognize and classify Quebec traffic signs.

The Quebec dataset is imported and undergoes the same preprocessing as the German data. The network trained on the German signs is then modified so that the fully connected output layer no longer has 10 outputs but 25, since we are going from 10 German classes to 25 Quebec classes.

The parameters of the earlier layers are frozen, which means that feature extraction is performed by the layers trained on the German data while classification is handled by the new output layer. For training, we use the same parameters as with the German data: 5 iterations (epochs), batches of 32 examples, optimization with a gradient descent algorithm, and a cross-entropy loss function. Finally, to avoid training problems, we use step-based learning rate scheduling.
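
A minimal sketch of this transfer-learning setup in PyTorch, assuming the same optimizer settings as before (the step size and decay factor of the schedule are again assumptions, and the pre-trained ResNet-34 below stands in for the network already trained on the German signs):

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from the network trained on the German signs
# (a pre-trained ResNet-34 is used here as a stand-in)
model = models.resnet34(pretrained=True)

# Freeze the feature-extraction layers: only the new output layer will be trained
for param in model.parameters():
    param.requires_grad = False

# Replace the fully connected output layer: 25 Quebec classes
model.fc = nn.Linear(model.fc.in_features, 25)

# Same training parameters as before: SGD, cross-entropy loss, step LR scheduling
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)  # step_size/gamma assumed
```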


Figure: Loss graph for transfer learning


Here is the graph of the values of the loss function for each of the five epochs. The loss function decreases steadily and, as before, the training loss is lower than the validation loss.


Figure: Accuracy graph for transfer learning


As expected, the accuracy mirrors the loss function: while the losses decrease, the accuracy increases. Likewise, the training accuracy is higher than the validation accuracy, especially for the last epochs, where we reach almost 80% accuracy in training and 70% in validation.

We could have trained for more epochs to get better accuracy, but as shown below, we can already identify some performance relationships without further training.


Figure: Graph of the precision performance of the transfer learning.


Indeed, when we look at the precision performance, we notice that the classes with good precision are mostly those with a significant number of test examples, while the classes with few instances are those with the worst precision values. We assume that the training, validation and test subsets have the same class distribution, since they were generated randomly; this means the network predicts a significant number of false positives for the under-represented classes.


Figure: Transfer learning recall performance graph.


When we look at the recall performance, we notice that it is not bad overall. For a given class, the model predicts either only false negatives or only true positives, and the classes where the model predicts false negatives are the same ones that had many false positives in the previous figure.


Figure: F-Measure performance graph of transfer learning.


In terms of the F-Measure, we get an overall view of the problem. We can clearly see that a sufficient number of examples in a class prevents poor performance; we could even establish a critical threshold of examples per class above which good performance would be guaranteed. This really shows that more Quebec data is needed for a model pre-trained on external data to perform well on Quebec roads, especially since such a model must be able to generalize to the multitude of signs specific to the Quebec context.