Three criteria are used to choose the best model for this project: accuracy, precision, and recall. Although accuracy is essential, precision and recall are also considered because the model performs binary classification, predicting whether or not a student has a mental health-related problem.
Recall is the ratio of the number of positive cases the model correctly identifies to the number of all actual positive cases, while precision is the ratio of the number of positive cases the model correctly identifies to the number of all cases the model labels as positive (a mix of correct and incorrect predictions).
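The definitions above can be illustrated with a minimal sketch that computes the three criteria from the four cells of a binary confusion matrix. The function name and the counts used here are hypothetical and chosen only for illustration; they are not taken from the project's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts.

    tp: positives correctly predicted as positive
    fp: negatives wrongly predicted as positive
    fn: positives wrongly predicted as negative
    tn: negatives correctly predicted as negative
    """
    # Precision: correct positive predictions over all positive predictions.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: correct positive predictions over all actual positives.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: all correct predictions over all predictions.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


# Hypothetical counts for 100 students (illustration only).
precision, recall, accuracy = classification_metrics(tp=30, fp=10, fn=20, tn=40)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

In this hypothetical case the model misses 20 of the 50 students who actually have a problem, so recall is lower than precision even though accuracy looks reasonable. This is why recall is examined separately for a health-related classifier.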
Based on the findings, the Random Forest-based model is the most accurate, with 73% accuracy, followed by Decision Tree (72.52%), KNN (66.92%), GBM (65.76%), SVM (64.56%), and Naïve Bayes (56%). Random Forest also has the best precision, tied with Decision Tree at 73%, followed by KNN (68%), GBM (66%), SVM (65%), and Naïve Bayes (62%). The Random Forest-based model also has the highest recall, which is important for a health-related binary classification model, at 73%, again tied with the Decision Tree model. The recall values for the other models are as follows: KNN (67%), GBM (66%), SVM (65%), and Naïve Bayes (56%).
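The comparison can be sketched as a simple ranking over the figures reported above. The ordering rule used here (recall first, then precision, then accuracy as tie-breakers) is an assumption made for illustration; the text does not state how the three criteria are weighted against each other.

```python
# Reported (accuracy %, precision %, recall %) for each model, as given in the text.
reported = {
    "Random Forest": (73.00, 73, 73),
    "Decision Tree": (72.52, 73, 73),
    "KNN": (66.92, 68, 67),
    "GBM": (65.76, 66, 66),
    "SVM": (64.56, 65, 65),
    "Naive Bayes": (56.00, 62, 56),
}


def sort_key(model):
    accuracy, precision, recall = reported[model]
    # Assumed priority: recall, then precision, then accuracy.
    return (recall, precision, accuracy)


ranked = sorted(reported, key=sort_key, reverse=True)

for rank, model in enumerate(ranked, start=1):
    accuracy, precision, recall = reported[model]
    print(f"{rank}. {model}: recall={recall}%, precision={precision}%, accuracy={accuracy}%")
```

Under this assumed ordering, Random Forest and Decision Tree tie on recall and precision, and the reported accuracy separates them.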
In conclusion, of the six models compared, the Random Forest model performs best at predicting mental health problems amongst high school students in Malaysia.