Fig. 11 Parity plots showing the misclassification distribution in classification-via-regression experiments with reference to the half-lifetime values for a KRFP/SVM, b KRFP/trees, c MACCSFP/SVM, d MACCSFP/trees, e KRFP/SVM, f KRFP/trees, g MACCSFP/SVM, h MACCSFP/trees. The figure presents differences between true and predicted metabolic stability classes in the class assignment task performed on the basis of the exact predicted value of half-lifetime in regression studies

compound representations within the classification models occurs for Naïve Bayes; however, it is also the model with the lowest total number of correctly predicted compounds (less than 75 on the whole dataset). When regression models are compared, the fraction of correctly predicted compounds is higher for SVM, although the number of compounds correctly predicted for both compound representations is similar for SVM and trees (approximately 1100, slightly higher for SVM).

Another type of prediction correctness analysis was performed for regression experiments with the use of parity plots for 'classification via regression' experiments (Fig. 11). Figure 11 indicates that there is no apparent correlation between the misclassification distribution and the half-lifetime values, as the models misclassify molecules of both low and high stability. An analogous analysis was performed for the classifiers (Fig. 12). One general observation is that, in the case of incorrect predictions, the models are more likely to assign the compound to the neighbouring class; e.g. there is a higher probability of assigning stable compounds (yellow dots) to the class of middle stability (blue) than to the unstable class (red).
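The class assignment used in the classification-via-regression setting, and the distinction between neighbouring-class and distant-class errors, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the class boundaries in `THRESHOLDS` are placeholder values that do not come from the text above, and the function names `to_class` and `error_profile` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Assumed class boundaries (in hours); placeholders, not taken from the text.
THRESHOLDS = (0.6, 2.32)


def to_class(half_life, thresholds=THRESHOLDS):
    """Map a half-lifetime to 0 (unstable), 1 (middle) or 2 (stable).

    A value equal to a boundary falls into the lower class.
    """
    return bisect_left(thresholds, half_life)


def error_profile(true_values, predicted_values, thresholds=THRESHOLDS):
    """Count correct, neighbouring-class and distant-class assignments."""
    counts = Counter(correct=0, neighbouring=0, distant=0)
    for true_t, pred_t in zip(true_values, predicted_values):
        # Class distance: 0 = correct, 1 = neighbouring class, 2 = distant class.
        gap = abs(to_class(true_t, thresholds) - to_class(pred_t, thresholds))
        key = "correct" if gap == 0 else "neighbouring" if gap == 1 else "distant"
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    true_half_lives = [0.3, 1.0, 5.0, 4.0]       # toy values, not real data
    predicted_half_lives = [0.5, 3.0, 1.5, 0.2]  # toy values, not real data
    print(error_profile(true_half_lives, predicted_half_lives))
```

Comparing the `neighbouring` and `distant` counts gives a simple numeric counterpart to the visual impression from the plots. The toy values serve only to exercise the function.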
For compounds of middle stability, there is no clear tendency in class assignment when the prediction is incorrect: such compounds are predicted as stable and as unstable with similar probability. In the case of classifiers, the order of classes is irrelevant; therefore, it is highly probable that during training the models gained the ability to recognize reliable features and use them to correctly sort compounds according to their stability.

Analysis of the predictive power of the obtained models allows us to state that they are capable of assessing metabolic stability with high accuracy. This is important because we assume that if a model is capable of making correct predictions about the metabolic stability of a compound, then the structural features used to produce such predictions might be relevant for provision of the desired metabolic stability. Therefore, the developed ML models underwent deeper examination to shed light on the structural factors that influence metabolic stability.

Wojtuch et al. J Cheminform (2021) 13:

Fig. 12 Analysis of the assignment correctness for models trained on human data: a Naïve Bayes, b SVM, c trees, d Naïve Bayes, e SVM, f trees. Class 0–unstable compounds, class 1–compounds of middle stability, class 2–stable compounds. The figure presents the distribution of probabilities of compound assignment to a particular stability class, depending on the true class value for test sets derived from the human dataset. Each dot represents a single molecule; the position on the x-axis indicates the correct class, the position on the y-axis the probability of this class returned by the model, and the colour the class assignment based on the model's prediction

Acknowledgements
The study was supported by the National Scien.