Model Development. Three aspects of model development for SNNs were considered: 1) whether the hyperparameters were tuned, and which performance criterion was used for model development; 2) how the prognostic variables were scaled; and 3) which programming language was used.

Hyperparameters are fundamental to the architecture of an ANN. They fine-tune the performance of a prediction model, preventing overfitting and ensuring that the model generalizes to new "unseen" data. Choosing hyperparameters can be a challenge in the modern era of building SNNs with state-of-the-art software that offers numerous options. The most commonly tuned hyperparameters were penalty terms in the likelihood (e.g., weight decay) and the number of units (nodes) in the hidden layer(s); a minimal tuning sketch is given at the end of this section. In the majority of studies (15, 62.5%), the approach to tuning the hyperparameters was unclear, with 6 of these studies (25.0%) failing to report whether parameters were tuned or default values were chosen. In 4 studies (16.7%) the parameters were tuned, in 3 studies (12.5%) some parameters were tuned and others were assigned default values, while in 2 studies (8.3%) only default values were chosen.

The performance criterion for model development (hyperparameter tuning) was examined across the 24 studies. The training criterion was unclear for 6 studies (25.0%). Hyperparameters were tuned based on the log-likelihood in 5 studies (20.8%), on the C-index in 3 studies (12.5%), and on the area under the curve (AUC) in 2 studies (8.3%); a sketch of the C-index computation is also provided at the end of this section. Other criteria used for model development are listed in the Supplementary Material. Better reporting of the choice of hyperparameters (which parameters were selected) and of the tuning procedure (how they were tuned) is needed. This will help researchers to better understand how the model was developed and will facilitate reproducibility.

In ANNs, input features are typically scaled so that all features have a comparable range, which allows the weights to be updated at comparable rates and leads to faster convergence of the training algorithm. The scaling procedure was unclear in 10 of the 24 studies (41.7%), scaling was unnecessary in 7 studies (29.2%), and normalization (the minimum and maximum values of the features are used for scaling) was applied in 5 studies (20.8%). Standardization (the mean and standard deviation of the features are used for scaling) was applied in only 2 studies (8.3%). A precise description of the scaling approach (normalization or standardization) should be provided by researchers; both approaches are illustrated at the end of this section.

The programming language used for the development of the ANN was unclear in 7 studies (29.2%). Python was employed in 4 (16.7%) and R in 2 (8.3%) of the more recent studies. In previous decades, Matlab was used 3 times (12.5%), NeuralWare 3 times (12.5%), and S-plus 3 times (12.5%), while Epilog Plus and PlaNet were each used once (4.2%). There is a trend towards employing Python with the Keras and Theano libraries, which can build state-of-the-art ANNs with multiple options for layers, optimizers, and error (loss) functions; both libraries also have an interface to the R programming language. Researchers are strongly encouraged to share code developed for new methodologies, or for applications of existing methodologies, in publicly available repositories (e.g., GitHub) to support reproducibility and good clinical practice.
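To make the tuning step concrete, the following is a minimal sketch in Python with the Keras library, covering the two hyperparameters most often tuned in the reviewed studies (weight decay and the number of hidden units). The architecture, the binary outcome, and the grid of candidate values are illustrative assumptions, not a reconstruction of any reviewed model.

```python
import keras
from keras import layers, regularizers

def build_snn(n_features, n_hidden=8, weight_decay=1e-4):
    """Sketch of a simple SNN exposing the two commonly tuned
    hyperparameters: hidden-layer size and an L2 penalty (weight decay)."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(n_hidden, activation="tanh",
                     kernel_regularizer=regularizers.l2(weight_decay)),
        layers.Dense(1, activation="sigmoid"),  # e.g., survival status at a fixed horizon
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

# Hypothetical grid search: fit each candidate on the training data and keep
# the configuration that optimizes the chosen criterion on validation data.
for n_hidden in (4, 8, 16):
    for weight_decay in (1e-4, 1e-3, 1e-2):
        model = build_snn(n_features=10, n_hidden=n_hidden,
                          weight_decay=weight_decay)
        # model.fit(X_train, y_train, validation_data=(X_val, y_val), ...)
```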
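One of the reported criteria, Harrell's C-index, can also be illustrated briefly: it is the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed survival times. The sketch below is a simplified illustration on hypothetical inputs (ties in event times are not handled).

```python
import numpy as np

def concordance_index(time, event, risk):
    """Simplified Harrell's C-index: `event` is 1 for an observed event,
    0 for a censored observation; `risk` is the model's predicted risk."""
    num, den = 0.0, 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # pair (i, j) is comparable if subject i had the event before time j
            if event[i] == 1 and time[i] < time[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0   # correctly ordered pair
                elif risk[i] == risk[j]:
                    num += 0.5   # tied risks count as half-concordant
    return num / den

# Hypothetical data: C ranges from 0.5 (no discrimination) to 1.0 (perfect)
print(concordance_index(np.array([2.0, 5.0, 7.0]),
                        np.array([1, 1, 0]),
                        np.array([0.9, 0.4, 0.2])))
```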
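Finally, the difference between the two scaling approaches can be shown in a few lines of numpy; the feature matrix below is a hypothetical example. Note that, in practice, the scaling constants should be computed on the training data and then applied unchanged to the test data.

```python
import numpy as np

# Hypothetical prognostic variables: age (years) and a biomarker level
X = np.array([[61.0, 120.5],
              [45.0,  98.2],
              [70.0, 150.0]])

# Normalization (min-max): each feature is mapped to the [0, 1] interval
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (z-score): each feature gets zero mean and unit standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```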