https://doi.org/10.1351/goldbook.11505
Evaluation of the robustness of a QSAR or classification model by repeatedly randomizing the target property of the compounds, developing models on the
randomized property, and comparing the fit statistics of the scrambled models with those of the true model, which should be clearly superior.
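
The procedure defined above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the definition: randomization is done by permuting the target values, the model is an ordinary least-squares regression, the statistic of fit is the training-set R², and the function names `r_squared` and `y_randomization` are hypothetical. An empirical p-value is added as one possible way to summarize the comparison.

```python
import numpy as np


def r_squared(X, y):
    """Fit ordinary least squares with an intercept; return R^2 on the training data."""
    A = np.column_stack([np.ones(len(y)), X])       # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares coefficients
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)           # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())     # total sum of squares
    return 1.0 - ss_res / ss_tot


def y_randomization(X, y, n_rounds=100, seed=0):
    """Compare the fit of the true model with fits on permuted target values.

    Returns the true R^2, an array of scrambled R^2 values, and an empirical
    p-value (fraction of fits, counting the true one, at least as good as the true fit).
    """
    rng = np.random.default_rng(seed)
    true_r2 = r_squared(X, y)
    # Assumption: "randomizing the target property" is done by shuffling y,
    # which breaks the descriptor/target pairing but keeps the distribution of y.
    scrambled_r2 = np.array(
        [r_squared(X, rng.permutation(y)) for _ in range(n_rounds)]
    )
    p_value = (1 + np.sum(scrambled_r2 >= true_r2)) / (n_rounds + 1)
    return true_r2, scrambled_r2, p_value


if __name__ == "__main__":
    # Synthetic stand-in for a descriptor matrix and a measured property.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)

    true_r2, scrambled_r2, p_value = y_randomization(X, y, n_rounds=50, seed=2)
    print("true R^2:", round(true_r2, 3))
    print("best scrambled R^2:", round(float(scrambled_r2.max()), 3))
    print("empirical p-value:", round(float(p_value), 3))
```

If the scrambled models fit about as well as the true model, the apparent fit of the true model is likely due to chance correlation or an excess of descriptors rather than to a real structure-property relationship. For a classification model, the same loop would apply with a classifier and a classification statistic in place of `r_squared`.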