https://doi.org/10.1351/goldbook.11420
An artificial correlation that can arise when too many properties are screened relative to the number of available observations.
Example: If one tests 20 random candidate descriptors for statistical significance in a multiple regression equation for the properties of 15 compounds, the average fitted \(R^{2}\) is \(\pu{0.81}\), even though on average only four descriptors are included in the equation.
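The effect can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the procedure behind the figure quoted above: both the descriptors and the property are drawn as independent Gaussian noise, and greedy forward selection always keeps exactly four descriptors instead of applying a significance test. The function names (`r_squared`, `forward_select_r2`, `mean_chance_r2`) and all default parameters are hypothetical, chosen only for illustration, so the result should not be expected to reproduce the quoted value exactly.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def forward_select_r2(X, y, n_keep):
    """Greedily add the descriptor that raises R^2 most; return the final R^2.

    Assumption: a fixed number of descriptors is kept (no significance test).
    """
    chosen = []
    remaining = list(range(X.shape[1]))
    best_r2 = 0.0
    for _ in range(n_keep):
        scores = [(r_squared(X[:, chosen + [j]], y), j) for j in remaining]
        best_r2, best_j = max(scores)
        chosen.append(best_j)
        remaining.remove(best_j)
    return best_r2


def mean_chance_r2(n_compounds=15, n_descriptors=20, n_keep=4,
                   n_trials=200, seed=0):
    """Average R^2 obtained when descriptors and property are both pure noise."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_trials):
        X = rng.standard_normal((n_compounds, n_descriptors))
        y = rng.standard_normal(n_compounds)
        values.append(forward_select_r2(X, y, n_keep))
    return float(np.mean(values))


if __name__ == "__main__":
    print(f"mean R^2 from pure noise: {mean_chance_r2():.2f}")
```

Because no descriptor carries any real information, any \(R^{2}\) above the value expected for four arbitrary descriptors (about \(4/14\) for 15 observations) comes from the screening step alone. Increasing `n_compounds` or decreasing `n_descriptors` should shrink the effect.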