generate_data: outliers have zero variance for some values of random_state
Created by: vnherdeiro
I was getting unexpected error values on some runs with data generated by utils.data.generate_data. Plotting the data showed that sometimes all the outliers sit exactly at the origin. See https://imgur.com/eZdfald
You can reproduce this with many values of the seed, for instance:
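import matplotlib.pyplot as plt
from pyod.utils.data import generate_data  # assumption: utils.data refers to PyOD's module
seed = 41
X, y = generate_data(n_features=2, contamination=5e-2, train_only=True, random_state=seed)
plt.scatter(*X.T, c=y)  # color points by label (1 = outlier)
plt.show()
print(X[y == 1].var())  # prints 0.0: every outlier coordinate is identical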
This collapse is undesirable: the outlier variance becomes zero, and some methods then suffer from numerical instability.
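To get a rough idea of how many seeds are affected, a scan along the following lines might help. This is only a sketch: `find_collapsed_seeds`, its `tol` parameter, and `toy_generate` are hypothetical names introduced here for illustration, and `toy_generate` is a stand-in that mimics the collapse so the snippet runs without the library installed.

```python
import numpy as np


def find_collapsed_seeds(generate, seeds, tol=1e-12):
    """Return the seeds whose outliers have (near-)zero variance in every feature.

    `generate` is any callable that takes a seed and returns (X, y),
    where y == 1 marks the outliers.
    """
    collapsed = []
    for seed in seeds:
        X, y = generate(seed)
        outliers = X[y == 1]
        # Per-feature variance: a collapse means every feature is constant.
        if outliers.size and np.all(outliers.var(axis=0) <= tol):
            collapsed.append(seed)
    return collapsed


def toy_generate(seed):
    """Hypothetical stand-in for generate_data: outliers collapse on even seeds."""
    rng = np.random.default_rng(seed)
    inliers = rng.normal(size=(95, 2))
    if seed % 2 == 0:
        outliers = np.zeros((5, 2))  # all outliers on the origin
    else:
        outliers = rng.uniform(-6, 6, size=(5, 2))
    X = np.vstack([inliers, outliers])
    y = np.r_[np.zeros(95), np.ones(5)]
    return X, y


print(find_collapsed_seeds(toy_generate, range(6)))  # prints [0, 2, 4]
```

To run the scan against the real generator, pass a wrapper such as `lambda s: generate_data(n_features=2, contamination=5e-2, train_only=True, random_state=s)` in place of `toy_generate`.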