Contamination parameter during fit?
Created by: filipporemonato
I am having trouble understanding how the contamination parameter works. Initially, my question was exactly the same as the one in this issue: https://github.com/yzhao062/pyod/issues/144. After reading the (very useful) reply, my understanding was that the contamination parameter is used only during the prediction/classification step to decide the threshold. But then I noticed that contamination is actually a constructor parameter of all algorithms. In addition, the docstring (for instance for KDE) reads "The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.", which makes it unclear how it is actually being used, since I would expect the threshold to be defined not during fitting but only after all samples have been scored, i.e. at the prediction/classification step.
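To make the question concrete, here is a minimal sketch of how I would have expected this to work. It uses plain NumPy rather than PyOD internals, the helper names `threshold_from_scores` and `label_outliers` are made up for illustration, and I am assuming the threshold is simply a percentile of the outlier scores.

```python
import numpy as np


def threshold_from_scores(scores, contamination=0.1):
    """Hypothetical helper: cut-off above which roughly a
    `contamination` fraction of the given scores lies."""
    return np.percentile(scores, 100 * (1 - contamination))


def label_outliers(scores, threshold):
    """Hypothetical helper: 1 = outlier, 0 = inlier."""
    return (scores > threshold).astype(int)


# Stand-in for the outlier scores a detector would produce;
# contamination plays no role in computing them here.
rng = np.random.default_rng(0)
train_scores = rng.normal(size=1000)

# Contamination only enters once the scores already exist.
threshold = threshold_from_scores(train_scores, contamination=0.1)
labels = label_outliers(train_scores, threshold)
```

In this picture, changing `contamination` moves the threshold and therefore the labels, but the scores themselves stay the same.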
So, is the contamination actively used during fitting? If yes, why? And how?