Question regarding precision@n and ROC (ROC@n?)
Created by: Henlam
Hello,
first and foremost, thank you for building this wrapper; it is of great use to me and many others.
I have a question regarding the evaluation: most outlier detection evaluation settings set the ranking cutoff n equal to the number of outliers (i.e., the contamination), and so did I in my experiments.
My thoughts concerning the ROC curve and AUC score were:
- Don't we have to rank the outlier scores from highest to lowest and evaluate the ROC only on the top n scores? That would require something like a ROC@n curve.
- Why do people use ROC and AUC for outlier detection problems, which are by nature heavily skewed and unbalanced? Hitting a lot of true negatives is easy and guaranteed if the algorithm knows that there are only n outliers.
In my case, the precision@n values of my chosen algorithms lie in the range of 0.2–0.4 because it is a difficult dataset. However, the AUC scores are quite high at the same time.
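For what it's worth, this divergence is easy to reproduce on synthetic data. Below is a minimal sketch (not from the original post; NumPy and scikit-learn are assumed, and the `precision_at_n` helper is my own hypothetical function, not part of any library): with heavy class imbalance and overlapping score distributions, the global ranking can be good (high AUC) while the top-n list is still noisy (low precision@n).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical imbalanced setup: 1000 points, 20 outliers (2% contamination).
n_samples, n_outliers = 1000, 20
y = np.zeros(n_samples, dtype=int)
y[:n_outliers] = 1

# Simulated detector scores: outliers tend to score higher, but the
# distributions overlap, so the very top of the ranking is contested.
scores = rng.normal(0.0, 1.0, n_samples)
scores[:n_outliers] += 2.0

def precision_at_n(y_true, y_score, n):
    """Fraction of true outliers among the n highest-scoring points."""
    top_n = np.argsort(y_score)[::-1][:n]
    return y_true[top_n].mean()

p_at_n = precision_at_n(y, scores, n_outliers)
auc = roc_auc_score(y, scores)
print(f"precision@{n_outliers}: {p_at_n:.2f}")
print(f"ROC AUC: {auc:.2f}")
```

Running this, the AUC comes out much higher than precision@n, because AUC averages over all thresholds (including the easy region where the many true negatives are correctly ranked below the outliers), whereas precision@n only looks at the n most contested points at the top of the ranking.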
I would appreciate any thoughts on this, since I am fairly new to the topic and might not grasp the intuition behind the ROC curve for this task.
Best regards
Hlam