Yue Zhao / pyod / Issues / #482
Issue created Mar 05, 2023 by Sarah@SaVoAMP

Using IForest in situations where training set does not contain any anomalies

Hey,

I was reading the original paper on Isolation Forests, where the authors state that

iForest also works well in high dimensional problems ... and in situations where training set does not contain any anomalies.

In the "Empirical Evaluation" section they also write that

It is assumed that anomaly labels are unavailable in the training stage. Anomaly labels are only available in the evaluation stage to compute the performance measure, AUC.

and the paper contains a section called "Training using normal instances only".

Since I am also trying to train an Isolation Forest on data without any anomalies, I was wondering why the contamination parameter of the IForest model must lie in the interval (0., 0.5], which excludes the case of zero anomalies. I tried to work around this by setting contamination to a very small value (close to 0), but then another problem arises:

[Screenshot: MicrosoftTeams-image]

Is there a way around this, so that the model can be trained exclusively on normal data that contains no anomalies?
