Yue Zhao / pyod / Issues / #456
Issue created Nov 23, 2022 by Administrator (@root), Contributor

Contamination parameter during fit?

Created by: filipporemonato

I am having trouble understanding how the contamination parameter works. At first, my question was exactly the same as the one in this issue: https://github.com/yzhao062/pyod/issues/144. After reading the (very useful) reply there, I understood that contamination is used only during the prediction/classification step, to decide the threshold. But then I noticed that contamination is actually a constructor parameter of all the algorithms. In addition, the docstring (for instance for KDE) reads "The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function." This makes it unclear how the parameter is actually being used, since I would expect the threshold to be defined not during fitting but only after all samples have been scored, i.e. at the prediction/classification step.

So, is the contamination actively used during fitting? If yes, why? And how?
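
To make the two readings concrete, here is a minimal sketch of one way a detector could take contamination in its constructor and still leave the learned model untouched by it. It is not PyOD's code: ToyDetector and percentile are hypothetical names made up for illustration, the "model" is just the training mean, and the assumption is that the threshold is cut from the training scores at the end of fit. Whether PyOD actually works this way is exactly what the question asks.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile of a list, with q in [0, 100]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


class ToyDetector:
    """Hypothetical 1-D detector: score = distance from the training mean."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def decision_function(self, X):
        # Raw outlier scores; contamination is not involved here.
        return [abs(x - self.mean_) for x in X]

    def fit(self, X):
        # Step 1: learn the model. contamination plays no role.
        self.mean_ = sum(X) / len(X)
        # Step 2 (assumed): score the training data and place the threshold
        # at the (1 - contamination) percentile of those scores.
        self.decision_scores_ = self.decision_function(X)
        self.threshold_ = percentile(
            self.decision_scores_, 100 * (1 - self.contamination)
        )
        return self

    def predict(self, X):
        # Classification only compares new scores with the stored threshold.
        return [int(s > self.threshold_) for s in self.decision_function(X)]


X_train = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.05, 0.95, 1.1, 9.0]
a = ToyDetector(contamination=0.1).fit(X_train)
b = ToyDetector(contamination=0.3).fit(X_train)

print(a.decision_scores_ == b.decision_scores_)  # same model, same scores
print(a.threshold_, b.threshold_)                # different cut-offs
print(a.predict([0.8]), b.predict([0.8]))        # labels can differ
```

In this sketch, contamination is "used when fitting" only in the sense that the threshold is computed and stored at the end of fit; the scores themselves would be identical for any contamination value.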
