Yue Zhao / pyod: Issue #157
Issue created Jan 09, 2020 by Administrator (@root), Contributor

Is there a better way to choose the threshold?

Created by: kdlin

According to the documentation, the threshold is chosen so that the "n_samples * contamination" most abnormal samples in "decision_scores_" are labeled as outliers. While this is useful in some applications, I wonder whether one can do better in general. In other words, simply flagging a fixed number of samples with the highest decision_scores_, while ignoring the score values themselves, may not be the best idea.
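A minimal NumPy sketch of the contamination-based rule as described above, assuming the threshold is the (1 - contamination) percentile of the training scores and that samples strictly above it are labeled outliers. This is an illustration of the documented behavior, not PyOD's own code; the function name `contamination_labels` is hypothetical.

```python
import numpy as np

def contamination_labels(decision_scores, contamination=0.1):
    """Hypothetical helper: label the top `contamination` fraction of scores as outliers (1)."""
    scores = np.asarray(decision_scores, dtype=float).ravel()
    # Assumption: threshold at the (1 - contamination) percentile of the scores.
    threshold = np.percentile(scores, 100 * (1 - contamination))
    # Samples strictly above the threshold are labeled outliers, the rest inliers.
    labels = (scores > threshold).astype(int)
    return threshold, labels

# 1,000 distinct scores with contamination = 0.1: the rule flags 100 samples,
# whatever the scores actually look like.
threshold, labels = contamination_labels(np.arange(1000), contamination=0.1)
print(labels.sum())  # 100
```

The number of flagged samples is fixed in advance by `contamination`; the score values only decide *which* samples are flagged, not *how many*.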

As an example, say we have 1,000 samples and the contamination is 0.1. It looks like it will pick 100 samples (1,000 * 0.1) as outliers, regardless of the actual decision_scores_ of these 1,000 samples. Even if 999 of them have low scores and only 1 has a high score, it still picks 100 samples.
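A sketch of the scenario above, contrasting the contamination rule with one possible score-based alternative. The alternative here is the modified z-score (median and MAD, with the commonly cited cutoff of 3.5); it is a swapped-in technique chosen for illustration, not something PyOD is claimed to provide. The scores are synthetic, and the function names are hypothetical.

```python
import numpy as np

def percentile_labels(scores, contamination):
    # Contamination-style rule: flag everything above the (1 - contamination) percentile.
    threshold = np.percentile(scores, 100 * (1 - contamination))
    return (scores > threshold).astype(int)

def robust_z_labels(scores, cutoff=3.5):
    # Score-based rule (modified z-score): flag samples whose score sits far above
    # the bulk, so the number of outliers is not fixed in advance.
    median = np.median(scores)
    mad = np.median(np.abs(scores - median))
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    z = 0.6745 * (scores - median) / mad
    # One-sided: only unusually high decision scores count as abnormal.
    return (z > cutoff).astype(int)

# Synthetic version of the issue's scenario: 999 low scores and one clearly high score.
scores = np.concatenate([np.linspace(0.0, 1.0, 999), [10.0]])

print(percentile_labels(scores, contamination=0.1).sum())  # 100
print(robust_z_labels(scores).sum())  # 1
```

The trade-off: the contamination rule guarantees a fixed outlier budget, which is convenient when the expected outlier rate is known, while a score-based rule adapts to the score distribution but depends on the scores being on a meaningful scale.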
