Yue Zhao / pyod, Issue #129 (Closed)
Issue created Jul 26, 2019 by Administrator (@root)

HBOS (and probably other) models don't need the decision_scores_ and labels_ attributes

Created by: Fed29

Hello, I'm working on a scenario where I have to train an HBOS model on past data, save the model, and then use it to find anomalies in new (unseen) data. In this workflow, I don't need the model object to store the decision_scores_ and labels_ computed on the training data.

My suggestion is to skip initializing the decision_scores_ and labels_ attributes. Users who need them can run predict (for labels) or decision_function (for scores) on the training data and store the results in a separate structure outside the model.
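To illustrate the proposal, here is a minimal sketch of an HBOS-style detector whose fitted state holds only per-feature histograms and a threshold. It is hypothetical and pure Python: `CompactHBOS`, `FLOOR`, `ranges_` and `densities_` are made-up names, not PyOD's HBOS class or API, and the scoring is simplified (fixed-width bins; values outside the training range get a tiny floor density). Training scores are computed once to place `threshold_` and then discarded, so anyone who needs them can call `decision_function` on the training data.

```python
import math
import pickle


class CompactHBOS:
    """Hypothetical HBOS-style detector (illustration only, not PyOD's class).

    Fitted state is just per-feature histograms plus threshold_; no
    decision_scores_ or labels_ for the training data are kept.
    """

    FLOOR = 1e-12  # density used for empty bins and out-of-range values

    def __init__(self, n_bins=10, contamination=0.1):
        self.n_bins = n_bins
        self.contamination = contamination

    def fit(self, X):
        self.ranges_, self.densities_ = [], []
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            lo, hi = min(col), max(col)
            width = (hi - lo) / self.n_bins or 1.0  # guard constant features
            counts = [0] * self.n_bins
            for v in col:
                counts[min(int((v - lo) / width), self.n_bins - 1)] += 1
            self.ranges_.append((lo, hi, width))
            self.densities_.append([max(c / len(col), self.FLOOR) for c in counts])
        # Training scores are needed only to place the threshold, then dropped.
        scores = sorted(self.decision_function(X))
        cut = math.ceil(len(scores) * (1 - self.contamination)) - 1
        self.threshold_ = scores[max(cut, 0)]
        return self

    def decision_function(self, X):
        # HBOS-style score: sum over features of -log(bin density).
        out = []
        for row in X:
            score = 0.0
            for j, v in enumerate(row):
                lo, hi, width = self.ranges_[j]
                if lo <= v <= hi:
                    idx = min(int((v - lo) / width), self.n_bins - 1)
                    density = self.densities_[j][idx]
                else:
                    density = self.FLOOR  # unseen range: treat as very rare
                score += -math.log(density)
            out.append(score)
        return out

    def predict(self, X):
        # Needs only the histograms and threshold_, not training scores/labels.
        return [int(s > self.threshold_) for s in self.decision_function(X)]


# Train on "past" data, persist, reload, and score new data.
train = [[float(i % 50), float((i * 7) % 30)] for i in range(1000)]
model = CompactHBOS(n_bins=10, contamination=0.05).fit(train)
blob = pickle.dumps(model)
restored = pickle.loads(blob)
print(len(blob), "bytes pickled")
print(restored.predict([[25.0, 15.0], [500.0, -100.0]]))
```

Because nothing per-sample is kept, the pickled size depends only on the number of features and bins, not on the number of training rows.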

This approach would let us save much smaller models (in terms of memory). For example, with HBOS trained on 600k samples, decision_scores_ takes about 5 MB and labels_ another 5 MB, giving a 10 MB pkl file; removing them yields a 2 KB pkl file.
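As a rough check of the sizes quoted above: 600,000 samples × 8 bytes is about 4.8 MB per attribute, which matches roughly 5 MB each and 10 MB in total. The sketch below uses the standard-library `array` module as a stand-in for NumPy arrays, assuming float64 scores and 64-bit integer labels; the actual dtypes (and therefore sizes) may differ by platform.

```python
import pickle
from array import array

N = 600_000  # number of training samples in the example

# Stand-ins for per-sample training attributes, assuming 8 bytes per value:
# float64 scores (like decision_scores_) and int64 labels (like labels_).
scores = array("d", [0.0]) * N
labels = array("q", [0]) * N

score_bytes = len(pickle.dumps(scores))
label_bytes = len(pickle.dumps(labels))
print(f"scores: {score_bytes / 1e6:.1f} MB, labels: {label_bytes / 1e6:.1f} MB")
print(f"total:  {(score_bytes + label_bytes) / 1e6:.1f} MB")
```

The per-sample arrays dominate the file; the remaining fitted state (histograms and threshold) is tiny by comparison.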

Furthermore:

  • labels_ is used only in the fit_predict method, which is deprecated.
  • The predict method calls check_is_fitted(self, ['decision_scores_', 'threshold_', 'labels_']), but neither decision_scores_ nor labels_ is actually used there.