yujiosaka / headless-chrome-crawler · Issue #343 (Closed)

Issue created Mar 18, 2019 by Mike Rispoli (@mrispoli24)

Pages with 403 errors not throwing errors

What is the current behavior?

When you crawl a page that returns a 403 (Forbidden) error, the crawler just hangs there indefinitely. It ignores all timeouts and doesn't throw any errors.

If the current behavior is a bug, please provide the steps to reproduce

If you run the current crawler from a remote server on Digital Ocean against sites that block bots, the returned 403 error never triggers the error promise. This can be reproduced with any Best Buy URL, for example.
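For reference, this is roughly the setup I'm running. It's a minimal sketch assuming the standard launch options from the README (onSuccess, onError, timeout); the Best Buy URL is just a stand-in for any page that answers with a 403 from the server's IP:

```js
const HCCrawler = require('headless-chrome-crawler');

(async () => {
  const crawler = await HCCrawler.launch({
    timeout: 30000, // navigation timeout (ms) -- appears to be ignored when the page responds with 403
    onSuccess: result => console.log('crawled:', result.options.url),
    onError: error => console.error('failed:', error), // never called for the 403 page
  });

  // Any Best Buy URL reproduces this when the request comes from a blocked
  // (e.g. Digital Ocean) IP and the site answers with 403.
  crawler.queue('https://www.bestbuy.com/');

  await crawler.onIdle(); // hangs here indefinitely instead of resolving or rejecting
  await crawler.close();
})();
```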

What is the expected behavior?

Pages that return a 403 (Forbidden) error should trigger the onError function, and the crawler should move on to the next URL in the queue.
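In other words, with the same setup I would expect something like the following to run to completion, with the blocked page surfacing through onError (a sketch of the desired flow, not the current behavior):

```js
const HCCrawler = require('headless-chrome-crawler');

(async () => {
  const crawler = await HCCrawler.launch({
    onSuccess: result => console.log('crawled:', result.options.url),
    // Desired: the 403 response (or at least the navigation timeout) ends up here...
    onError: error => console.error('skipping blocked page:', error),
  });

  crawler.queue('https://www.bestbuy.com/'); // answers 403 -> should hit onError
  crawler.queue('https://example.com/');     // ...so this one still gets crawled

  await crawler.onIdle(); // and onIdle() resolves once the queue is drained
  await crawler.close();
})();
```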

What is the motivation / use case for changing the behavior?

If a site implements this type of blocking, it halts the entire crawl process without any notification that the URL failed.
