yujiosaka / headless-chrome-crawler / Issues / #192 (Closed)
Issue created Apr 02, 2018 by Anthony Pessy@panthony

You should be able to provide the robots.txt

What is the current behavior?

Today the project automatically resolves the robots.txt.

What is the expected behavior?

It would be useful to be able to provide the robots.txt yourself, bypassing the default behavior of resolving it automatically.

What is the motivation / use case for changing the behavior?

  • You may want to provide a different set of rules (say I'm the owner of the site and I want to check how the crawler would behave with a different robots.txt)

  • In a big distributed environment, maybe you want to resolve the robots.txt once and share it with all the workers
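
To illustrate both use cases, here is a minimal sketch of what consuming a caller-provided robots.txt could look like. It is not the crawler's API: `parseRobotsTxt`, `isAllowed` and the `Rule` type are hypothetical names made up for this example, and the matching is deliberately simplified (exact user-agent match with a `*` fallback, plain prefix matching, no wildcards). The point is only that the rules come from a string the caller supplies, so they can be fetched once, or written by hand, and handed to every worker.

```typescript
// Hypothetical sketch: none of these names come from headless-chrome-crawler.
type Rule = { allow: boolean; path: string };

// Parse a caller-supplied robots.txt body and return the rules of the group
// matching `userAgent`, falling back to the "*" group.
// Simplified: exact user-agent match, prefix paths only, no wildcards.
function parseRobotsTxt(body: string, userAgent: string): Rule[] {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let inRules = false; // true once the current group has seen a rule line
  for (const raw of body.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === "user-agent") {
      // A User-agent line after rule lines starts a new group.
      if (inRules) { agents = []; inRules = false; }
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
    } else if (key === "allow" || key === "disallow") {
      inRules = true;
      if (value === "") continue; // an empty path restricts nothing
      for (const agent of agents) {
        groups.get(agent)!.push({ allow: key === "allow", path: value });
      }
    }
  }
  return groups.get(userAgent.toLowerCase()) ?? groups.get("*") ?? [];
}

// The longest matching path wins; on a tie, Allow wins. No match means allowed.
function isAllowed(rules: Rule[], path: string): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    if (!best || rule.path.length > best.path.length ||
        (rule.path.length === best.path.length && rule.allow)) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}

// The robots.txt is provided as a string instead of being fetched from the site.
const providedRobotsTxt = [
  "User-agent: *",
  "Disallow: /private/",
  "Allow: /private/public-report.html",
].join("\n");

const rules = parseRobotsTxt(providedRobotsTxt, "my-crawler");
console.log(isAllowed(rules, "/private/data.html"));          // false
console.log(isAllowed(rules, "/private/public-report.html")); // true
```

In a distributed setup, the `providedRobotsTxt` string (or the parsed `rules` array) is what would be shared with the workers, so each of them skips its own fetch.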
