Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • H headless-chrome-crawler
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 29
    • Issues 29
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 4
    • Merge requests 4
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • yujiosaka
  • headless-chrome-crawler
  • Issues
  • #148
Closed
Open
Issue created Mar 08, 2018 by Administrator@rootContributor

[Feature Request] Add support for multiple sitemaps

Created by: NickStees

What is the current behavior? I don't believe the crawler is handling sitemaps broken out into multiple sitemaps. This is common in large sites since sitemaps are limited to 50k urls. See Simplify multiple sitemap management

Good example is NASA https://www.nasa.gov/sitemap.xml

What is the expected behavior? Successfully crawl large sites via sitemap(s)

What is the motivation / use case for changing the behavior? Large enterprise sites not being crawled via sitemap

Assignee
Assign to
Time tracking