Duplicated URLs are crawled twice
What is the current behavior?
Duplicated URLs are not skipped: the same URL is crawled twice even though `skipDuplicates: true` is set.
If the current behavior is a bug, please provide the steps to reproduce
const HCCrawler = require('./lib/hccrawler');

(async () => {
  const crawler = await HCCrawler.launch({
    evaluatePage: () => ({
      title: document.title,
    }),
    onSuccess: (result) => {
      console.log(result);
    },
    skipDuplicates: true,
    jQuery: false,
    maxDepth: 3,
    args: ['--no-sandbox'],
  });
  await crawler.queue([{
    url: 'https://www.example.com/',
  }, {
    url: 'https://www.example.com/',
  }]);
  await crawler.onIdle();
  await crawler.close();
})();
What is the expected behavior?
Duplicate URLs should be skipped even when they are queued explicitly via `queue()`.
Please tell us about your environment:
- Version: latest
- Platform / OS version: CentOS 7.1
- Node.js version: v8.4.0