How do I crawl around a Google Redirect Notice?

Describe the problem.
On e-commerce sites, because the sites are so big, I've exported the XML into a sheet and published it broken down into lists of around 2k URLs. This has got around the problem of an individual crawl taking a whole day and has avoided pagination errors, but after around 1k URLs the crawler window ends up displaying only a Google Redirect Notice and the crawler stops pulling data. How can I amend my setup to work around this?

Url: heatandplumb sitemap brakedown - Google Drive

Sitemap:
{"_id":"hnp-sm-xml","startUrl":["https://docs.google.com/spreadsheets/d/e/2PACX-1vSht537EdRJh5L_cVK8dHQMkSVAmDzeeU5LREqJmbRTIrjkKX7kJtJlz8URYsQCHAYEElED1Za8xrNP/pubhtml?gid=944957861&single=true"],"selectors":[{"id":"product","parentSelectors":["_root"],"type":"SelectorLink","selector":".softmerge-inner a","multiple":true,"delay":0},{"id":"name","parentSelectors":["product"],"type":"SelectorText","selector":"h1","multiple":false,"delay":0,"regex":""},{"id":"hnp-ref","parentSelectors":["product"],"type":"SelectorText","selector":".col-lg-12.d-flex div:nth-of-type(2) span.product-reference","multiple":false,"delay":0,"regex":""},{"id":"brand-listing","parentSelectors":["product"],"type":"SelectorText","selector":"tr:contains('Brand') td","multiple":false,"delay":0,"regex":""},{"id":"range-listing","parentSelectors":["product"],"type":"SelectorText","selector":"tr:contains('Model/Collection') td","multiple":false,"delay":0,"regex":""},{"id":"mpn1","parentSelectors":["product"],"type":"SelectorText","selector":"tr:contains('Part Number') td","multiple":false,"delay":0,"regex":""},{"id":"mpn2","parentSelectors":["product"],"type":"SelectorText","selector":"tr:contains('MPN') td","multiple":false,"delay":0,"regex":""},{"id":"ean1","parentSelectors":["product"],"type":"SelectorText","selector":"tr:contains('Ean') td","multiple":false,"delay":0,"regex":""},{"id":"ean2","parentSelectors":["product"],"type":"SelectorText","selector":"tr:contains('EAN') td","multiple":false,"delay":0,"regex":""}]}

@Dismas Hi, have you tried storing these URLs in an alternative source?

Example: https://pastelink.net/
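If the lists are going to live outside Google Sheets, the split into lists of around 2k URLs can be done directly from the sitemap XML. The following is a rough Python sketch, assuming a standard sitemaps.org `<urlset>` file; the function names and the chunk size of 2000 are illustrative choices, not something the site or the scraper requires.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> List[str]:
    """Collect the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def chunk(urls: List[str], size: int = 2000) -> List[List[str]]:
    """Split the URL list into consecutive lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def as_plain_text(urls: List[str]) -> str:
    """One URL per line, ready to paste into a text-hosting page."""
    return "\n".join(urls)


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "<url><loc>https://example.com/c</loc></url>"
        "</urlset>"
    )
    for part in chunk(sitemap_urls(sample), size=2):
        print(as_plain_text(part))
        print("---")
```

Each chunk can then be pasted into its own page and used as the start URL of a separate crawl, provided the host renders the pasted URLs as links that the link selector can pick up.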

Not yet. Thanks for the link! I'll check it out and check back in.

OK, just gave it a quick try and I must have set up the link wrong somehow: when it starts running, it opens the link in a tab in the browser window I'm currently using. I have a sneaking suspicion that Chrome will give me a memory error before I hit the end of my 2k list.

Seems to all be working now. Still don't know if I'm gonna get redirects yet. Appreciate the input.

All working great now!