Scraping slows to a crawl after a while

Chrome Version 70.0.3538.77 (Official Build) (64-bit)
Webscraper Version 0.3.8

{"_id":"accfinal","startUrl":["https://auscompcomputers.com/products"],"selectors":[{"id":"categories1","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"#compact_view > .listshow","multiple":true,"delay":"0","clickElementSelector":"ul.mtree.leftmenus li.mtree-node a.prod-cat-link","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"products","type":"SelectorLink","parentSelectors":["categories1"],"selector":"a.prod-link","multiple":true,"delay":""},{"id":"title","type":"SelectorText","parentSelectors":["product page"],"selector":"div.col-xs-8 > div.col-xs-12 p.tabbgtptxt3","multiple":false,"regex":"","delay":0},{"id":"brisbane qty","type":"SelectorText","parentSelectors":["product page"],"selector":"div.table-responsive tr:nth-of-type(2) td.padnone:nth-of-type(2)","multiple":false,"regex":"[^><]","delay":0},{"id":"sydney qty","type":"SelectorText","parentSelectors":["product page"],"selector":"tr:nth-of-type(2) td.padnone:nth-of-type(3)","multiple":false,"regex":"[^><]","delay":0},{"id":"price ex","type":"SelectorText","parentSelectors":["product page"],"selector":"td.txtcnta:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"vendor qty","type":"SelectorText","parentSelectors":["product page"],"selector":"tr:nth-of-type(2) td.padnone:nth-of-type(4)","multiple":false,"regex":"[^><]","delay":0},{"id":"mpn","type":"SelectorText","parentSelectors":["product page"],"selector":"td.padnone tr:contains('Part No.') td.martrtd:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"weight","type":"SelectorText","parentSelectors":["product page"],"selector":"td.padnone tr:contains('Gross Weight(Kg)') td.martrtd:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"brand","type":"SelectorText","parentSelectors":["product page"],"selector":"td.padnone tr:contains('View all products from') td.martrtd:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"WTY","type":"SelectorText","parentSelectors":["product page"],"selector":"td.padnone tr:contains('Waranty') td.martrtd:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"img src","type":"SelectorElementAttribute","parentSelectors":["product page"],"selector":"div.image-preview img","multiple":false,"extractAttribute":"src","delay":0},{"id":"long desc","type":"SelectorHTML","parentSelectors":["product page"],"selector":"div.tab-pane.active div.col-xs-12","multiple":false,"regex":"","delay":0},{"id":"short desc","type":"SelectorElementClick","parentSelectors":["product page"],"selector":"div.tab-pane.active div.col-xs-12","multiple":false,"delay":"2000","clickElementSelector":"div.tab-v1 li:nth-of-type(2) a","clickType":"clickOnce","discardInitialElements":true,"clickElementUniquenessType":"uniqueText"},{"id":"category 1","type":"SelectorText","parentSelectors":["product page"],"selector":"span.bread_category a.gotomaincategory","multiple":false,"regex":"(?<=>\\s).*","delay":0},{"id":"category 2","type":"SelectorText","parentSelectors":["product page"],"selector":"span.bread_subcategory a.gotomaincategory","multiple":false,"regex":"","delay":0},{"id":"short desc content","type":"SelectorText","parentSelectors":["short desc"],"selector":"div.col-xs-12","multiple":false,"regex":"","delay":0},{"id":"product page","type":"SelectorElement","parentSelectors":["products"],"selector":"div.col-xs-12.rtcol","multiple":false,"delay":0}]}
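
For anyone trying to reproduce this, a quick way to see which selectors in a sitemap like the one above carry an explicit delay, and how they nest, is to parse the JSON outside the browser. The following is only a rough Python sketch, not part of Web Scraper: the helper names `delay_ms` and `summarize` are made up for illustration, and the embedded JSON is a trimmed copy of the sitemap that keeps only the fields the sketch reads. A fixed per-page delay would not by itself explain a rate that degrades over time, so this is just a starting point for ruling out configuration.

```python
import json

# Trimmed copy of the sitemap above (illustrative subset of selectors/fields).
SITEMAP_JSON = """
{"_id": "accfinal",
 "selectors": [
  {"id": "categories1", "type": "SelectorElementClick", "parentSelectors": ["_root"], "delay": "0"},
  {"id": "products", "type": "SelectorLink", "parentSelectors": ["categories1"], "delay": ""},
  {"id": "product page", "type": "SelectorElement", "parentSelectors": ["products"], "delay": 0},
  {"id": "title", "type": "SelectorText", "parentSelectors": ["product page"], "delay": 0},
  {"id": "short desc", "type": "SelectorElementClick", "parentSelectors": ["product page"], "delay": "2000"},
  {"id": "short desc content", "type": "SelectorText", "parentSelectors": ["short desc"], "delay": 0}
 ]}
"""


def delay_ms(selector):
    """Normalise the delay field, which shows up as "", "0", 0 or "2000"."""
    raw = selector.get("delay", 0)
    return int(raw) if str(raw).strip() else 0


def summarize(sitemap):
    """Return (selectors with a non-zero delay, parent -> children map)."""
    delays = {}
    children = {}
    for sel in sitemap["selectors"]:
        ms = delay_ms(sel)
        if ms > 0:
            delays[sel["id"]] = ms
        for parent in sel["parentSelectors"]:
            children.setdefault(parent, []).append(sel["id"])
    return delays, children


delays, children = summarize(json.loads(SITEMAP_JSON))
print(delays)                    # selectors that wait before extracting
print(children["product page"])  # what is scraped on each product page
```

To check a full sitemap, paste the exported JSON in place of `SITEMAP_JSON`; the sketch assumes every selector has an `id` and a `parentSelectors` list, as the one posted here does.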

Scraping begins quite fast but then slows to a crawl, taking a good part of a day, sometimes more, for a medium-sized website. I can see no reason for it to slow down this much. It should scrape at a consistent rate throughout the job, yet it always starts fast and, after an hour or more, slows to a crawl.

NB: the pricing is password protected, so you will not see the element or text for pricing but all the other data is there.

Hi there!

Have you tried using CouchDB to store scraped data while you scrape?

@iconoclast Not yet. I have no experience with or knowledge of CouchDB. Could you please explain the reasoning behind your question? Do you have reason to believe CouchDB would solve this problem, or would trying it have diagnostic value?

Also, I'd like to ask whether you or anyone else believes this gradual slowing down and grinding to a virtual halt could be the result of the scraper running into anti-scraping defences on the website. Has the developer ever considered that?

I had the same problem with a website yesterday. I turned off the use of cookies in the browser and it helped.

How odd. I want an official explanation from the developers for this behaviour.

Did that help? I thought the scraper might be generating a new cookie for each page view, which would leave a lot of cookies to deal with and could slow down the process.
