Webscraper 403 error

januaryrain77 · October 29, 2020, 2:04pm

Describe the problem.

Hi all

I have been using Webscraper for a year. Now it doesn't work, and every time I scrape I get a 403 error. It appears that the site I am scraping is now using a newer detection technology. How can I bypass this detection and be able to scrape again?

leemeng · October 30, 2020, 11:01pm

403 usually means a server/website config problem. So you'd have to contact their admin or support to fix it. Ref https://www.hostinger.my/tutorials/what-is-403-forbidden-error-and-how-to-fix-it

januaryrain77 · November 1, 2020, 1:41pm

Hi leemeng,

Thank you for your prompt response. The website that I am trying to scrape is the largest online mall in Indonesia. Until this month, I was using Webscraper and it worked perfectly. But this month they started using a newer detection technology that bans scraper bots, especially Webscraper.

I have no way to contact them; as you know, they don't like their site being bombarded by scrapers.

I have used a CORS plugin and set the scraping time to 10,000. The 403 error still shows up. Do I need to change the User-Agent in the header to Mozilla, maybe? How can I do that?
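One way to check whether the User-Agent header matters at all is to request a page outside the browser and compare status codes. The following is a minimal sketch, assuming Python 3 with only the standard library; it is not a Webscraper feature. `BROWSER_UA`, `build_request`, and `fetch_status` are hypothetical names chosen for illustration, and the User-Agent string is just an example value.

```python
import urllib.error
import urllib.request

# Assumption: an example desktop Chrome User-Agent string.
# Replace it with the value your own browser reports.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Create a GET request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str = BROWSER_UA, timeout: float = 10.0) -> int:
    """Return the HTTP status code for the URL, including error codes such as 403."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is still useful.
        return err.code


# Example (performs a real network request, so it is left commented out):
# print(fetch_status("https://www.tokopedia.com/simpatifurniture/page/1"))
```

If the status is 403 both with a browser-like User-Agent and without one, the block probably depends on something other than this header, and changing it is unlikely to help.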

I have taken a look at the link you sent me regarding the .htaccess file etc. But that applies when we are the webmaster. In my case, I am scraping an online mall for its prices and descriptions, so I am unable to touch their settings file.

Is there a way around this new detection technology that blocks scrapers with a 403 error?

Thank you so much for your help

januaryrain77 · November 2, 2020, 1:30pm

Hi Leemeng

I tried to reproduce the error.

The start URL is
https://www.tokopedia.com/simpatifurniture/page/[1-57]?source=universe&st=product
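For readers unfamiliar with the `[1-57]` notation: it stands for a range of page numbers, so this single start URL covers 57 listing pages. A minimal sketch of that expansion, assuming Python 3 and handling only the simple `[start-end]` form (not zero-padded or stepped ranges); `expand_start_url` is a hypothetical helper, not part of Webscraper:

```python
import re


def expand_start_url(url: str) -> list[str]:
    """Expand the first [start-end] range in a start URL into concrete URLs."""
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        # No range placeholder: the URL is used as-is.
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


START_URL = "https://www.tokopedia.com/simpatifurniture/page/[1-57]?source=universe&st=product"
urls = expand_start_url(START_URL)

print(len(urls))   # 57
print(urls[0])     # https://www.tokopedia.com/simpatifurniture/page/1?source=universe&st=product
print(urls[-1])    # https://www.tokopedia.com/simpatifurniture/page/57?source=universe&st=product
```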

And this is the imported sitemap that I use to scrape:

{"_id":"simpati","startUrl":["https://www.tokopedia.com/simpatifurniture/page/[1-57]?source=universe&st=product"],"selectors":[{"id":"productlink","type":"SelectorLink","parentSelectors":["_root"],"selector":".css-1ehqh5q a","multiple":true,"delay":0},{"id":"name","type":"SelectorText","parentSelectors":["productlink"],"selector":"h1.css-x7lc0h","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["productlink"],"selector":"h3.css-c820vl","multiple":false,"regex":"","delay":0},{"id":"scroll","type":"SelectorElementScroll","parentSelectors":["productlink"],"selector":"p.css-olztn6-unf-heading","multiple":true,"delay":"250"},{"id":"productdescription","type":"SelectorText","parentSelectors":["productlink"],"selector":"p.css-olztn6-unf-heading","multiple":false,"regex":"","delay":0},{"id":"judul_desc","type":"SelectorText","parentSelectors":["productlink"],"selector":".css-d4v4mp-unf-heading span, h2.css-d4v4mp-unf-heading","multiple":false,"regex":"","delay":0},{"id":"terjual","type":"SelectorText","parentSelectors":["productlink"],"selector":".\[object span","multiple":false,"regex":"","delay":0},{"id":"dilihat","type":"SelectorText","parentSelectors":["productlink"],"selector":"b.\[object","multiple":false,"regex":"","delay":0},{"id":"cate","type":"SelectorText","parentSelectors":["productlink"],"selector":"li:nth-of-type(4) a.css-yoyor-unf-heading","multiple":false,"regex":"","delay":0},{"id":"berat","type":"SelectorText","parentSelectors":["productlink"],"selector":"dt:contains('Berat') + dd p","multiple":false,"regex":"","delay":0},{"id":"seller","type":"SelectorText","parentSelectors":["productlink"],"selector":".css-y8x67s a","multiple":false,"regex":"","delay":0}]}
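To see how many requests this sitemap generates, it helps to look at the selector hierarchy. The sketch below, assuming Python 3, parses a trimmed copy of the sitemap (three of the eleven selectors) and groups selector ids by parent; `children_by_parent` is a hypothetical helper name. Since every detail selector hangs under the `productlink` link selector, each product link found on a listing page leads to a further page load on top of the 57 listing pages.

```python
import json

# Trimmed copy of the sitemap above (three of the eleven selectors), for illustration only.
SITEMAP_JSON = """
{"_id":"simpati",
 "startUrl":["https://www.tokopedia.com/simpatifurniture/page/[1-57]?source=universe&st=product"],
 "selectors":[
  {"id":"productlink","type":"SelectorLink","parentSelectors":["_root"],"selector":".css-1ehqh5q a","multiple":true,"delay":0},
  {"id":"name","type":"SelectorText","parentSelectors":["productlink"],"selector":"h1.css-x7lc0h","multiple":false,"regex":"","delay":0},
  {"id":"price","type":"SelectorText","parentSelectors":["productlink"],"selector":"h3.css-c820vl","multiple":false,"regex":"","delay":0}
 ]}
"""


def children_by_parent(sitemap: dict) -> dict[str, list[str]]:
    """Map each parent selector id to the ids of its child selectors."""
    tree: dict[str, list[str]] = {}
    for selector in sitemap["selectors"]:
        for parent in selector["parentSelectors"]:
            tree.setdefault(parent, []).append(selector["id"])
    return tree


sitemap = json.loads(SITEMAP_JSON)
tree = children_by_parent(sitemap)
print(tree)  # {'_root': ['productlink'], 'productlink': ['name', 'price']}
```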

After I set Request interval (ms)=10000 and Request interval (ms)=5000, I get the following error in the console:

Would you or any of the Webscraper development team kindly help me to avoid this 403 error?

Is there a Chrome plugin that I can use to bypass this error? Or is there a Chrome setting that I can change? Or maybe I could use Mozilla to change the User-Agent header?

Because if Webscraper is unable to bypass this 403 error, lots of users, paid or free, will be unable to use Webscraper as well.

Thank you so much for your help
Jan