Cannot scrape 404 page

I'm trying to scrape a product page to find out whether a particular stocked item is no longer in stock. That's easy when the page says "out of stock" or similar.

The trouble starts when the product URL is no longer active and returns a 404 error page. I thought it would be easy: just scrape the text on the 404 page. But you can't.

You can't scrape any information from a page that returns a 404. No text, no images, no HTML, no elements, nothing. Web Scraper just closes.

Data preview works perfectly fine while you are on the 404 page building the selectors, but when you run the scrape it closes without collecting any data. I tested this on multiple sites and multiple URLs. It happens on any 404 page, whether it's a redirect to a 404 error page or a standard 404 on the same URL.

To reproduce, test any URL that returns a 404 error page.
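For anyone who only needs to know that a page is gone, here is a minimal sketch of reading a 404 page outside the extension. It assumes plain Python with the standard library, not Web Scraper itself, and it does not fix the behaviour described above. The names `fetch_page` and `stock_state`, the returned labels, and the "out of stock" phrase are all hypothetical and would need adjusting per site.

```python
import urllib.error
import urllib.request


def fetch_page(url, timeout=10):
    """Return (status_code, body_text), even when the server answers with an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the 404 page body is still readable.
        return err.code, err.read().decode("utf-8", errors="replace")


def stock_state(status, body):
    """Hypothetical classifier: labels and the 'out of stock' phrase are assumptions."""
    if status == 404:
        return "removed"
    if "out of stock" in body.lower():
        return "out_of_stock"
    return "available"
```

Calling `fetch_page` on a product URL and passing the result to `stock_state` gives one label per URL. The status code alone is enough to flag a removed listing, and the body is still available if the 404 page has a title or keywords worth keeping.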

Would love some feedback on this
I am scraping data from in particular

Here's a quick demo sitemap:

Really sad this didn't get any attention, not even one reply. I'm using it for the same purpose, yet Web Scraper seems to skip 404 pages even though there's a title and some keywords to scrape.

Yeah, it was a little disheartening not getting a response when I needed the help back then. I never did find a way to fix the 404 issue.
From memory, I think I found an alternative way of scraping the data I wanted.
I had been scraping eBay and found that they had different URLs leading to the same product page. One of the URLs would return a 404 if the item was unlisted or deleted (the issue that made me start this post), but I found another URL for the same item that actually showed the product page as unlisted or removed. So this worked for me.

It never fixed the 404 issue, but in my case for eBay I found a way around it.

I managed to find a forked version of this tool, called Web Scraper Plus, which does show 404 page errors. The problem is that the fork doesn't show the start URL, so I don't know which URL is returning the 404.