How to keep an entry if its link fails?

jcljules · November 5, 2019, 2:51am

I'm scraping a national organization's directory of local satellite orgs for contact emails. First my scraper grabs the name and telephone number from the directory, then it follows the link to the satellite org's site and looks for an email. The problem is that if the link fails (some of the local orgs have defunct sites), the entry is deleted altogether. Is there any way to make sure the parent element isn't deleted just because a child link fails?

leemeng · March 31, 2020, 5:44am

Hard to diagnose without a sitemap or URL, but I'm guessing your sitemap is not structured properly. You probably have the link navigator at the _root level, so when it fails there is no data to scrape and the whole entry is dropped. Ideally it should sit inside a wrapper element together with the name and phone number selectors. That way, when the link fails it returns "null" for the email and the name and phone number are retained.
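If you ever do this in a hand-written script instead of a sitemap, the same idea applies: treat the link as an optional child of the entry, so a failed fetch yields a null email rather than dropping the row. Below is a rough Python sketch of that pattern. It assumes the directory has already been parsed into dicts with hypothetical `name`, `phone` and `url` keys; the function names are made up for illustration, and the email regex is only a loose heuristic.

```python
import http.client
import re
import urllib.error
import urllib.request
from typing import Callable, Optional

# Loose heuristic for spotting an email address in raw HTML (assumption, not exhaustive).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page. Raises on HTTP errors, DNS failures, timeouts, bad URLs."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_email(url: str, fetch: Callable[[str], str] = fetch_html) -> Optional[str]:
    """Hypothetical helper: first email on the linked page, or None if the link fails."""
    try:
        html = fetch(url)
    except (OSError, ValueError, http.client.HTTPException):
        # Defunct site (URLError/HTTPError are OSError subclasses):
        # report "no email" instead of discarding the whole entry.
        return None
    match = EMAIL_RE.search(html)
    return match.group(0) if match else None


def scrape_directory(entries, fetch: Callable[[str], str] = fetch_html):
    """Hypothetical driver: one output row per directory entry, link success or not."""
    rows = []
    for entry in entries:
        rows.append({
            "name": entry["name"],    # always kept
            "phone": entry["phone"],  # always kept
            "email": find_email(entry["url"], fetch),  # None when the site is dead
        })
    return rows
```

The `fetch` parameter is there so you can swap in a fake downloader for testing, or a different HTTP client; the key design choice is that the failure is caught at the child (link) level, which mirrors putting the link inside the wrapper element rather than at _root.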