Scraper just crashes

Jacob · November 3, 2018, 5:46pm

I am trying to scrape the names off the event list. I've tried various combinations of link selector, link popup, element selector etc.

When I walk through the selectors using element and data preview, everything functions correctly. But when I run it, it loads the popup window, then crashes without scraping anything.

Url: https://www.facebook.com/events/2199586793636198/

Sitemap:
{"_id":"wtf","startUrl":["https://www.facebook.com/events/2199586793636198/"],"selectors":[{"id":"attendance","type":"SelectorPopupLink","parentSelectors":["_root"],"selector":"a._5z74","multiple":false,"delay":"1500"},{"id":"pages","type":"SelectorPopupLink","parentSelectors":["attendance"],"selector":"a._1y4a","multiple":true,"delay":"5000"},{"id":"name","type":"SelectorText","parentSelectors":["pages"],"selector":"span._h24","multiple":true,"regex":"","delay":0}]}

Jasmin_Watts · May 10, 2021, 9:13pm

Hi there, I'm having the exact same problem. I have never been able to use the web scraper successfully. For a project, I am trying to scrape https://www.forbes.com/billionaires/

Data preview is perfect. I click scrape and the window pops up for about 3 seconds and then crashes. The only data I have is whatever was scraped in those seconds. Did you find a fix for this problem?

ViestursWS · May 11, 2021, 4:35am

Hello @Jasmin_Watts

An Element click selector should come in very handy here.
Example:

{"_id":"forbes-com","startUrl":["https://www.forbes.com/billionaires/"],"selectors":[{"id":"wrapper","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.table-row:nth-of-type(n+2)","multiple":true,"delay":2000,"clickElementSelector":"button[aria-label='go to page 1'], button[aria-label='go to page 2']","clickType":"clickOnce","discardInitialElements":"discard-when-click-element-exists","clickElementUniquenessType":"uniqueCSSSelector"},{"id":"rank","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.personName","multiple":false,"regex":"","delay":0},{"id":"net-worth","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.netWorth","multiple":false,"regex":"","delay":0},{"id":"age","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.age","multiple":false,"regex":"","delay":0},{"id":"country","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.countryOfCitizenship","multiple":false,"regex":"","delay":0},{"id":"source","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.source","multiple":false,"regex":"","delay":0},{"id":"industry","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.category","multiple":false,"regex":"","delay":0}]}

Jasmin_Watts · May 11, 2021, 10:50am

Thanks so much for getting back to me! Can you explain how to do element click for this page? I can see that in your example it goes root > wrapper > name etc. Can you explain how to do this?

Also, I ran your example through the web scraper and only got 401 rows, but none were the richest people. Is there a way to just get the top 200 billionaires, i.e. the first page?

ViestursWS · May 11, 2021, 10:53am

Oh, then just use an Element selector targeting each of the table rows, like this:

{"_id":"forbes-com","startUrl":["https://www.forbes.com/billionaires/"],"selectors":[{"id":"wrapper","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.table-row:nth-of-type(n+2)","multiple":true,"delay":0},{"id":"rank","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.personName","multiple":false,"regex":"","delay":0},{"id":"net-worth","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.netWorth","multiple":false,"regex":"","delay":0},{"id":"age","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.age","multiple":false,"regex":"","delay":0},{"id":"country","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.countryOfCitizenship","multiple":false,"regex":"","delay":0},{"id":"source","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.source","multiple":false,"regex":"","delay":0},{"id":"industry","type":"SelectorText","parentSelectors":["wrapper"],"selector":"div.category","multiple":false,"regex":"","delay":0}]}

ViestursWS · May 11, 2021, 10:55am

They probably were there: the sitemap was configured to click through 2 pages, and data is returned in pseudo-random order. You can sort it manually by the web-scraper-order column.
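
If sorting by hand in a spreadsheet is awkward, the same thing can be sketched in Python. This is a rough example under stated assumptions: it assumes the exported CSV has a `web-scraper-order` column whose values look like `TIMESTAMP-INDEX`, the `sort_rows` and `order_key` helpers are hypothetical names, and the sample rows are made-up placeholders rather than real scraped data.

```python
import csv
import io


def order_key(value):
    """Turn 'TIMESTAMP-INDEX' into a tuple of ints so rows sort numerically.

    Assumption: the order value is made of integers joined by '-'.
    """
    return tuple(int(part) for part in value.split("-"))


def sort_rows(csv_text, column="web-scraper-order"):
    """Parse CSV text and return its rows sorted by the order column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: order_key(row[column]))


# Made-up placeholder export, deliberately out of order.
sample = """web-scraper-order,rank
1620730000-10,Person C
1620730000-2,Person B
1620730000-1,Person A
"""

for row in sort_rows(sample):
    print(row["web-scraper-order"], row["rank"])
```

Parsing the pieces as integers matters here: a plain text sort would put `-10` before `-2`.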

Jasmin_Watts · May 11, 2021, 11:33am

Thanks for all your help!