Extracting first n elements on page with infinite data

Describe the problem: I'm trying to scrape a page with infinite data. I'm using the element scroll down selector, but since it would loop forever if I didn't stop it, I'm trying to just get the first 30 elements. I used the :nth-of-type(-n+30) selector for the element scroll down, but it still seems to scroll infinitely.

Note: This specific example requires a Facebook account, but if you can figure out a general solution to this problem, then that would be helpful to future readers of this question.
Sitemap:
{"_id":"fbevents","startUrl":["https://www.facebook.com/events/discovery/?acontext={"ref"%3A"2"%2C"ref_dashboard_filter"%3A"upcoming"%2C"action_history"%3A"[{\"surface\"%3A\"dashboard\"%2C\"mechanism\"%3A\"main_list\"%2C\"extra_data\"%3A{\"dashboard_filter\"%3A\"upcoming\"}}]"}"],"selectors":[{"id":"title","type":"SelectorText","parentSelectors":["scr"],"selector":"a._7ty","multiple":false,"regex":"","delay":0},{"id":"scr","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"a._7ty:nth-of-type(-n+30) ","multiple":true,"delay":"1000"}]}

URL: https://www.facebook.com/events/discovery/?acontext={"ref"%3A"2"%2C"ref_dashboard_filter"%3A"upcoming"%2C"action_history"%3A"[{\"surface\"%3A\"dashboard\"%2C\"mechanism\"%3A\"main_list\"%2C\"extra_data\"%3A{\"dashboard_filter\"%3A\"upcoming\"}}]"}

This was a tricky one; initially I couldn't find any selector on the FB page that could stop the scroller. I finally used the "Interested" button in the event boxes: li:nth-of-type(11) button

I stopped the scroller at box 11 and that got me 30 events on my computer. You can tweak the value in (11) to get the number of events you want. Results will vary due to browser window size and screen resolution.

{"_id":"fb_event","startUrl":["https://www.facebook.com/events/discovery/?acontext={"ref"%3A"2"%2C"ref_dashboard_filter"%3A"upcoming"%2C"action_history"%3A"[{\"surface\"%3A\"dashboard\"%2C\"mechanism\"%3A\"main_list\"%2C\"extra_data\"%3A{\"dashboard_filter\"%3A\"upcoming\"}}]"}"],"selectors":[{"id":"scroll_till_11","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"li:nth-of-type(11) button","multiple":false,"delay":"600"},{"id":"date","type":"SelectorText","parentSelectors":["event_box"],"selector":"div._3j4p","multiple":false,"regex":"","delay":0},{"id":"event","type":"SelectorText","parentSelectors":["event_box"],"selector":"a._7ty","multiple":false,"regex":"","delay":0},{"id":"location","type":"SelectorText","parentSelectors":["event_box"],"selector":"._42ef div > span","multiple":false,"regex":"","delay":0},{"id":"event_box","type":"SelectorElement","parentSelectors":["_root"],"selector":"div._3j4o","multiple":true,"delay":0},{"id":"URL","type":"SelectorLink","parentSelectors":["event_box"],"selector":"a._7ty","multiple":false,"delay":0}]}
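
For anyone who wants the general idea outside Web Scraper, the stopping rule can be sketched as a loop that keeps loading items until a cap is reached or nothing new arrives. This is only a Python illustration, not how Web Scraper itself is implemented: collect_first_n, load_batch and make_fake_feed are hypothetical names, and the fake feed stands in for a page that never runs out of items.

```python
from itertools import count
from typing import Callable, List


def collect_first_n(load_batch: Callable[[], List[str]], n: int,
                    max_scrolls: int = 100) -> List[str]:
    """Call load_batch (one simulated scroll) until n items are collected,
    the feed returns nothing new, or max_scrolls is reached."""
    items: List[str] = []
    for _ in range(max_scrolls):
        if len(items) >= n:
            break  # enough items: stop scrolling
        batch = load_batch()
        if not batch:
            break  # nothing new appeared: the feed has ended
        items.extend(batch)
    return items[:n]  # trim any overshoot from the last batch


def make_fake_feed(batch_size: int = 8) -> Callable[[], List[str]]:
    """Hypothetical endless feed: every call returns batch_size new items."""
    counter = count(1)

    def load_batch() -> List[str]:
        return [f"event {next(counter)}" for _ in range(batch_size)]

    return load_batch


events = collect_first_n(make_fake_feed(), 30)
print(len(events))  # 30
print(events[-1])   # event 30
```

The max_scrolls guard is there so the loop still ends if the cap is never reached and the feed never runs dry.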


Wow, this worked great, you're a genius! Although I'm not sure why it works, haha...
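
A possible explanation, offered as a sketch rather than a description of Web Scraper's internals: :nth-of-type counts an element's position among siblings with the same tag under the same parent, not its position on the whole page. If each a._7ty link is the only a element inside its own wrapper (an assumption about Facebook's markup), then every link is number 1 of its type and all of them match :nth-of-type(-n+30), so new matches keep appearing as the page grows. li:nth-of-type(11) can only match the 11th li of a list, so the set of matches stays small. The Python model below uses a made-up Node class and helper names purely to illustrate the counting rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """Hypothetical, minimal stand-in for a DOM element."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def nth_of_type_index(node: Node) -> int:
    """1-based position of node among its siblings that share its tag."""
    if node.parent is None:
        return 1
    same_tag = [c for c in node.parent.children if c.tag == node.tag]
    for position, sibling in enumerate(same_tag, start=1):
        if sibling is node:
            return position
    raise ValueError("node is not listed among its parent's children")


def matches_first_n(node: Node, n: int) -> bool:
    """Models :nth-of-type(-n+N), i.e. position <= N."""
    return nth_of_type_index(node) <= n


def matches_exactly(node: Node, k: int) -> bool:
    """Models :nth-of-type(K), i.e. position == K."""
    return nth_of_type_index(node) == k


def build_list(item_count: int) -> Tuple[List[Node], List[Node]]:
    """Assumed layout: ul > li > div > a, one link per list item."""
    ul = Node("ul")
    items, links = [], []
    for _ in range(item_count):
        li = ul.add(Node("li"))
        wrapper = li.add(Node("div"))
        links.append(wrapper.add(Node("a")))
        items.append(li)
    return items, links


items, links = build_list(50)
matching_links = [a for a in links if matches_first_n(a, 30)]
matching_items = [li for li in items if matches_exactly(li, 11)]
print(len(matching_links))  # 50: every link is the first a in its wrapper
print(len(matching_items))  # 1: only the 11th li matches
```

If the real page nests its links differently, the counts will differ, but the rule that positions are counted per parent still applies.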