Need help with a simple scroll down problem

Hey there, I'm trying to scrape the names of companies plus their LinkedIn URLs. There are about 2331 of them. Can you help me fix the scroll-down selector so it scrolls down and scrapes everything from top to bottom? Thank you so much in advance.

Link:

https://angel.co/companies?company_types[]=Startup&company_types[]=Private+Company&company_types[]=SaaS&company_types[]=Mobile+App&locations[]=1653-Los+Angeles

My sitemap:

{"_id":"angel","startUrl":["https://angel.co/companies?company_types[]=Startup&company_types[]=Private+Company&company_types[]=SaaS&company_types[]=Mobile+App&locations[]=1653-Los+Angeles"],"selectors":[{"id":"Parent","type":"SelectorLink","parentSelectors":["_root"],"selector":"div.name a.startup-link","multiple":true,"delay":0},{"id":"Company linkedin URL","type":"SelectorLink","parentSelectors":["Parent"],"selector":"a.fontello-linkedin","multiple":false,"delay":0},{"id":"tags","type":"SelectorText","parentSelectors":["Parent"],"selector":"div.js-market_tag_holder","multiple":false,"regex":"","delay":0},{"id":"Scroll down selector","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.more","multiple":true,"delay":"3000"}]}

You need to use the Element Click selector.

I believe this should work, but you need to set delays because they will send you a Captcha very quickly. Also, loading this many records will likely crash your browser (or at least make it lag).

{"_id":"angel-list-company-scrape","startUrl":["https://angel.co/companies?company_types[]=Startup&company_types[]=Private+Company&company_types[]=SaaS&company_types[]=Mobile+App&locations[]=1653-Los+Angeles"],"selectors":[{"id":"Load More","type":"SelectorElementClick","parentSelectors":["_root"],"selector":".startup","multiple":true,"delay":"1000","clickElementSelector":"div.more","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Link","type":"SelectorLink","parentSelectors":["Load More"],"selector":"div.name","multiple":false,"delay":"1000"},{"id":"Name","type":"SelectorText","parentSelectors":["Link"],"selector":"h1.u-fontWeight500","multiple":false,"regex":"","delay":0},{"id":"href","type":"SelectorElementAttribute","parentSelectors":["Link"],"selector":"a.fontello-linkedin","multiple":false,"extractAttribute":"href","delay":0}]}

It went all the way down but it didn't scrape anything, so instead I used this:

tr.box:nth-last-of-type(n)
or
tr.box:nth-last-child(n)

But your sitemap helped me solve a couple of things so thanks man!!

I also ran into a problem where it would only load 20 pages deep and then hang. Did you overcome that?

Nope, I gave up on that site, since no matter what delay I set I still get blocked lol

Don't give up. You can still scrape just the links to the companies and use them later to create a sitemap that scrapes them directly. I'd give it a try.
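
A rough sketch of that second step in Python, assuming you already exported the company links from a first run. `build_direct_sitemap` is a hypothetical helper and the two URLs are placeholders; the `h1.u-fontWeight500` and `a.fontello-linkedin` selectors are simply reused from the sitemap earlier in this thread and may need adjusting.

```python
import json


def build_direct_sitemap(sitemap_id, company_urls):
    """Build a sitemap dict that opens each company page directly."""
    # Drop blanks and duplicates while keeping the original order.
    urls = list(dict.fromkeys(u.strip() for u in company_urls if u.strip()))
    return {
        "_id": sitemap_id,
        "startUrl": urls,
        "selectors": [
            {"id": "Name", "type": "SelectorText",
             "parentSelectors": ["_root"], "selector": "h1.u-fontWeight500",
             "multiple": False, "regex": "", "delay": 0},
            {"id": "href", "type": "SelectorElementAttribute",
             "parentSelectors": ["_root"], "selector": "a.fontello-linkedin",
             "multiple": False, "extractAttribute": "href", "delay": 0},
        ],
    }


# Placeholder links; replace with the ones exported from the first scrape.
links = [
    "https://angel.co/placeholder-company-a",
    "https://angel.co/placeholder-company-b",
    "https://angel.co/placeholder-company-a",
    "",
]

sitemap = build_direct_sitemap("angel-direct", links)
print(json.dumps(sitemap))
```

Since every company page becomes its own start URL, there is no "load more" clicking involved, though the blocking problem may still apply.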

It seems to limit the Element Click (next) to 20 pages before it stalls out and Web Scraper hangs. I also just read on Phantombuster that 400 companies is their limit.

https://phantombuster.com/api-store/3678/angellist-market-companies-extractor

I will man, thanks!!

Aha, I see. So even Phantombuster can't go beyond that limit, interesting.

Well, if you manually scroll all the way down

Which I guess is the limit of 400 companies.
