Infinite scroll

eldoland · September 18, 2018, 2:41pm

Hi,
I'm trying to scrape this page:

https://www.fintastico.com/fintech-uk/

I manually scroll down until I see all the results,
then I run this sitemap,
but I only get the "first page" results.
How can I tell the scraper to scroll down automatically?
I also tried "Element scroll down" but I'm not sure how to set it up.
Thanks!

Sitemap:
{"_id":"fintastico","startUrl":["https://www.fintastico.com/fintech-uk/"],"selectors":[{"id":"cards","type":"SelectorElement","parentSelectors":["_root"],"selector":"ul.archive-list li","multiple":true,"delay":0},{"id":"companylink","type":"SelectorLink","parentSelectors":["cards"],"selector":"h4","multiple":true,"delay":0},{"id":"elements","type":"SelectorElement","parentSelectors":["companylink"],"selector":"div.col-md-8","multiple":true,"delay":0},{"id":"website","type":"SelectorText","parentSelectors":["elements"],"selector":"li:nth-of-type(1) a","multiple":false,"regex":"","delay":0},{"id":"linkedin","type":"SelectorText","parentSelectors":["elements"],"selector":"a.in","multiple":false,"regex":"","delay":0}]}

NetworkReject · September 18, 2018, 11:45pm

Simple: just change the type of your "cards" element selector to "Element scroll down" ("SelectorElementScroll" in the sitemap JSON) and give it a delay of 2000-3000 milliseconds.

{"_id":"fintastico","startUrl":["https://www.fintastico.com/fintech-uk/"],"selectors":[{"id":"cards","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"ul.archive-list li","multiple":true,"delay":"3000"},{"id":"companylink","type":"SelectorLink","parentSelectors":["cards"],"selector":"h4","multiple":true,"delay":0},{"id":"elements","type":"SelectorElement","parentSelectors":["companylink"],"selector":"div.col-md-8","multiple":true,"delay":0},{"id":"website","type":"SelectorText","parentSelectors":["elements"],"selector":"li:nth-of-type(1) a","multiple":false,"regex":"","delay":0},{"id":"linkedin","type":"SelectorText","parentSelectors":["elements"],"selector":"a.in","multiple":false,"regex":"","delay":0}]}
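
If you would rather patch an exported sitemap than click through the UI, the same change can be sketched in Python. This is only a sketch: `to_scroll_selector` is a made-up helper name, and it assumes the sitemap JSON has the shape of the exports in this thread (a top-level `selectors` list whose entries carry `id`, `type` and `delay`). The sitemap below is trimmed to the one selector that changes.

```python
import json


def to_scroll_selector(sitemap: dict, selector_id: str, delay_ms: int = 3000) -> dict:
    """Return a copy of the sitemap with one selector switched to Element scroll down."""
    patched = json.loads(json.dumps(sitemap))  # cheap deep copy for JSON-like data
    for selector in patched["selectors"]:
        if selector["id"] == selector_id:
            selector["type"] = "SelectorElementScroll"
            selector["delay"] = delay_ms  # time to wait for newly loaded items
            return patched
    raise KeyError(f"no selector with id {selector_id!r}")


# Trimmed-down version of the sitemap from the first post (assumption: only
# the "cards" selector is shown; the child selectors are left out for brevity).
original = {
    "_id": "fintastico",
    "startUrl": ["https://www.fintastico.com/fintech-uk/"],
    "selectors": [
        {"id": "cards", "type": "SelectorElement", "parentSelectors": ["_root"],
         "selector": "ul.archive-list li", "multiple": True, "delay": 0},
    ],
}

patched = to_scroll_selector(original, "cards")
print(json.dumps(patched))  # single-line JSON, ready to import as a sitemap
```

The helper copies the sitemap instead of editing it in place, so the original export stays around for comparison if the scroll version returns nothing.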

eldoland · September 19, 2018, 12:12pm

Thanks for this solution!
It seems to be scraping now, but I get zero results, and I can see it can't grab the company name.

How can I fix this?
Thanks!

eldoland · September 19, 2018, 1:05pm

I just tried this sitemap:
{"_id":"fintastico3","startUrl":["https://www.fintastico.com/fintech-uk/"],"selectors":[{"id":"cards","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"ul.archive-list li","multiple":true,"delay":"5000"},{"id":"cardblock","type":"SelectorElement","parentSelectors":["cards"],"selector":"div.card-block","multiple":true,"delay":0},{"id":"companylink","type":"SelectorLink","parentSelectors":["cardblock"],"selector":"h4","multiple":true,"delay":0},{"id":"website","type":"SelectorText","parentSelectors":["companylink"],"selector":"li:nth-of-type(1) a","multiple":false,"regex":"","delay":0},{"id":"linkedin","type":"SelectorText","parentSelectors":["companylink"],"selector":"a.in","multiple":false,"regex":"","delay":0}]}

But still zero data.

NetworkReject · September 19, 2018, 3:27pm

Alright, so I had only fixed the scroll part before and didn't check whether it worked past that. I did notice something else wrong: you were using a Text selector to extract your LinkedIn and website URLs, where you should have been using ElementAttribute with href as the attribute. That's not what's keeping you from getting any results, though.

I tried turning off the scroll part and it works just fine, so I think the issue might be that there isn't an end to the scrolling. I don't think it will start following the 'cards' links until it gets to the bottom of the page, and if there isn't one then it will just keep scrolling forever.
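
The Text-to-ElementAttribute change can be sketched as a small Python patch over the exported JSON. Treat it as an illustration only: `text_to_href` is a made-up helper name, the sitemap is trimmed to the two link selectors, and the `SelectorElementAttribute` / `extractAttribute` keys are assumed to be the ones the sitemaps in this thread use.

```python
import copy


def text_to_href(sitemap: dict, selector_ids: set) -> dict:
    """Return a copy where the named SelectorText entries extract the href attribute."""
    patched = copy.deepcopy(sitemap)
    for selector in patched["selectors"]:
        if selector["id"] in selector_ids and selector["type"] == "SelectorText":
            selector["type"] = "SelectorElementAttribute"
            selector["extractAttribute"] = "href"  # grab the URL, not the link text
            selector.pop("regex", None)  # regex belongs to the text selector only
    return patched


# Assumption: only the two link selectors from the original sitemap are shown.
sitemap = {
    "_id": "fintastico",
    "selectors": [
        {"id": "website", "type": "SelectorText", "parentSelectors": ["elements"],
         "selector": "li:nth-of-type(1) a", "multiple": False, "regex": "", "delay": 0},
        {"id": "linkedin", "type": "SelectorText", "parentSelectors": ["elements"],
         "selector": "a.in", "multiple": False, "regex": "", "delay": 0},
    ],
}

fixed = text_to_href(sitemap, {"website", "linkedin"})
for selector in fixed["selectors"]:
    print(selector["id"], selector["type"], selector["extractAttribute"])
```

Selectors that are not named in `selector_ids`, or that are not Text selectors to begin with, pass through unchanged.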

NetworkReject · September 19, 2018, 3:36pm

I forgot to add the current version I have for your scraper. Again, if it is truly an infinite scroll, then it just might not be possible.

{"_id":"fintastico","startUrl":["https://www.fintastico.com/fintech-uk/"],"selectors":[{"id":"cards","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"ul.archive-list li","multiple":true,"delay":"3000"},{"id":"companylink","type":"SelectorLink","parentSelectors":["cards"],"selector":"a","multiple":false,"delay":0},{"id":"elements","type":"SelectorElement","parentSelectors":["companylink"],"selector":"div.col-md-8","multiple":true,"delay":0},{"id":"website","type":"SelectorElementAttribute","parentSelectors":["elements"],"selector":"li:nth-of-type(1) a","multiple":false,"extractAttribute":"href","delay":0},{"id":"linkedin","type":"SelectorElementAttribute","parentSelectors":["elements"],"selector":"a.in","multiple":false,"extractAttribute":"href","delay":0},{"id":"companyname","type":"SelectorText","parentSelectors":["cards"],"selector":"h4","multiple":false,"regex":"","delay":0}]}
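
When comparing sitemap versions like the ones in this thread, it can help to print the parent/child structure rather than read the single-line JSON. A rough Python sketch follows; `outline` is a made-up helper, and the dictionary below keeps only the `id`, `type` and `parentSelectors` fields of the sitemap above.

```python
def outline(sitemap: dict, parent: str = "_root", depth: int = 0, path: tuple = ()) -> list:
    """List selectors as indented 'id (type)' lines by following parentSelectors."""
    lines = []
    for selector in sitemap["selectors"]:
        sid = selector["id"]
        # Skip ids already on the current path so self-referencing selectors
        # cannot cause endless recursion.
        if parent in selector["parentSelectors"] and sid not in path:
            lines.append("  " * depth + f"{sid} ({selector['type']})")
            lines.extend(outline(sitemap, sid, depth + 1, path + (sid,)))
    return lines


# Reduced copy of the sitemap above: ids, types and parents only.
sitemap = {
    "selectors": [
        {"id": "cards", "type": "SelectorElementScroll", "parentSelectors": ["_root"]},
        {"id": "companylink", "type": "SelectorLink", "parentSelectors": ["cards"]},
        {"id": "elements", "type": "SelectorElement", "parentSelectors": ["companylink"]},
        {"id": "website", "type": "SelectorElementAttribute", "parentSelectors": ["elements"]},
        {"id": "linkedin", "type": "SelectorElementAttribute", "parentSelectors": ["elements"]},
        {"id": "companyname", "type": "SelectorText", "parentSelectors": ["cards"]},
    ],
}

tree = outline(sitemap)
print("\n".join(tree))
```

In this version the company name hangs directly off "cards", while the website and LinkedIn links are only reached after following "companylink" to the detail page.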

NetworkReject · September 19, 2018, 3:46pm

It's working... I'm running it right now and it's currently scraping the individual card pages.

eldoland · September 19, 2018, 9:11pm

Cool, thanks, it is working great!

RazZziel · January 31, 2020, 11:34pm

Is that the latest version of the working scraper? It's not working for me now: I imported it verbatim, and the scraper scrolls up and down the website for a while, but after a short wait it finishes with zero results.

Arthur_Letemple · April 6, 2020, 12:07pm

Hi,

I have the same problem as @eldoland had.

It is scrolling down, but I get no data and can't get any further.

How can I fix this?
Thanks

{"_id":"hackerone","startUrl":["https://hackerone.com/directory/programs"],"selectors":[{"id":"liste_programme","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"a.daisy-link--major","multiple":true,"delay":"3000"},{"id":"source_code","type":"Text","parentSelectors":["liste_programme"],"selector":".vertical-spacing div.grid--has-outside-gutter:nth-of-type(1) div.grid__column:nth-of-type(1)","multiple":true,"regex":"(?=[Ss]ource code analysis|[Ss]ource code review|[Ss]ource code|[Cc]ode analysis|[Cc]ode review)","delay":0},{"id":"inscope","type":"Text","parentSelectors":["liste_programme"],"selector":".card__content .vertical-spacing .daisy-table__cell > span","multiple":true,"delay":0},