Twitter auto loading

Hi,
I'm trying to scrape this page:
https://twitter.com/designerstalk/followers
but it only scrapes 55 boxes.
How can I get all of them?
Thanks!

Sitemap:
{"_id":"twitter","startUrl":["https://twitter.com/designerstalk/followers"],"selectors":[{"id":"loader","type":"SelectorElementScroll","selector":".GridTimeline","parentSelectors":["_root"],"multiple":true,"delay":"2000"},{"id":"box","type":"SelectorElement","selector":".js-stream-item","parentSelectors":["loader"],"multiple":true,"delay":0},{"id":"name","type":"SelectorText","selector":"div.ProfileCard-userFields a.fullname","parentSelectors":["box"],"multiple":false,"regex":"","delay":0},{"id":"twitterhandle","type":"SelectorText","selector":"a.ProfileCard-screennameLink span.username","parentSelectors":["box"],"multiple":false,"regex":"","delay":0},{"id":"description","type":"SelectorText","selector":"p.ProfileCard-bio","parentSelectors":["box"],"multiple":false,"regex":"","delay":0},{"id":"link","type":"SelectorLink","selector":"div.ProfileCard-userFields a.fullname","parentSelectors":["box"],"multiple":false,"delay":0}]}

Any ideas?
What should I do to make it work?
Thanks!

Hello again!

Unfortunately, I didn't succeed with this scrape. I got the Scroll Down selector to scroll infinitely, but when I stopped it manually, no results were returned. The only thing I can guess at is a selector limitation.

There is a workaround: scroll down manually for as long as you want, then just hit Data Preview and copy-paste the results. This is the easiest path.
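If you'd rather not hold the scroll yourself, a small snippet pasted into the browser's developer console can do the scrolling for you (a minimal sketch using standard DOM APIs; the 2000 ms interval is an assumption to tune):

// Scroll to the bottom every 2 seconds so Twitter keeps loading more cards.
// Stop it with clearInterval(scroller) once enough results have loaded.
const scroller = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight);
}, 2000);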

Here's the sitemap, which I've changed a little:

{"_id":"twitter","startUrl":["https://twitter.com/designerstalk/followers"],"selectors":[{"id":"loader","type":"SelectorElement","selector":"div.GridTimeline-items div.Grid div.ProfileCard","parentSelectors":["_root"],"multiple":true,"delay":"0"},{"id":"name","type":"SelectorText","selector":"div.ProfileCard-userFields a.fullname","parentSelectors":["loader"],"multiple":false,"regex":"","delay":0},{"id":"twitterhandle","type":"SelectorText","selector":"a.ProfileCard-screennameLink span.username","parentSelectors":["loader"],"multiple":false,"regex":"","delay":0},{"id":"description","type":"SelectorText","selector":"p.ProfileCard-bio","parentSelectors":["loader"],"multiple":false,"regex":"","delay":0},{"id":"link","type":"SelectorLink","selector":"a.ProfileCard-screennameLink","parentSelectors":["loader"],"multiple":false,"delay":0},{"id":"load_more","type":"SelectorElementScroll","selector":"parent","parentSelectors":["load_more"],"multiple":true,"delay":"0"}]}

You can add a :nth-of-type(-n+x) CSS pseudo-class to your Element Scroll Down selector to limit the scrolled record count. Changing x changes the number of scraped items in steps of 6, because -n+x matches the first x div.Grid rows and each Grid row holds six ProfileCard elements.

For example:

6 scraped elements - div.GridTimeline-items div.Grid:nth-of-type(-n+1) div.ProfileCard

12 scraped elements - div.GridTimeline-items div.Grid:nth-of-type(-n+2) div.ProfileCard

18 scraped elements - div.GridTimeline-items div.Grid:nth-of-type(-n+3) div.ProfileCard

and so on.
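If you want to sanity-check how many cards a given x matches before putting it into the sitemap, you can count them in the developer console (a minimal sketch; x = 3 is just an example value):

// Count the ProfileCard elements inside the first x Grid rows.
const x = 3; // should report 18 (6 cards per Grid row)
const cards = document.querySelectorAll(
  'div.GridTimeline-items div.Grid:nth-of-type(-n+' + x + ') div.ProfileCard'
);
console.log(cards.length);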


Thanks a lot, guys!
I tried this:

{"_id":"twitter5","startUrl":["https://twitter.com/sagmeisterwalsh/followers"],"selectors":[{"id":"loader","type":"SelectorElement","selector":"div.GridTimeline-items div.Grid:nth-of-type(-n+20000) div.ProfileCard","parentSelectors":["_root"],"multiple":true,"delay":"0"},{"id":"name","type":"SelectorText","selector":"div.ProfileCard-userFields a.fullname","parentSelectors":["loader"],"multiple":false,"regex":"","delay":0},{"id":"twitterhandle","type":"SelectorText","selector":"a.ProfileCard-screennameLink span.username","parentSelectors":["loader"],"multiple":false,"regex":"","delay":0},{"id":"description","type":"SelectorText","selector":"p.ProfileCard-bio","parentSelectors":["loader"],"multiple":false,"regex":"","delay":0},{"id":"link","type":"SelectorLink","selector":"a.ProfileCard-screennameLink","parentSelectors":["loader"],"multiple":false,"delay":0},{"id":"load_more","type":"SelectorElementScroll","selector":"parent","parentSelectors":["load_more"],"multiple":true,"delay":"0"}]}

but it's still not working; I just get 18 results.
Is this correct? div.Grid:nth-of-type(-n+20000) div.ProfileCard

Thanks again!

Set the delay to at least 2000 ms for the Element Scroll Down selector.

Thanks,
but I still only get 18 results:

{"_id":"twitter5","startUrl":["https://twitter.com/sagmeisterwalsh/followers"],"selectors":[{"id":"loader","type":"SelectorElement","selector":"div.GridTimeline-items div.Grid:nth-of-type(-n+20000) div.ProfileCard","parentSelectors":["_root"],"multiple":true,"delay":"2000"},{"id":"name","type":"SelectorText","selector":"div.ProfileCard-userFields a.fullname","parentSelectors":["loader"],"multiple":false,"regex":"","delay":0},{"id":"twitterhandle","type":"SelectorText","selector":"a.ProfileCard-screennameLink span.username","parentSelectors":["loader"],"multiple":false,"regex":"","delay":0},{"id":"description","type":"SelectorText","selector":"p.ProfileCard-bio","parentSelectors":["loader"],"multiple":false,"regex":"","delay":0},{"id":"link","type":"SelectorLink","selector":"a.ProfileCard-screennameLink","parentSelectors":["loader"],"multiple":false,"delay":0},{"id":"load_more","type":"SelectorElementScroll","selector":"parent","parentSelectors":["load_more"],"multiple":true,"delay":"0"}]}

Unfortunately, it won't be possible to scrape Twitter with the current Web Scraper selectors, because Twitter has changed how its scrolling works.

Oh, OK!
Thanks for that!
I will use the other method mentioned by iconoclast.
Thanks again!

The scrape I didn't succeed with at first (infinite scroll) will work and show results, but only if you wait until it finishes scraping without stopping it manually, as I did. The time needed can be estimated from the follower count.

EDIT: I just looked; 10k+ followers will surely take a long time to scrape, even with the scroll-down timer removed. I also think Chrome might crash during a scrape that big.
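As a back-of-envelope estimate (assuming roughly 18 new profiles per scroll batch, as seen above, and the recommended 2000 ms scroll delay; both numbers are assumptions):

// Rough scrape-time estimate for a follower list of a given size.
const followers = 10000;
const profilesPerScroll = 18; // batch size observed in this thread
const delayMs = 2000;         // recommended scroll-down delay
const minutes = (followers / profilesPerScroll) * delayMs / 60000;
console.log(minutes.toFixed(0) + ' minutes of scrolling'); // ~19 minutes

And that is before rendering slows down as thousands of cards pile up in the DOM, which is likely why Chrome might crash on very large scrapes.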


Some code:

{"startUrl":"https://twitter.com/@Microsoft/followers","selectors":[{"parentSelectors":["contactos"],"type":"SelectorText","multiple":false,"id":"nombre","selector":"a.ProfileNameTruncated-link","delay":"","regex":""},{"parentSelectors":["contactos"],"type":"SelectorText","multiple":false,"id":"cuenta","selector":"a.ProfileCard-screennameLink span.username b","delay":"","regex":""},{"parentSelectors":["contactos"],"type":"SelectorText","multiple":false,"id":"descripcion","selector":"p.ProfileCard-bio","delay":"","regex":""},{"parentSelectors":["_root"],"type":"SelectorElementScroll","multiple":true,"id":"contactos","selector":"div.ProfileCard","delay":"2500"},{"parentSelectors":["contactos"],"type":"SelectorElementAttribute","multiple":false,"id":"idtwitter","selector":"div.user-actions.btn-group.not-following.not-muting","extractAttribute":"data-user-id","delay":""}],"_id":"seguidores_tw"}

This example worked until a few months ago. Now it shows data in the preview, but when scraping it gives: "No data scraped yet".

Using:
Request interval (ms): 2000
Page load delay (ms): 2000

Any idea what the problem may be? Thanks

Hi,

I did not understand the idea.
As I read the JSON configuration, the load_more selector was never executed (its parent selector is itself), was it?
Is there any explanation of the magic of -n+x and why the count changes in steps of 6?