LinkedIn Group Infinite Scroll

Hi!

Here's what I'm trying to accomplish with the sitemap:
Start in LinkedIn group (root) --> Scrape data (name & title text selectors) --> Scroll to next section --> Scrape data --> Scroll to next section, and so on.

Right now, it sits idle for 2000 ms, scrolls to the next section, scrolls back to the top, scrolls down again, and only then scrapes. The scraped results are all mixed up.

URL: https://www.linkedin.com/groups/60962/members/

Sitemap:
{"_id":"linkedin-bevhillsexec","startUrl":["https://www.linkedin.com/groups/60962/members/"],"selectors":[{"id":"scroll","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.neptune-grid","multiple":false,"delay":"2000"},{"id":"name","type":"SelectorText","parentSelectors":["scroll"],"selector":"artdeco-entity-lockup-title","multiple":true,"regex":"","delay":0},{"id":"title","type":"SelectorText","parentSelectors":["scroll"],"selector":"artdeco-entity-lockup-subtitle","multiple":true,"regex":"","delay":0}]}

Can anyone help?

Thank you in advance!

Hey

I tried to import your sitemap, but it requires you to be a member of the LinkedIn group. I'm waiting to be accepted by a bunch of groups, and then I will try to troubleshoot it for you.

I recently posted about Instagram, which shows the same scrolling behaviour: the screen scrolls up and down without pulling all the results. You can see a video here. Does that look similar?

Perhaps there is a bug that affects both...

Thanks
David

David,

Thank you! I appreciate the help. And yes, the video you posted is exactly the same thing I'm dealing with.

It will scroll slightly, then return to the top of the page and scroll again. Then, it scrapes, but the data is all out of order (people's names and titles are on different lines). I'm sure it's just a very minimal change, but I can't figure it out for the life of me.

Thank you!

Hey

OK, I just got this working for you. I was not able to access the LinkedIn group, so I built it against another one. If you import the JSON sitemap below, you should be able to just change the URL in 'edit metadata'.

Here is where you were going wrong:

  1. It looked like you had selected the whole page body rather than selecting each post as a 'wrapper' to repeat. The wrapper is the item you are looping over, so it is the only selector that needs 'multiple' set.
  2. You had selected the wrong classes for the name and title. You do not need 'multiple' for these, as each occurs only once inside the wrapper loop above. I have updated them for you and all the data seems to be pulling through now.
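
To show why the wrapper matters, here is a rough Python sketch. It is a simplified model of the idea, not how Web Scraper works internally, and the `posts` data and both function names are made up for illustration. Collecting all names and all titles across the page and pairing them by position goes wrong as soon as one post lacks a title, while reading one name and one title inside each wrapper keeps them together.

```python
# Hypothetical page content: three posts, the second one has no title element.
posts = [
    {"name": "Alice", "title": "CEO"},
    {"name": "Bob"},  # no title on this post
    {"name": "Carol", "title": "CTO"},
]


def scrape_flat(posts):
    """Collect every name and every title page-wide, then pair them by position."""
    names = [p["name"] for p in posts if "name" in p]
    titles = [p["title"] for p in posts if "title" in p]
    return list(zip(names, titles))


def scrape_per_wrapper(posts):
    """Treat each post as a wrapper and read one name and one title inside it."""
    return [(p.get("name"), p.get("title")) for p in posts]


flat_rows = scrape_flat(posts)
wrapped_rows = scrape_per_wrapper(posts)

print(flat_rows)     # Bob ends up paired with Carol's title, and Carol is dropped
print(wrapped_rows)  # each name stays with its own title (or None if missing)
```

The sitemap below applies the same idea: 'wrapper' is the repeating element, and 'name' and 'title' are read once inside each wrapper.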

{"_id":"help","startUrl":["https://www.linkedin.com/groups/13522693/"],"selectors": [{"id":"wrapper","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.feed-shared-update-v2","multiple":true,"delay":"200"},{"id":"name","type":"SelectorText","parentSelectors":["wrapper"],"selector":".feed-shared-actor__name span","multiple":false,"regex":"","delay":0},{"id":"title","type":"SelectorText","parentSelectors":["wrapper"],"selector":".truncate span","multiple":false,"regex":"","delay":0}]}

Let me know if that works.

I think this is different from the issue I have, where I am struggling to distinguish the 'wrapper' from another key scrolling element somewhere.

Thanks
David

Hey David,

Your sitemap does scroll the group page, but it doesn't scroll the members page. The behavior is the same as my previous sitemap - it will scroll down, scroll back up, scroll down, and then stop. I think the scrolling element is different on the members page.

Your info was extremely helpful though, I appreciate it! I will try to find the right selector on the members page and let you know how it goes.

Thanks!

Ah, my apologies. When I clicked on the link, it took me back to the general members page, as it was behind sign-in. I should have read the link more closely. I have just taken a look at the members page and managed to replicate the issue. I am going to carry on investigating, as it does indeed look like the two issues are related.

Bumping this. Anyone have an idea why it isn't working?