Consolidating multiple data points into single rows

Hey All,

I'm trying to scrape some data about members of Facebook groups (data not available via the graph API) and am running into an issue where multiple rows are being created for a single member.

I think I have things set up correctly (selecting the "multiple" option where appropriate), but I'm very new to this tool, so any feedback or help would be appreciated. Thanks!

Page:
https://www.facebook.com/groups/150128732422549/members/

Sitemap:
{"_id":"fb-group-texas-emergency-physicians","startUrl":["https://www.facebook.com/groups/150128732422549/members/"],"selectors":[{"id":"member-name","type":"SelectorLink","selector":"div._60ri a","parentSelectors":["_root"],"multiple":true,"delay":"100"},{"id":"about-link","type":"SelectorLink","selector":"a[data-tab-key=\"about\"]:nth-of-type(1)","parentSelectors":["member-name"],"multiple":false,"delay":0},{"id":"work-link","type":"SelectorLink","selector":"a[data-testid=\"nav_edu_work\"]","parentSelectors":["about-link"],"multiple":false,"delay":0},{"id":"current-job","type":"SelectorText","selector":"div[data-pnref=\"work\"] li.experience ._2lzr > a:nth-of-type(1)","parentSelectors":["work-link"],"multiple":false,"regex":"","delay":0},{"id":"places","type":"SelectorLink","selector":"a[data-testid=\"nav_places\"]","parentSelectors":["about-link"],"multiple":false,"delay":0},{"id":"current-city","type":"SelectorText","selector":"#pagelet_hometown #current_city a","parentSelectors":["places"],"multiple":false,"regex":"","delay":0}]}

Bump - can anyone help me? So close to getting this to work, just this one hurdle.

Is it possible to only generate one row for items scraped across multiple pages?

Change the parent selector of "places" to "work-link", so the selectors form one chain instead of two branches under "about-link".

I tried something like that with the sitemap below and I'm still getting multiple rows. Is this what you mean?

{"_id":"fb-group-texas-emergency-physicians","startUrl":["https://www.facebook.com/groups/150128732422549/members/"],"selectors":[{"id":"member-name","type":"SelectorLink","selector":"div._60ri a","parentSelectors":["group-scroll"],"multiple":true,"delay":"100"},{"id":"about-link","type":"SelectorLink","selector":"a[data-tab-key=\"about\"]:nth-of-type(1)","parentSelectors":["member-name"],"multiple":true,"delay":0},{"id":"work-link","type":"SelectorLink","selector":"a[data-testid=\"nav_edu_work\"]","parentSelectors":["about-link"],"multiple":true,"delay":0},{"id":"current-job","type":"SelectorText","selector":"div[data-pnref=\"work\"] li.experience ._2lzr > a:nth-of-type(1)","parentSelectors":["work-link"],"multiple":false,"regex":"","delay":0},{"id":"group-scroll","type":"SelectorElementScroll","selector":"div[data-name=\"GroupProfileGridItem\"]","parentSelectors":["_root"],"multiple":true,"delay":"500"},{"id":"city-link","type":"SelectorLink","selector":"a[data-testid=\"nav_places\"]","parentSelectors":["work-link"],"multiple":false,"delay":0},{"id":"current-city-text","type":"SelectorText","selector":"li#current_city a","parentSelectors":["city-link"],"multiple":false,"regex":"","delay":0}]}

Remove delays and uncheck "multiple" options from all selectors except "group-scroll".
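
For anyone following along, here is a rough sketch of what the sitemap might look like with that advice applied. It is the second sitemap from this thread with only the "multiple" and "delay" values changed. Two things are assumptions on my part: that "remove delays" means setting "delay" to 0, and that "group-scroll" keeps its own delay of 500 along with "multiple" (the advice could be read either way). The inner quotes in the selectors are escaped so the JSON is valid, and it is pretty-printed only for readability.

```json
{
  "_id": "fb-group-texas-emergency-physicians",
  "startUrl": ["https://www.facebook.com/groups/150128732422549/members/"],
  "selectors": [
    {"id": "member-name", "type": "SelectorLink", "selector": "div._60ri a", "parentSelectors": ["group-scroll"], "multiple": false, "delay": 0},
    {"id": "about-link", "type": "SelectorLink", "selector": "a[data-tab-key=\"about\"]:nth-of-type(1)", "parentSelectors": ["member-name"], "multiple": false, "delay": 0},
    {"id": "work-link", "type": "SelectorLink", "selector": "a[data-testid=\"nav_edu_work\"]", "parentSelectors": ["about-link"], "multiple": false, "delay": 0},
    {"id": "current-job", "type": "SelectorText", "selector": "div[data-pnref=\"work\"] li.experience ._2lzr > a:nth-of-type(1)", "parentSelectors": ["work-link"], "multiple": false, "regex": "", "delay": 0},
    {"id": "group-scroll", "type": "SelectorElementScroll", "selector": "div[data-name=\"GroupProfileGridItem\"]", "parentSelectors": ["_root"], "multiple": true, "delay": "500"},
    {"id": "city-link", "type": "SelectorLink", "selector": "a[data-testid=\"nav_places\"]", "parentSelectors": ["work-link"], "multiple": false, "delay": 0},
    {"id": "current-city-text", "type": "SelectorText", "selector": "li#current_city a", "parentSelectors": ["city-link"], "multiple": false, "regex": "", "delay": 0}
  ]
}
```

In this version "group-scroll" is the only selector that returns multiple elements (one per member tile), and everything beneath it is a single chain: member-name, about-link, work-link, then current-job and city-link, with current-city-text under city-link.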

Amazing, that worked. Thank you!