Multiple items in parent element in different rows

eneri · May 2, 2023, 6:25am

Hello,
I would like to scrape data from the page "CueTracker - 2023 World Championship - Snooker Results & Statistics".

Each match is in its own div with the class .match.

I have tried this sitemap:
{"_id":"snooker_worldchampionship_2023","startUrl":["https://cuetracker.net/tournaments/world-championship/2023/5550"],"selectors":[
  {"id":"game_title","multiple":true,"parentSelectors":["match div"],"regex":"","selector":"h5","type":"SelectorText"},
  {"id":"player_1","multiple":true,"parentSelectors":["match div"],"regex":"","selector":".player_1_name a","type":"SelectorText"},
  {"id":"player_2","multiple":true,"parentSelectors":["match div"],"regex":"","selector":".player_2_name a","type":"SelectorText"},
  {"id":"match div","multiple":true,"parentSelectors":["_root"],"selector":".match","type":"SelectorElement"}]}

But it creates a separate row for each selector's value instead of one row per match containing all three fields.

I've also tried including every div between the parent element and the text I want to extract, but the output did not change:
{"_id":"snooker_worldchampionship_2023","startUrl":["https://cuetracker.net/tournaments/world-championship/2023/5550"],"selectors":[
  {"id":"game_title","multiple":true,"parentSelectors":["match div"],"regex":"","selector":".col-md-12 .row .round_name h5","type":"SelectorText"},
  {"id":"player_1","multiple":true,"parentSelectors":["match div"],"regex":"","selector":".col-md-12 .row .player_1_name b a","type":"SelectorText"},
  {"id":"player_2","multiple":true,"parentSelectors":["match div"],"regex":"","selector":".col-md-12 .row .player_2_name a","type":"SelectorText"},
  {"id":"match div","multiple":true,"parentSelectors":["_root"],"selector":".match","type":"SelectorElement"}]}

According to the sitemap on the page "Web Scraper << How to >> Scrape multiple items within a listings page", this should not be necessary anyway.

What am I missing?

Thanks,
Irene

PS: I am using Chrome version 112.0.5615.137 with the free extension.

ViestursWS · May 2, 2023, 1:41pm

@eneri Hello. It seems that most of the data appear in a scattered manner (each selector's values end up in separate rows instead of sharing one row per match).

The issue lies in the sitemap selector setup itself. In this case, I would recommend using an 'Element' selector as the 'parent' with the 'multiple' option checked, and setting all of the remaining selectors as its 'children' with the 'multiple' option unchecked. Each child then returns a single value per matched element, so all fields for one match land in the same row.
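To illustrate why the children's 'multiple' setting matters, here is a toy model in Python. It is an assumption that only reproduces the behaviour described in this thread (scattered rows versus one row per match); it is not Web Scraper's actual implementation, and the match data and the build_rows helper are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical values found inside two .match elements (one list per child selector).
MATCHES: List[Dict[str, List[str]]] = [
    {"game_title": ["Final"], "player_1": ["Player A"], "player_2": ["Player B"]},
    {"game_title": ["Semi-final"], "player_1": ["Player C"], "player_2": ["Player D"]},
]


def build_rows(matches: List[Dict[str, List[str]]], children_multiple: bool) -> List[Row]:
    """Toy model of row building; an assumption, not Web Scraper's real code."""
    rows: List[Row] = []
    for match in matches:
        if children_multiple:
            # Every value of every 'multiple' child becomes its own row,
            # with the other columns left empty: the scattered output.
            for field, values in match.items():
                for value in values:
                    row: Row = {name: None for name in match}
                    row[field] = value
                    rows.append(row)
        else:
            # Each child keeps only its first value, so there is one row per match.
            rows.append({field: values[0] if values else None for field, values in match.items()})
    return rows


print(len(build_rows(MATCHES, children_multiple=True)))   # 6 scattered rows
print(len(build_rows(MATCHES, children_multiple=False)))  # 2 rows, one per match
```

In the first mode every cell value sits alone in its own row; in the second each match's three fields share a row, which is the layout Irene was after.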

Example:

{"_id":"snooker_worldchampionship_2023","startUrl":["https://cuetracker.net/tournaments/world-championship/2023/5550"],"selectors":[
  {"id":"game_title","multiple":false,"parentSelectors":["match div"],"regex":"","selector":".col-md-12 .row .round_name h5","type":"SelectorText"},
  {"id":"player_1","multiple":false,"parentSelectors":["match div"],"regex":"","selector":".col-md-12 .row .player_1_name b a","type":"SelectorText"},
  {"id":"player_2","multiple":false,"parentSelectors":["match div"],"regex":"","selector":".col-md-12 .row .player_2_name a","type":"SelectorText"},
  {"id":"match div","multiple":true,"parentSelectors":["_root"],"selector":".match","type":"SelectorElement"}]}
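For comparison, the same parent/child pattern can be sketched outside the extension with Python's standard xml.etree.ElementTree module: loop over each .match element (the parent with 'multiple' checked) and take only the first hit for each field (children with 'multiple' unchecked). The markup below is a hypothetical, simplified stand-in for the CueTracker page, and the FIELDS paths, first_text and scrape_matches names are illustrative assumptions rather than anything from Web Scraper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical, simplified markup standing in for the CueTracker page.
# The real page is HTML (not well-formed XML) and its exact structure may differ.
SAMPLE = """
<body>
  <div class="match">
    <div class="round_name"><h5>Final</h5></div>
    <div class="player_1_name"><b><a href="#">Player A</a></b></div>
    <div class="player_2_name"><a href="#">Player B</a></div>
  </div>
  <div class="match">
    <div class="round_name"><h5>Semi-final</h5></div>
    <div class="player_1_name"><b><a href="#">Player C</a></b></div>
    <div class="player_2_name"><a href="#">Player D</a></div>
  </div>
</body>
"""

# Child "selectors", written as ElementTree paths relative to one match element.
FIELDS = {
    "game_title": ".//*[@class='round_name']/h5",
    "player_1": ".//*[@class='player_1_name']/b/a",
    "player_2": ".//*[@class='player_2_name']/a",
}


def first_text(parent: ET.Element, path: str) -> Optional[str]:
    """Like a child selector with 'multiple' unchecked: only the first hit counts."""
    node = parent.find(path)
    return node.text if node is not None else None


def scrape_matches(markup: str) -> List[Dict[str, Optional[str]]]:
    root = ET.fromstring(markup.strip())
    rows = []
    # Like the 'match div' Element selector with 'multiple' checked:
    # one iteration, and therefore one row, per match element.
    for match in root.findall(".//div[@class='match']"):
        rows.append({name: first_text(match, path) for name, path in FIELDS.items()})
    return rows


for row in scrape_matches(SAMPLE):
    print(row)
```

ElementTree is used only to keep the sketch dependency-free; a real page would need an HTML parser such as BeautifulSoup or lxml.html, and its class attributes may contain more than one class name.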

eneri · May 3, 2023, 7:01am

Great, thanks! That worked.