Help! Is there a workaround to assign a parent element despite the webpage structure?

Hi there,

Please excuse me if something similar has been asked before, but I'm trying to scrape race card and form data from the Timeform website.


For example: . I'm trying to organise the data in a particular way, but the structure of the webpage doesn't seem to allow it. As you can see from the example racecard page, there is a section that encompasses the whole racecard, another section within that for the form data, and within that a table holding the form data of all 6 entries. Inside that table, each data point is a separate td with no additional hierarchy.

What I want to do, however, is organise the scraped data so that each entry's form data is a child element of its trap number. I can't find a way to do that, because the trap number is just another td in the overall form data table, so there is no hierarchical relationship between it and the rest of the entry's data. Is there a workaround that would let me do this?
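Outside of Web Scraper, the general workaround for a flat table like this is to walk the cells in document order and start a new group every time a trap-number cell appears, attaching every following cell to the most recent trap. Below is a minimal Python sketch of that idea using only the standard library `html.parser`. The `trap` class name and the sample HTML are hypothetical stand-ins, not Timeform's real markup; you would need to substitute whatever actually identifies the trap cell on the page.

```python
from html.parser import HTMLParser


class TrapGrouper(HTMLParser):
    """Nest every <td> that follows a trap-number cell under that trap.

    Sketch only: "trap" is a hypothetical class name marking trap cells.
    """

    def __init__(self, trap_class="trap"):
        super().__init__()
        self.trap_class = trap_class  # hypothetical marker class for trap cells
        self.groups = {}              # trap number -> list of cell texts
        self._current = None          # trap whose group is currently being filled
        self._in_td = False
        self._is_trap = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            classes = (dict(attrs).get("class") or "").split()
            self._in_td = True
            self._is_trap = self.trap_class in classes
            self._text = []

    def handle_data(self, data):
        # Collect text only while inside a cell (including nested <a>/<span>).
        if self._in_td:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or not self._in_td:
            return
        text = "".join(self._text).strip()
        self._in_td = False
        if self._is_trap:
            # A trap cell opens a new parent group.
            self._current = text
            self.groups[text] = []
        elif self._current is not None:
            # Any other cell becomes a child of the most recent trap.
            self.groups[self._current].append(text)


# Hypothetical flat table: trap cells are siblings of the data cells.
sample = """
<table>
  <tr><td class="trap">1</td><td>Dog A</td><td>2nd</td></tr>
  <tr><td>10 Nov</td><td>A3</td></tr>
  <tr><td class="trap">2</td><td>Dog B</td><td>1st</td></tr>
</table>
"""
parser = TrapGrouper()
parser.feed(sample)
parser.close()
print(parser.groups)
# {'1': ['Dog A', '2nd', '10 Nov', 'A3'], '2': ['Dog B', '1st']}
```

The grouping relies purely on cell order, so it only works if each entry's cells always come after its own trap cell and before the next trap cell.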

Many thanks in advance for your help.

@ShapoMelon Hello, to extract data from multiple listing elements, you can use an 'Element' selector set as the 'parent' with the 'Multiple' option checked, and set all of the remaining selectors as its 'children' with the 'Multiple' option unchecked.

Learn more: Multiple items | Web Scraper How To

Reference sitemap:

{"_id":"timeform-com","startUrl":["https://www.timeform.com/greyhound-racing/racecards/kinsley/1106/2023-11-10/1139437"],"selectors":[{"id":"wrapper","multiple":true,"parentSelectors":["_root"],"selector":"table#racecard tbody.rpb","type":"SelectorElement"},{"id":"name","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":"a.rpb-greyhound","type":"SelectorText"},{"id":"id-number","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".rpb-entry-details-trap span","type":"SelectorText"},{"extractAttribute":"","id":"details","parentSelectors":["wrapper"],"selector":".rpb-entry-details-2 td:nth-of-type(n+2), td:nth-of-type(4) span.rp-setting-pedigree","type":"SelectorGroup"}]}
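The parent/child relationships in a sitemap like the one above are defined entirely by each selector's `parentSelectors` list. If it helps to see the hierarchy at a glance, here is a small Python sketch that reads a sitemap's JSON and prints the selector tree. The sitemap in the sketch is a trimmed copy of the reference one, with the CSS selector strings left out for brevity; the helper names are hypothetical.

```python
import json
from collections import defaultdict


def selector_tree(sitemap_json):
    """Return {parent_id: [child_ids]} built from each selector's parentSelectors."""
    sitemap = json.loads(sitemap_json)
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel["id"])
    return dict(children)


def print_tree(children, node="_root", depth=0):
    """Print the selector hierarchy as an indented outline, starting at _root."""
    for child in children.get(node, []):
        print("  " * depth + child)
        print_tree(children, child, depth + 1)


# Trimmed copy of the reference sitemap (selector strings omitted).
sitemap_json = """
{"_id": "timeform-com", "selectors": [
  {"id": "wrapper", "parentSelectors": ["_root"], "type": "SelectorElement", "multiple": true},
  {"id": "name", "parentSelectors": ["wrapper"], "type": "SelectorText", "multiple": false},
  {"id": "id-number", "parentSelectors": ["wrapper"], "type": "SelectorText", "multiple": false},
  {"id": "details", "parentSelectors": ["wrapper"], "type": "SelectorGroup"}
]}
"""
tree = selector_tree(sitemap_json)
print_tree(tree)
# wrapper
#   name
#   id-number
#   details
```

Here `wrapper` is the only selector with `"multiple": true`, so it produces one record per matched `tbody.rpb`, and its single-value children are evaluated inside each of those records.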

Hi, thanks so much for your answer. I'll try that to see if it works.

All the best.

Hi again. I tried importing your code, and unless I'm doing something wrong (which is perfectly possible :slight_smile:), I don't think this gives me the result I'm looking for. Perhaps I didn't explain clearly enough in my original post, or I have misunderstood your answer, but what I'm trying to get is the following (see image):

  • Despite the wrapper (yellow) being one table, I'd like each trap number (circled in various colours) to be a child of the wrapper
  • Then I would like each trap number to be the parent of the corresponding entry's form data if that makes sense.

Hopefully the attached image will make it clearer. Again perhaps I'm wrong but I don't think your previous answer achieves that.

Thanks again for your help.

@ShapoMelon Hello, to extract data from the other tabs, you will have to create additional sub-wrapper elements set as children of the main wrapper.

Reference sitemap:

{"_id":"timeform-com-test","startUrl":["https://www.timeform.com/greyhound-racing/racecards/kinsley/1106/2023-11-10/1139437"],"selectors":[{"id":"wrapper","multiple":true,"parentSelectors":["_root"],"selector":"table#racecard tbody.rpb","type":"SelectorElement"},{"id":"number","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":"td.rpb-entry-details-trap","type":"SelectorText"},{"id":"name","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":"a.rpb-greyhound","type":"SelectorText"},{"id":"id-number","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".rpb-entry-details-trap span","type":"SelectorText"},{"extractAttribute":"","id":"details","parentSelectors":["wrapper"],"selector":".rpb-entry-details-2 td:nth-of-type(n+2), td:nth-of-type(4) span.rp-setting-pedigree","type":"SelectorGroup"},{"id":"wrapper-2","multiple":true,"parentSelectors":["wrapper"],"selector":"tr:has(.recent-form-meeting-date):has(a.recent-form-date ) tr:has(a.recent-form-date )","type":"SelectorElement"},{"id":"date","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td.recent-form-meeting-date","type":"SelectorText"},{"id":"type","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td:nth-of-type(2)","type":"SelectorText"},{"id":"track","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td:nth-of-type(3)","type":"SelectorText"},{"id":"dist","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td:nth-of-type(4)","type":"SelectorText"},{"id":"grade","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td:nth-of-type(5)","type":"SelectorText"},{"id":"eye","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td:nth-of-type(6)","type":"SelectorText"},{"id":"proxy","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td:nth-of-type(7)","type":"SelectorText"},{"id":"trp","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td[title='The trap this greyhound ran from']","type":"SelectorText"},{"id":"tf-sec","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td:nth-of-type(9)","type":"SelectorText"},{"id":"bend","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td.recent-form-hide-5","type":"SelectorText"},{"id":"fin","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td[title='The position of the greyhound in this race']","type":"SelectorText"},{"id":"btn","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td.recent-form-hide-8","type":"SelectorText"},{"id":"tf-going","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td.recent-form-hide-4","type":"SelectorText"},{"id":"isp","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td[title='The official starting price of the greyhound in this race']","type":"SelectorText"},{"id":"tf-time","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td:nth-of-type(15)","type":"SelectorText"},{"id":"sec-rtg","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td:nth-of-type(16)","type":"SelectorText"},{"id":"rtg","multiple":false,"parentSelectors":["wrapper-2"],"regex":"","selector":"td:nth-of-type(17)","type":"SelectorText"}]}
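To get the trap-number → form-data nesting described in the original bullet points, the scraped rows can also be regrouped after export. The Python sketch below assumes a flat CSV export in which each `wrapper-2` row sits on its own line with the parent `wrapper` columns (such as `number` and `name`) repeated next to it, and with columns named after the selector ids. That layout is an assumption about the export, and the sample rows are made-up placeholders, not real Timeform data. `nest_by_trap` is a hypothetical helper.

```python
import csv
import io


def nest_by_trap(rows, trap_key="number", entry_keys=("name",),
                 form_keys=("date", "track", "fin")):
    """Group flat rows into {trap: {entry fields..., "form": [form rows]}}.

    Assumes every row repeats the parent (wrapper) columns alongside
    the columns of one wrapper-2 (form) row.
    """
    traps = {}
    for row in rows:
        trap = row[trap_key].strip()
        if trap not in traps:
            # First row for this trap: copy the entry-level fields once.
            traps[trap] = {k: row[k] for k in entry_keys}
            traps[trap]["form"] = []
        # Every row contributes one child form record to its trap.
        traps[trap]["form"].append({k: row[k] for k in form_keys})
    return traps


# Placeholder export with made-up values, in the assumed flat layout.
sample_csv = """number,name,date,track,fin
1,Dog A,2023-11-01,Track X,2nd
1,Dog A,2023-10-25,Track X,1st
2,Dog B,2023-11-02,Track X,3rd
"""
result = nest_by_trap(csv.DictReader(io.StringIO(sample_csv)))
print(result["1"]["form"][1]["fin"])  # → 1st
```

Extra export columns are simply ignored, because the helper only reads the keys it is given. Pass the other selector ids (for example `"dist"`, `"grade"`, `"rtg"`) in `form_keys` to keep more of the form data.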

Hey, that's great, it works much better. Just one last question: how did you select and create wrapper-2? When I try, I cannot select the element highlighted in blue in the attached image; I can only select either the whole racecard wrapper (wrapper) or the individual data points. It doesn't let me select just the element you defined as wrapper-2.