Help to scrape uk horse racing tips

Hi guys, newbie/beginner here… I am trying to scrape the horse racing tips data from https://gg.co.uk/tips/today
I can extract the horse, time and course data and export it to CSV just fine.

What I would like to add is the forecast odds data shown in the morning. The same data element is then updated during the day to show the race result and odds.

When I add the odds element, I do get the data, but the structure is all wrong.

What would be the best way to get the odds/results alongside the currently extracted data?

This import will get the time, course and horse name.

{"_id":"gg1","startUrl":["https://gg.co.uk/tips/today"],"selectors":[{"id":"htc","type":"SelectorElement","parentSelectors":["_root"],"selector":"td:nth-of-type(2)","multiple":true,"delay":0},{"id":"title","type":"SelectorText","parentSelectors":["htc"],"selector":"a.horse","multiple":true,"regex":"","delay":0},{"id":"t-c","type":"SelectorText","parentSelectors":["htc"],"selector":"a:nth-of-type(2)","multiple":false,"regex":"","delay":0}]}

Thanks in advance to anyone that can help

Tim

UPDATE - This sitemap nearly works, but it doesn't put the results in the correct column. I can't seem to select the elements correctly.

{"_id":"gg-results","startUrl":["https://gg.co.uk/tips/02-jan-2020"],"selectors":[{"id":"ele1","type":"SelectorElement","parentSelectors":["_root"],"selector":"td:nth-of-type(2)","multiple":true,"delay":0},{"id":"time","type":"SelectorText","parentSelectors":["ele1"],"selector":"a.winning-post","multiple":false,"regex":"","delay":0},{"id":"horse","type":"SelectorText","parentSelectors":["ele1"],"selector":"a.horse","multiple":false,"regex":"","delay":0},{"id":"ele2","type":"SelectorElement","parentSelectors":["_root"],"selector":"td.tips-price","multiple":true,"delay":0},{"id":"result","type":"SelectorText","parentSelectors":["ele2"],"selector":"parent","multiple":false,"regex":"","delay":0}]}
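For what it's worth, the column mix-up usually happens when the horse cells and the price cells are selected as two independent lists, so they can drift out of step. A hedged sketch of the row-wise alternative in Python/BeautifulSoup (reusing the `a.horse` and `td.tips-price` selectors from the sitemap above; untested against the live page, so treat the selectors as assumptions):

```python
from bs4 import BeautifulSoup

def tip_rows(html):
    """Collect horse and price from the same <tr> so the columns stay aligned."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):
        horse = tr.find('a', class_='horse')
        price = tr.find('td', class_='tips-price')
        if horse and price:  # skip header/summary rows missing either cell
            rows.append({'horse': horse.get_text(strip=True),
                         'result': price.get_text(strip=True)})
    return rows

# e.g. rows = tip_rows(requests.get('https://gg.co.uk/tips/02-jan-2020').text)
```

Because each dict is built from a single `<tr>`, a missing price can never shift a result onto the wrong horse.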

Any help is much appreciated.

Hello Tim, how far have you got with this now?

Hi Paul, I scrape the data each day and it works very well, although I do have to do a manual clean-up so the data ends up as just results, without any "pulled up", "fell", "race abandoned" etc.

I still don't understand how it all works, but using other contributors' import code it has worked well for me, so I am very happy with the scraper.

Here is a copy of the current code:

{"_id":"gg1","startUrl":["https://gg.co.uk/tips/today"],"selectors":[{"id":"all-table","type":"SelectorTable","parentSelectors":["_root"],"selector":"table","multiple":true,"columns":[{"header":"SILK","name":"SILK","extract":true},{"header":"OUR TIP","name":"OUR TIP","extract":true},{"header":"PRICE","name":"PRICE","extract":true}],"delay":0,"tableDataRowSelector":"tr:nth-of-type(n+2)","tableHeaderRowSelector":"tr:nth-of-type(1)"},{"id":"todays-date","type":"SelectorText","parentSelectors":["_root"],"selector":"h5","multiple":false,"regex":"","delay":0}]}
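Since this version leans on a table selector anyway, it may be worth mentioning (my suggestion, not something from the posts above) that pandas can parse the same kind of table directly and keeps the columns aligned for you. A sketch on a toy table, assuming pandas plus an HTML parser backend (lxml or html5lib) are installed:

```python
from io import StringIO
import pandas as pd

# read_html turns every <table> in the markup into a DataFrame,
# one row per <tr>, so column alignment is handled automatically.
# It accepts a URL too, e.g. pd.read_html('https://gg.co.uk/tips/today').
html = """<table>
  <tr><th>OUR TIP</th><th>PRICE</th></tr>
  <tr><td>Red Rum</td><td>5/1</td></tr>
</table>"""
df = pd.read_html(StringIO(html))[0]
print(df)
```

From there, `df.to_csv(...)` gives the daily CSV without any per-cell selectors.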

Hello Tim,
Only just seen this, thank you for the update. I was trying to use Beautiful Soup and Python.
So far I have this, but I would like to store these links and then scrape the stored links for all the horse profiles of that day.
I do back a few horses, so I wanted a quick way of looking at certain facts for each horse/rider profile, i.e. weight, rider, distance, form etc.

import requests
from bs4 import BeautifulSoup

# Collect the race links from the tips page
page = requests.get('https://gg.co.uk/tips/today')
soup = BeautifulSoup(page.text, 'html.parser')

link_set = set()
for link in soup.find_all('a', {'class': 'winning-post'}):
    web_link = link.get('href')
    print(web_link)
    link_set.add(web_link)
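For the next step you describe (storing the links, then scraping each one), a small stdlib helper can first turn the collected hrefs into absolute URLs; the profile-page selectors for weight, rider, form etc. are left as a placeholder, since I don't know that page's markup:

```python
from urllib.parse import urljoin

def absolutise(links, base='https://gg.co.uk/'):
    """Turn the relative hrefs gathered in link_set into sorted full URLs."""
    return sorted(urljoin(base, href) for href in links)

# for url in absolutise(link_set):
#     profile_soup = BeautifulSoup(requests.get(url).text, 'html.parser')
#     ...pull weight, rider, distance, form from profile_soup here...
```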