Help to scrape uk horse racing tips

Hi guys, newbie/beginner here… I am trying to scrape the horse racing tips data from https://gg.co.uk/tips/today
I can extract the horse, time and course data and export it to CSV just fine.

What I would like to add is the forecast odds data shown in the morning. The same data element is then updated during the day to show the race result and odds.

When I add the odds element, I do get the data, but the structure is all wrong.

What would be the best way to get the odds/results alongside the currently extracted data?

This import will get the time, course and horse name.

{"_id":"gg1","startUrl":["https://gg.co.uk/tips/today"],"selectors":[{"id":"htc","type":"SelectorElement","parentSelectors":["_root"],"selector":"td:nth-of-type(2)","multiple":true,"delay":0},{"id":"title","type":"SelectorText","parentSelectors":["htc"],"selector":"a.horse","multiple":true,"regex":"","delay":0},{"id":"t-c","type":"SelectorText","parentSelectors":["htc"],"selector":"a:nth-of-type(2)","multiple":false,"regex":"","delay":0}]}

Thanks in advance to anyone that can help

Tim

UPDATE - This sitemap nearly works, but it doesn't put the results in the correct column. I can't seem to select the elements correctly.

{"_id":"gg-results","startUrl":["https://gg.co.uk/tips/02-jan-2020"],"selectors":[{"id":"ele1","type":"SelectorElement","parentSelectors":["_root"],"selector":"td:nth-of-type(2)","multiple":true,"delay":0},{"id":"time","type":"SelectorText","parentSelectors":["ele1"],"selector":"a.winning-post","multiple":false,"regex":"","delay":0},{"id":"horse","type":"SelectorText","parentSelectors":["ele1"],"selector":"a.horse","multiple":false,"regex":"","delay":0},{"id":"ele2","type":"SelectorElement","parentSelectors":["_root"],"selector":"td.tips-price","multiple":true,"delay":0},{"id":"result","type":"SelectorText","parentSelectors":["ele2"],"selector":"parent","multiple":false,"regex":"","delay":0}]}
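For what it's worth, the column mix-up usually happens when the horse cells and the price cells are selected as two independent lists, so they can drift out of step. A hedged sketch of the row-wise alternative in Python/BeautifulSoup (reusing the `a.horse` and `td.tips-price` selectors from the sitemap above; untested against the live page, so treat the selectors as assumptions):

```python
from bs4 import BeautifulSoup

def tip_rows(html):
    """Collect horse and price from the same <tr> so the columns stay aligned."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):
        horse = tr.find('a', class_='horse')
        price = tr.find('td', class_='tips-price')
        if horse and price:  # skip header/summary rows missing either cell
            rows.append({'horse': horse.get_text(strip=True),
                         'result': price.get_text(strip=True)})
    return rows

# e.g. rows = tip_rows(requests.get('https://gg.co.uk/tips/02-jan-2020').text)
```

Because each dict is built from a single `<tr>`, a missing price can never shift a result onto the wrong horse.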

Any help is much appreciated.

Hello Tim, how far have you got with this now?

Hi Paul, I scrape the data each day and it works very well, although I do have to do a manual clean-up so the data ends up as just results, without any "pulled up", "fell", "race abandoned" etc.

I still don't understand how it all works, but using other contributors' import code it has worked well for me, so I am very happy with the scraper.

Here is a copy of the current code:

{"_id":"gg1","startUrl":["https://gg.co.uk/tips/today"],"selectors":[{"id":"all-table","type":"SelectorTable","parentSelectors":["_root"],"selector":"table","multiple":true,"columns":[{"header":"SILK","name":"SILK","extract":true},{"header":"OUR TIP","name":"OUR TIP","extract":true},{"header":"PRICE","name":"PRICE","extract":true}],"delay":0,"tableDataRowSelector":"tr:nth-of-type(n+2)","tableHeaderRowSelector":"tr:nth-of-type(1)"},{"id":"todays-date","type":"SelectorText","parentSelectors":["_root"],"selector":"h5","multiple":false,"regex":"","delay":0}]}
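Since this version leans on a table selector anyway, it may be worth mentioning (my suggestion, not something from the posts above) that pandas can parse the same kind of table directly and keeps the columns aligned for you. A sketch on a toy table, assuming pandas plus an HTML parser backend (lxml or html5lib) are installed:

```python
from io import StringIO
import pandas as pd

# read_html turns every <table> in the markup into a DataFrame,
# one row per <tr>, so column alignment is handled automatically.
# It accepts a URL too, e.g. pd.read_html('https://gg.co.uk/tips/today').
html = """<table>
  <tr><th>OUR TIP</th><th>PRICE</th></tr>
  <tr><td>Red Rum</td><td>5/1</td></tr>
</table>"""
df = pd.read_html(StringIO(html))[0]
print(df)
```

From there, `df.to_csv(...)` gives the daily CSV without any per-cell selectors.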

Hello Tim,
Only just seen this, thank you for the update. I was trying to use Beautiful Soup and Python.
So far I have this, but I would like to store these links and then scrape the stored links for all the horse profiles of that day.
I do back a few horses, so I wanted a quick way of looking at certain facts for each horse/rider profile, i.e. weight, rider, distance, form etc.

import requests
from bs4 import BeautifulSoup

# Collect the race links from the tips page
page = requests.get('https://gg.co.uk/tips/today')
soup = BeautifulSoup(page.text, 'html.parser')

link_set = set()
for link in soup.find_all('a', {'class': 'winning-post'}):
    web_link = link.get('href')
    print(web_link)
    link_set.add(web_link)
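For the next step you describe (storing the links, then scraping each one), a small stdlib helper can first turn the collected hrefs into absolute URLs; the profile-page selectors for weight, rider, form etc. are left as a placeholder, since I don't know that page's markup:

```python
from urllib.parse import urljoin

def absolutise(links, base='https://gg.co.uk/'):
    """Turn the relative hrefs gathered in link_set into sorted full URLs."""
    return sorted(urljoin(base, href) for href in links)

# for url in absolutise(link_set):
#     profile_soup = BeautifulSoup(requests.get(url).text, 'html.parser')
#     ...pull weight, rider, distance, form from profile_soup here...
```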