Table scraping: duplicated values across columns

I'm extracting data from a table that has multiple header rows. I only need the last one, but it is a sub-header, so some of its column names are duplicated. Even though I rename the duplicated columns so that they map to new names in my output, when I scrape the data Web Scraper picks one of the duplicated values and puts it in every column that shares that header name. For example, the secondary header has Passing, Rushing and Receiving, and the final header, which is the one I need, has a YD field under each of those categories. So if a row has YD = 0, YD = 120, YD = 20 (meaning Passing YD = 0, Rushing YD = 120 and Receiving YD = 20), the result under the new names I entered in the column selection is:
PssYD = 120, RshYD = 120, RcvYD = 120, which of course is wrong. I'm not sure whether this is a bug or a problem with how I'm approaching this.

Url: https://www.cbssports.com/fantasy/football/stats/sortable/points/RB/standard/stats/2018/1?&print_rows=9999&_1:col_1=2

Sitemap:
{"_id":"nfl_offense_weekly_stats","startUrl":["https://www.cbssports.com/fantasy/football/stats/sortable/points/RB/standard/stats/2018/2?&print_rows=9999","https://www.cbssports.com/fantasy/football/stats/sortable/points/WR/standard/stats/2018/2?&print_rows=9999","https://www.cbssports.com/fantasy/football/stats/sortable/points/QB/standard/stats/2018/2?&print_rows=9999","https://www.cbssports.com/fantasy/football/stats/sortable/points/TE/standard/stats/2018/2?&print_rows=9999"],"selectors":[{"id":"table1","type":"SelectorTable","parentSelectors":["_root"],"selector":"table.data","multiple":true,"columns":[{"header":"Player","name":"Player","extract":true},{"header":"ATT","name":"PssATT","extract":true},{"header":"CMP","name":"CMP","extract":true},{"header":"YD","name":"PssYD","extract":true},{"header":"TD","name":"PssTD","extract":true},{"header":"INT","name":"INT","extract":true},{"header":"RATE","name":"RATE","extract":true},{"header":"ATT","name":"RshATT","extract":true},{"header":"YD","name":"RshYD","extract":true},{"header":"AVG","name":"RshAVG","extract":true},{"header":"TD","name":"RshTD","extract":true},{"header":"TARGT","name":"TARGT","extract":true},{"header":"RECPT","name":"RECPT","extract":true},{"header":"YD","name":"RcvYD","extract":true},{"header":"AVG","name":"RcvAVG","extract":true},{"header":"TD","name":"RcvTD","extract":true},{"header":"FL","name":"FL","extract":true},{"header":"FPTS","name":"FPTS","extract":true}],"delay":0,"tableDataRowSelector":"tr:nth-of-type(n+4)","tableHeaderRowSelector":"tr.label"}]}

Hello there!

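Unfortunately, the Table selector doesn't always work as expected, but you can always create an Element selector setup that will work exactly as you want. The Element selector picks each data row, and the child Text selectors read the cells by position (td:nth-of-type(n)) instead of by header name, so duplicated header names no longer matter.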

Check this one:
{"_id":"nfl_offense_weekly_stats2","startUrl":["https://www.cbssports.com/fantasy/football/stats/sortable/points/RB/standard/stats/2018/2?&print_rows=9999","https://www.cbssports.com/fantasy/football/stats/sortable/points/WR/standard/stats/2018/2?&print_rows=9999","https://www.cbssports.com/fantasy/football/stats/sortable/points/QB/standard/stats/2018/2?&print_rows=9999","https://www.cbssports.com/fantasy/football/stats/sortable/points/TE/standard/stats/2018/2?&print_rows=9999"],"selectors":[{"id":"where?","type":"SelectorText","parentSelectors":["_root"],"selector":"tr.title td","multiple":false,"regex":"","delay":0},{"id":"table","type":"SelectorElement","parentSelectors":["_root"],"selector":"tr:nth-of-type(n+4):not(:last-child)","multiple":true,"delay":0},{"id":"Player","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(1)","multiple":false,"regex":"","delay":0},{"id":"PASSING_ATT","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"PASSING_CMP","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(3)","multiple":false,"regex":"","delay":0},{"id":"PASSING_YD","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(4)","multiple":false,"regex":"","delay":0},{"id":"PASSING_TD","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(5)","multiple":false,"regex":"","delay":0},{"id":"PASSING_INT","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(6)","multiple":false,"regex":"","delay":0},{"id":"PASSING_RATE","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(7)","multiple":false,"regex":"","delay":0},{"id":"RUSHING_ATT","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(8)","multiple":false,"regex":"","delay":0},{"id":"RUSHING_YD","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(9)","multiple":false
,"regex":"","delay":0},{"id":"RUSHING_AVG","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(10)","multiple":false,"regex":"","delay":0},{"id":"RUSHING_TD","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(11)","multiple":false,"regex":"","delay":0},{"id":"RECEIVING_TARGT","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(12)","multiple":false,"regex":"","delay":0},{"id":"RECEIVING_RECPT","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(13)","multiple":false,"regex":"","delay":0},{"id":"RECEIVING_YD","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(14)","multiple":false,"regex":"","delay":0},{"id":"RECEIVING_AVG","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(15)","multiple":false,"regex":"","delay":0},{"id":"RECEIVING_TD","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(16)","multiple":false,"regex":"","delay":0},{"id":"MISC_FL","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(17)","multiple":false,"regex":"","delay":0},{"id":"MISC_FPTS","type":"SelectorText","parentSelectors":["table"],"selector":"td:nth-of-type(18)","multiple":false,"regex":"","delay":0}]}
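
If you would rather not type eighteen near-identical selectors by hand, here is a rough Python sketch that generates the same kind of sitemap from a list of column names in table order. It only illustrates the pattern and is not part of Web Scraper itself: the function name build_sitemap and the id "nfl_offense_generated" are made up for this example, the column order is assumed from the Table selector in the question, and RECEIVING_RECPT is simply a name chosen here for cell 13. The "where?" selector is left out for brevity.

```python
import json

# Column names in left-to-right cell order (assumed from the question's
# Table selector). RECEIVING_RECPT is a name chosen here for cell 13.
COLUMNS = [
    "Player",
    "PASSING_ATT", "PASSING_CMP", "PASSING_YD", "PASSING_TD",
    "PASSING_INT", "PASSING_RATE",
    "RUSHING_ATT", "RUSHING_YD", "RUSHING_AVG", "RUSHING_TD",
    "RECEIVING_TARGT", "RECEIVING_RECPT", "RECEIVING_YD",
    "RECEIVING_AVG", "RECEIVING_TD",
    "MISC_FL", "MISC_FPTS",
]


def build_sitemap(sitemap_id, start_urls, columns, row_selector):
    """Build a sitemap dict: one Element selector for the rows,
    plus one Text selector per cell position under it."""
    selectors = [{
        "id": "table",
        "type": "SelectorElement",
        "parentSelectors": ["_root"],
        "selector": row_selector,
        "multiple": True,
        "delay": 0,
    }]
    # Cells are addressed by position, so duplicated header text is irrelevant.
    for position, name in enumerate(columns, start=1):
        selectors.append({
            "id": name,
            "type": "SelectorText",
            "parentSelectors": ["table"],
            "selector": f"td:nth-of-type({position})",
            "multiple": False,
            "regex": "",
            "delay": 0,
        })
    return {"_id": sitemap_id, "startUrl": list(start_urls), "selectors": selectors}


sitemap = build_sitemap(
    "nfl_offense_generated",  # hypothetical sitemap id
    ["https://www.cbssports.com/fantasy/football/stats/sortable/points/RB/standard/stats/2018/2?&print_rows=9999"],
    COLUMNS,
    "tr:nth-of-type(n+4):not(:last-child)",
)

# Compact JSON, ready to paste into Web Scraper's "Import Sitemap" box.
print(json.dumps(sitemap, separators=(",", ":")))
```

Each name in COLUMNS gets the selector td:nth-of-type(n) for its position in the list, so adding or reordering columns only means editing that list.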


Wow, this is great! Exactly what I was looking for. Thank you very much. It took me a little while to figure out how you did it, but I think I've got it now. Thanks again.