How can I scrape this table?

scrapers · April 30, 2021, 11:45am

I'm trying to scrape tables like the one on the "Astra 2E: 10714 H - LyngSat" page. The first row of the table should be ignored, the second row should be treated as the header, and rows 3 onwards contain the data.

Does anyone know how I can extract the data from row 3 onwards correctly? Many thanks in advance.

Sitemap:
{"_id":"lyngsat-20210428","startUrl":["https://www.lyngsat.com/index.html"],"selectors":[{"id":"continent","type":"SelectorLink","parentSelectors":["_root"],"selector":"[width='468'] tr:nth-of-type(2) a","multiple":true,"delay":0},{"id":"positionlink","type":"SelectorLink","parentSelectors":["continent"],"selector":"[face='Verdana'] font a","multiple":true,"delay":0},{"id":"transponderlink","type":"SelectorLink","parentSelectors":["positionlink"],"selector":"[face='Verdana'] b a","multiple":true,"delay":0},{"id":"transpondertable","type":"SelectorTable","parentSelectors":["transponderlink"],"selector":"table.mux-table","multiple":true,"columns":[{"header":"Astra 2E 10773 H © LyngSat, last updated 2021-04-27 - https://www.lyngsat.com/muxes/Astra-2E_UK_10773-H.html","name":"test","extract":false}],"delay":0,"tableDataRowSelector":"tr:nth-of-type(n+3)","tableHeaderRowSelector":"tr:nth-of-type(2)"}]}

ViestursWS · April 30, 2021, 1:46pm

Hi @scrapers
You need to create an Element selector that targets the table rows, and then add a condition that matches something the data rows contain and the other rows do not.

Example:

{"_id":"lyngsat-20210428","startUrl":["https://www.lyngsat.com/muxes/NSS-9_Global_4055-L.html"],"selectors":[{"id":"wrapper","type":"SelectorElement","parentSelectors":["_root"],"selector":".mux-table tr:nth-of-type(n+3):has(td.td-medium)","multiple":true,"delay":0},{"id":"SID","type":"SelectorText","parentSelectors":["wrapper"],"selector":"td:nth(0)","multiple":false,"regex":"","delay":0},{"id":"Channel-Name","type":"SelectorText","parentSelectors":["wrapper"],"selector":"td:nth(2)","multiple":false,"regex":"","delay":0},{"id":"Video","type":"SelectorText","parentSelectors":["wrapper"],"selector":"td:nth(4)","multiple":false,"regex":"","delay":0},{"id":"VPID","type":"SelectorElement","parentSelectors":["wrapper"],"selector":"td:nth(5)","multiple":false,"delay":0},{"id":"APID","type":"SelectorText","parentSelectors":["wrapper"],"selector":"td:nth(6)","multiple":false,"regex":"","delay":0},{"id":"Language","type":"SelectorText","parentSelectors":["wrapper"],"selector":"td:nth(7)","multiple":false,"regex":"","delay":0},{"id":"Audio","type":"SelectorText","parentSelectors":["wrapper"],"selector":"td:nth(8)","multiple":false,"regex":"","delay":0},{"id":"Encryption","type":"SelectorText","parentSelectors":["wrapper"],"selector":"td:nth(9)","multiple":false,"regex":"","delay":0},{"id":"Package","type":"SelectorText","parentSelectors":["wrapper"],"selector":"td:nth(10)","multiple":false,"regex":"","delay":0},{"id":"Source-Updated","type":"SelectorText","parentSelectors":["wrapper"],"selector":"td:nth(11)","multiple":false,"regex":"","delay":0}]}

Hope it helps.

scrapers · April 30, 2021, 2:21pm

Hi @ViestursWS, many thanks for your help. The element wrapper seems to do the trick!