Scraping text 'behind' image

HighMountain · January 6, 2019, 8:27pm

Hello everyone,

I'm pretty new to the https://webscraper.io/ tool and I have a small question about its use.

I want to scrape data from the following website: Latest contract extensions (https://www.transfermarkt.com/statistik/letztevertragsverlaengerungen)

The data on this website is presented as a table with the following columns: 'Player', 'Age', 'Nationality', 'Club' and 'New contract until'. Scraping this table works perfectly for all columns except 'Nationality'. Since the content of this column is an image instead of text, the column is left blank after scraping. However, I want to scrape the text that appears when you hover the mouse cursor over the image. For example, the scraper should return the text 'Lithuania' as the nationality in the following case (second row): https://i.imgur.com/5W7yCfy.png

Does anyone know how I can do this? Any help would be really appreciated!

bretfeig · January 7, 2019, 12:33am

Use the Element Attribute selector
attribute name = alt
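
A minimal sketch of what that selector might look like in the sitemap JSON, assuming the flag images carry the class flaggenrahmen (check the page's HTML to confirm). The id "nationality" is just an illustrative name:

```json
{
  "id": "nationality",
  "type": "SelectorElementAttribute",
  "parentSelectors": ["_root"],
  "selector": "img.flaggenrahmen",
  "multiple": true,
  "extractAttribute": "alt",
  "delay": 0
}
```

If alt comes back empty on another site, title may be worth trying instead, since browsers usually take the hover tooltip from the title attribute.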

HighMountain · January 7, 2019, 4:22pm

Thanks for your reply! That actually works pretty easily.

However, I don't want to scrape the data as an individual element. I want to scrape it as a column in the table, since each nationality belongs to the same row as the other values (player, age, club). Is there a way to get this done?

Even scraping the original table and the element attribute separately (and then merging them in Excel) doesn't seem to work, because Web Scraper changes the order of the elements in the table.

I hope you can help me out!

bretfeig · January 8, 2019, 12:50am

something like this?

{"_id":"tramsfer-markt",
 "startUrl":["https://www.transfermarkt.com/statistik/letztevertragsverlaengerungen"],
 "selectors":[
  {"id":"Table-pagin","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"table.items > tbody > tr","multiple":true,"delay":0,"clickElementSelector":"li.naechste-seite a","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},
  {"id":"Name","type":"SelectorText","parentSelectors":["Table-pagin"],"selector":"td:nth-of-type(1) td.hauptlink","multiple":false,"regex":"","delay":0},
  {"id":"Position","type":"SelectorText","parentSelectors":["Table-pagin"],"selector":"td:nth-of-type(1) tr:nth-of-type(2) td","multiple":false,"regex":"","delay":0},
  {"id":"age","type":"SelectorText","parentSelectors":["Table-pagin"],"selector":"td.zentriert:nth-of-type(2)","multiple":false,"regex":"","delay":0},
  {"id":"Nationality","type":"SelectorElementAttribute","parentSelectors":["Table-pagin"],"selector":"td.zentriert img.flaggenrahmen","multiple":false,"extractAttribute":"alt","delay":0},
  {"id":"club","type":"SelectorText","parentSelectors":["Table-pagin"],"selector":"td:nth-of-type(4)","multiple":false,"regex":"","delay":0},
  {"id":"new contract until","type":"SelectorText","parentSelectors":["Table-pagin"],"selector":"td.zentriert.hauptlink","multiple":false,"regex":"","delay":0}
 ]}

HighMountain · January 8, 2019, 2:21pm

Yes, that is indeed exactly what I meant!

Thank you very much, you helped me a lot with this.