Scraping hidden phone/email/website information

Uzhipius · July 20, 2021, 5:23am

Hi,

I cannot scrape phone numbers from this site.

I selected the Element Attribute selector type and, in the site's Elements preview, found that "phone-content" stores the full phone number, but when I select it the data output is still NULL.

Could you help me figure out how to extract the full phone numbers from the site, as well as emails and website URLs?

Url: https://www.pkt.pl/szukaj/geodezja/warszawa

Sitemap:

{"_id":"pkt_geodezja","startUrl":["https://www.pkt.pl/szukaj/geodezja/warszawa"],"selectors":[{"id":"name","type":"SelectorText","parentSelectors":["_root"],"selector":".company-name a","multiple":true,"regex":"","delay":0},{"id":"phone","type":"SelectorElementAttribute","parentSelectors":["_root"],"selector":"span.call-text","multiple":true,"extractAttribute":"phone-content","delay":0}]}

ViestursWS · July 20, 2021, 2:22pm

Hi @Uzhipius

I would suggest using an Element selector with the "multiple" option checked to wrap each company (div.box-content), and setting it as the parent of both the name selector (.company-name a) and the phone selector (an Element Attribute selector on div.call--phone a, with the attribute name data-phone).

Sitemap example:

{"_id":"pkt_geodezja","startUrl":["https://www.pkt.pl/szukaj/geodezja/warszawa"],"selectors":[{"delay":0,"id":"name","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".company-name a","type":"SelectorText"},{"delay":0,"extractAttribute":"data-phone","id":"phone","multiple":false,"parentSelectors":["wrapper"],"selector":"div.call--phone a","type":"SelectorElementAttribute"},{"delay":0,"id":"wrapper","multiple":true,"parentSelectors":["_root"],"selector":"div.box-content","type":"SelectorElement"}]}
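
As a rough illustration of what the wrapper does, here is a minimal Python sketch of the same parent/child logic. The HTML is a simplified, hypothetical stand-in for the listing markup (the real page is more complex), and the helper names has_class and scrape are made up for this example. The point is that each company is read inside its own div.box-content, so a company without a phone gets an empty value instead of shifting the other rows.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; NOT the real pkt.pl page source.
HTML = """
<div>
  <div class="box-content">
    <h2 class="company-name"><a href="/firma/a">Firma A</a></h2>
    <div class="call-cell call--phone"><a data-phone="22 111 11 11">Call</a></div>
  </div>
  <div class="box-content">
    <h2 class="company-name"><a href="/firma/b">Firma B</a></h2>
  </div>
</div>
"""


def has_class(element, name):
    """True if `name` is one of the space-separated classes on the element."""
    return name in element.get("class", "").split()


def scrape(markup):
    """Return one record per wrapper, mirroring the wrapper -> name/phone setup."""
    root = ET.fromstring(markup)
    rows = []
    for wrapper in root.iter("div"):
        if not has_class(wrapper, "box-content"):
            continue
        name = None
        phone = None
        # Only look inside this wrapper, like a child selector with a parent.
        for element in wrapper.iter():
            if has_class(element, "company-name"):
                link = element.find(".//a")
                if link is not None:
                    name = link.text
            if element.tag == "div" and has_class(element, "call--phone"):
                link = element.find(".//a")
                if link is not None:
                    phone = link.get("data-phone")
        rows.append({"name": name, "phone": phone})
    return rows


for row in scrape(HTML):
    print(row)
```

With selectors attached directly to _root, names and phones are collected as two independent lists, so a missing phone can leave them misaligned; scoping both to the wrapper avoids that.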

Hope it helps!

bretfeig · July 20, 2021, 7:44pm

Why div.call--phone a and not div.call-cell a?

When an element has two classes, i.e. (call-cell call--phone), does it matter which one you use?
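
For reference, in CSS a class selector matches any element whose class attribute contains that class name, so either class selects this particular element; what can differ is which other elements each one also matches. A minimal Python sketch of that rule follows. The class list call-cell call--email is a hypothetical sibling used only for illustration, and matches_class is a made-up helper, not part of any library.

```python
def matches_class(class_attr, wanted):
    """Mimic a CSS class selector: match if `wanted` is one of the
    space-separated class names in the element's class attribute."""
    return wanted in class_attr.split()


phone_cell = "call-cell call--phone"   # class attribute quoted in the thread
other_cell = "call-cell call--email"   # hypothetical sibling, for illustration

# Both classes select the phone cell.
print(matches_class(phone_cell, "call-cell"))
print(matches_class(phone_cell, "call--phone"))

# Only the broader class also selects the hypothetical sibling.
print(matches_class(other_cell, "call-cell"))
print(matches_class(other_cell, "call--phone"))
```

So if call-cell is shared by several kinds of cells, the more specific call--phone is the safer choice for the phone selector.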