Help with a sitemap

Hey guys. I found this extension and I can honestly say it's incredible.
I am trying to scrape the phone number from a local ads website.

Start is:
Url: https://www.olx.ro/servicii-afaceri-colaborari/meseriasi-constructori/

To get the phone number you need to click a "Show phone" button, which then loads the number itself.
I managed to make it work (it might not be the ideal way) for the first page of results.

My question is: how do I make it go to pages 2, 3, 4, ... and run the routine I built again on each one?

Sitemap:
{"_id":"test1","startUrl":["https://www.olx.ro/servicii-afaceri-colaborari/meseriasi-constructori/"],"selectors":[{"id":"linkselect","type":"SelectorLink","parentSelectors":["_root"],"selector":"a.marginright5","multiple":true,"delay":0},{"id":"telefon","type":"SelectorElementClick","parentSelectors":["linkselect"],"selector":"div.contact-button strong.xx-large","multiple":false,"delay":0,"clickElementSelector":"div.contact-button","clickType":"clickOnce","discardInitialElements":true,"clickElementUniquenessType":"uniqueText"},{"id":"dasd","type":"SelectorText","parentSelectors":["telefon"],"selector":"_parent_","multiple":false,"regex":"","delay":0}]}

I'd much appreciate any support!

:slight_smile:

Update on my situation: I managed to make it go to the next pages using ?page=[1-500] in the start URL, but I found a different problem. After some time the "script" stops clicking on "show number", so I get a lot of masked numbers like 07xx-xxx-xxx.

Any thoughts?

Thank you!
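
For readers wondering what the `?page=[1-500]` start URL does: the bracketed range acts as shorthand for one start URL per page number. Below is a minimal Python sketch of that expansion, for illustration only. The helper `expand_range_url` is hypothetical (it is not part of the extension) and only handles the simple `[start-end]` form used in this thread.

```python
import re

def expand_range_url(url):
    """Expand a single [start-end] range in a URL into one URL per number."""
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        return [url]  # no range present, nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]

base = "https://www.olx.ro/servicii-afaceri-colaborari/meseriasi-constructori/"
pages = expand_range_url(base + "?page=[1-500]")
print(len(pages))  # 500
print(pages[1])    # the URL ending in ?page=2
```

Each generated URL is then scraped with the same selector tree, which is why the routine built for page 1 repeats on every page.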

Hi there!

You're getting 07xx-xxx-xxx because your selector targets an element whose parent changes its class once the button is pressed. To capture the value that appears after the click, you have to narrow the selector to the state where the button is activated.
Please look at the pages below:


Your fixed sitemap (I've added the 1-500 page range back):

{"_id":"test1","startUrl":["https://www.olx.ro/servicii-afaceri-colaborari/meseriasi-constructori/?page=[1-500]"],"selectors":[{"id":"linkselect","type":"SelectorLink","selector":"a.marginright5","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"telefon","type":"SelectorElementClick","selector":"div.activated > strong","parentSelectors":["linkselect"],"multiple":false,"delay":"0","clickElementSelector":"div.contact-button","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"phone_number","type":"SelectorText","selector":"_parent_","parentSelectors":["telefon"],"multiple":false,"regex":"","delay":0}]}
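
To make the fix easier to spot, here is a small Python sketch that compares the "telefon" selector in the two sitemaps posted above. The values are copied from those sitemaps and trimmed to the relevant fields. Whether `div.activated` matches the live page markup is taken from the fixed sitemap and not verified here.

```python
# "telefon" selector as posted in the original sitemap
before = {
    "selector": "div.contact-button strong.xx-large",
    "clickElementSelector": "div.contact-button",
    "discardInitialElements": True,
    "delay": 0,
}
# "telefon" selector from the fixed sitemap
after = {
    "selector": "div.activated > strong",
    "clickElementSelector": "div.contact-button",
    "discardInitialElements": False,
    "delay": "0",
}

# Collect every field whose value differs between the two versions.
changed = {k: (before[k], after[k]) for k in before if before[k] != after[k]}
for key, (old, new) in changed.items():
    print(f"{key}: {old!r} -> {new!r}")
```

The changes that matter are the narrower `selector`, which only matches once the button is in its activated state, and `discardInitialElements` being switched off. The `delay` value only changes type (number to string), which looks incidental.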

Hi!

Thank you for your answer. I have added your code to the scraper, but as far as I can see it's no longer saving anything.

Yeah, I've just noticed some pages won't let you see the number at all.

Like this one: https://www.olx.ro/oferta/modificam-renovam-executam-acoperisuri-complete-mansarde-IDbrFDM.html#a0e0f035ff

If it's a website bug, I can only recommend scraping the fresh listings, like pages 1-100 or even 1-50.

UPDATE: seems to be scrape protection.

UPDATE2: nope, it's a website bug: some pages won't let you see the number and some do. Either it's a bug, or the owner decided to hide/deactivate the number in order to get emails/messages instead.

Hi. I know the website; if the owner hides the number, it doesn't show. I reused your code in a fresh Chrome, and it works just fine until record 98, scraping pages 2-10. After it reaches 98 records it stops working.

Update: Not sure if it's a website bug or protection. After the scraper stops collecting, starting it again collects nothing. If you close Chrome and start it again, it works. So I'm not sure where the problem is.

Update: Tried pages 2-5, went up to record 101.

I can only recommend increasing the loading delay on the link selector to 5-10 seconds.
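
One way to apply that suggestion is to edit the `delay` field of the `linkselect` selector in the sitemap JSON before importing it again. The Python sketch below does this on a trimmed copy of the sitemap from this thread. The helper `with_selector_delay` is hypothetical, and the sketch assumes the `delay` field is expressed in milliseconds, so 5 seconds is written as 5000.

```python
import copy

def with_selector_delay(sitemap, selector_id, delay_ms):
    """Return a copy of the sitemap with one selector's delay changed."""
    updated = copy.deepcopy(sitemap)
    for selector in updated["selectors"]:
        if selector["id"] == selector_id:
            selector["delay"] = delay_ms
            return updated
    raise KeyError(f"no selector with id {selector_id!r}")

# Trimmed version of the sitemap from this thread (only the fields needed here).
sitemap = {
    "_id": "test1",
    "selectors": [
        {"id": "linkselect", "type": "SelectorLink", "delay": 0},
        {"id": "telefon", "type": "SelectorElementClick", "delay": 0},
    ],
}

# Assumption: delay is in milliseconds, so 5000 stands for 5 seconds.
slower = with_selector_delay(sitemap, "linkselect", 5000)
print(slower["selectors"][0]["delay"])   # 5000
print(sitemap["selectors"][0]["delay"])  # 0 (the original is left untouched)
```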

I did that, and got the same thing. After some pages it stops "pressing" to show the number and stops recording even empty values.