Scrape a list of entities

I would like to scrape a database of companies and extract three fields from each one.

First, I need to click a search button. Then I have to open each company's detail page and extract those three text fields from it.

The URLs are not dynamic.

Thanks for your advice.

URL: http://www.impic.pt/impic/pt-pt/consultar/empresas-titulares-de-licenca-de-mediacao-imobiliaria

Sitemap:
{"_id":"impic","startUrl":["http://www.impic.pt/impic/pt-pt/consultar/empresas-titulares-de-licenca-de-mediacao-imobiliaria"],"selectors":[{"id":"next","type":"SelectorLink","parentSelectors":["start"],"selector":"div.col-sm-4.text-right a.btn","multiple":true,"delay":0},{"id":"element","type":"SelectorElementClick","parentSelectors":["next","start"],"selector":"div.block.impic-form","multiple":false,"delay":0,"clickElementSelector":"td.text-center:nth-of-type(3) a.btn-info","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"council","type":"SelectorText","parentSelectors":["element"],"selector":"div.information:nth-of-type(3) div.information-field:nth-of-type(3) span","multiple":false,"regex":"","delay":0},{"id":"district","type":"SelectorText","parentSelectors":["element"],"selector":"div.information:nth-of-type(3) div.information-field:nth-of-type(4) span","multiple":false,"regex":"","delay":0},{"id":"creation","type":"SelectorText","parentSelectors":["element"],"selector":"div.information:nth-of-type(3) div.information-field:nth-of-type(7) span","multiple":false,"regex":"","delay":0},{"id":"start","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"a.btn.btn-search","multiple":false,"delay":0,"clickElementSelector":"a.btn.btn-search","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}
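To see how the selectors in this sitemap chain together, it can help to print the parentSelectors hierarchy. The sketch below is a hypothetical helper, not part of Web Scraper: it works on a trimmed copy of the sitemap above (only id, type and parentSelectors are kept) and walks the tree from "_root".

```python
import json
from collections import defaultdict

# Trimmed copy of the first sitemap above: only the keys needed to show
# the selector hierarchy (id, type, parentSelectors) are kept.
SITEMAP = """
{"_id": "impic", "selectors": [
  {"id": "next", "type": "SelectorLink", "parentSelectors": ["start"]},
  {"id": "element", "type": "SelectorElementClick", "parentSelectors": ["next", "start"]},
  {"id": "council", "type": "SelectorText", "parentSelectors": ["element"]},
  {"id": "district", "type": "SelectorText", "parentSelectors": ["element"]},
  {"id": "creation", "type": "SelectorText", "parentSelectors": ["element"]},
  {"id": "start", "type": "SelectorElementClick", "parentSelectors": ["_root"]}
]}
"""


def children_by_parent(sitemap):
    """Map each parent selector id to the selectors that list it as a parent."""
    tree = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            tree[parent].append(sel)
    return tree


def tree_lines(tree, parent="_root", depth=0, path=()):
    """Return indented 'id (type)' lines, walking depth-first from _root."""
    lines = []
    for sel in tree.get(parent, []):
        lines.append("  " * depth + f"{sel['id']} ({sel['type']})")
        # Only recurse if this id is not already an ancestor (guards against cycles).
        if sel["id"] not in path:
            lines.extend(tree_lines(tree, sel["id"], depth + 1, path + (sel["id"],)))
    return lines


if __name__ == "__main__":
    for line in tree_lines(children_by_parent(json.loads(SITEMAP))):
        print(line)
```

Because "element" lists both "next" and "start" as parents, it appears twice in the output: once directly under "start" and once under "next", with the three text selectors beneath each occurrence.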

Hi!

You can set Page Load Delay to 10000-15000 ms (10-15 seconds). That gives you time, before WebScraper starts the job, to enter any data it needs in order to scrape through the results.

Hi Iconoclast

Sorry, but I didn't understand how you are suggesting I do this.

Thanks

You can set Request Interval and Page Load Delay before you hit the Start scraping button.


If you set Page Load Delay to, say, 10000 ms (10 seconds), WebScraper will open a window but won't start scraping until those 10 seconds have passed.

Hi Iconoclast

But I do not want to filter anything; I want to get their entire database with no filters.

The JSON sitemap I posted doesn't work, even if I filter my results first.

Thanks

Now I am trying this JSON sitemap:

{"_id":"impic","startUrl":["http://www.impic.pt/impic/pt-pt/consultar/empresas-titulares-de-licenca-de-mediacao-imobiliaria"],"selectors":[{"id":"page","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.block.impic-form","multiple":true,"delay":"2500","clickElementSelector":"div.col-sm-4.text-right a.btn","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"data","type":"SelectorLink","parentSelectors":["page"],"selector":"tr.page-1 td.text-center:nth-of-type(1) a.btn-info","multiple":false,"delay":0},{"id":"ami","type":"SelectorText","parentSelectors":["data"],"selector":"div.information:nth-of-type(3) div.information-field:nth-of-type(5) span","multiple":false,"regex":"","delay":0},{"id":"council","type":"SelectorText","parentSelectors":["data"],"selector":"div.information:nth-of-type(3) div.information-field:nth-of-type(3) span","multiple":false,"regex":"","delay":0},{"id":"district","type":"SelectorText","parentSelectors":["data"],"selector":"div.information:nth-of-type(3) div.information-field:nth-of-type(4) span","multiple":false,"regex":"","delay":0},{"id":"date","type":"SelectorText","parentSelectors":["data"],"selector":"div.information:nth-of-type(3) div.information-field:nth-of-type(7) span","multiple":false,"regex":"","delay":0}]}
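One thing worth checking in this sitemap is the "multiple" flag on selectors that have children. Assuming Web Scraper's behaviour that a selector with "multiple": false yields only one record per parent element, the "data" link selector would follow at most one detail link per "page" element. The hypothetical helper below flags such selectors in a trimmed copy of the sitemap above. It is only a static check on the JSON; it does not test whether CSS selectors such as "tr.page-1 ..." match anything on the live site.

```python
import json

# Trimmed copy of the second sitemap above (id, type, parentSelectors,
# selector and multiple only).
SITEMAP = """
{"_id": "impic", "selectors": [
  {"id": "page", "type": "SelectorElementClick", "parentSelectors": ["_root"],
   "selector": "div.block.impic-form", "multiple": true},
  {"id": "data", "type": "SelectorLink", "parentSelectors": ["page"],
   "selector": "tr.page-1 td.text-center:nth-of-type(1) a.btn-info", "multiple": false},
  {"id": "ami", "type": "SelectorText", "parentSelectors": ["data"], "multiple": false},
  {"id": "council", "type": "SelectorText", "parentSelectors": ["data"], "multiple": false},
  {"id": "district", "type": "SelectorText", "parentSelectors": ["data"], "multiple": false},
  {"id": "date", "type": "SelectorText", "parentSelectors": ["data"], "multiple": false}
]}
"""


def single_match_parents(sitemap):
    """Return ids of selectors that other selectors use as a parent
    but that have "multiple" set to false (one match per parent element)."""
    used_as_parent = {p for s in sitemap["selectors"] for p in s["parentSelectors"]}
    return [
        s["id"]
        for s in sitemap["selectors"]
        if s["id"] in used_as_parent and not s.get("multiple", False)
    ]


if __name__ == "__main__":
    print(single_match_parents(json.loads(SITEMAP)))
```

Run on this trimmed copy, it reports only "data"; the text selectors are not flagged because nothing uses them as a parent.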

The information I want is the "Nº Licença" (licence number), "Concelho" (municipality), "Distrito" (district) and "Licença emitida em" (licence issued on) fields from their entire database (311 pages and around 6200 companies).
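The numbers in the post also give a rough sense of scale. The sketch below divides the reported company count by the page count and estimates a lower bound on run time, assuming the 2500 ms "delay" on the "page" selector is spent about once per page. REQUEST_INTERVAL_MS is a hypothetical value for the Request Interval applied to each detail page; it does not come from the thread. Page-load time is ignored, so real runs will take longer.

```python
PAGES = 311                   # result pages reported in the post
COMPANIES = 6200              # approximate company count reported in the post
CLICK_DELAY_MS = 2500         # "delay" on the "page" selector in the sitemap above
REQUEST_INTERVAL_MS = 2000    # hypothetical Request Interval per detail page

# About 20 companies per result page.
rows_per_page = COMPANIES / PAGES

# Lower bound for paginating: roughly one click delay per page.
pagination_seconds = PAGES * CLICK_DELAY_MS / 1000

# Lower bound for visiting every detail page once at the assumed interval.
detail_seconds = COMPANIES * REQUEST_INTERVAL_MS / 1000

print(f"rows per page: ~{rows_per_page:.1f}")
print(f"pagination: ~{pagination_seconds / 60:.0f} min")
print(f"detail pages: ~{detail_seconds / 3600:.1f} h")
```

With these inputs, pagination alone takes roughly 13 minutes, and the detail pages take roughly 3.4 hours, so the per-company requests dominate the total time.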

Thanks for your help