Hi
I am trying to scrape the text of all the laws in Italy's official gazette.
I am getting stuck when the links reach this level:
https://www.gazzettaufficiale.it/atto/serie_generale/caricaDettaglioAtto/originario?atto.dataPubblicazioneGazzetta=1988-01-02&atto.codiceRedazionale=087G0741&elenco30giorni=false
I have tried two methods:
(1) Each article of the law is paginated without the URL changing. However, I cannot select the individual pages using the Element click selector type.
(2) Alternatively, the webpage allows you to load the complete law by clicking "Atto Completo" on the top left. This opens a pop-up window. In the pop-up window, you can click "Visualizza", which redirects you to a base URL with the complete act loaded. But how do I scrape the content of a redirect?
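For reference, the act link above identifies each law by two query parameters, atto.dataPubblicazioneGazzetta and atto.codiceRedazionale. Below is a minimal Python sketch, using only the standard library, of pulling those two values out of such a link so that scraped text can be matched back to its act. The function name act_key is my own, and I am only assuming that these two values are enough to identify an act; I have not confirmed that with the site.

```python
from urllib.parse import urlsplit, parse_qs


def act_key(url: str) -> tuple[str, str]:
    """Return (publication date, editorial code) read from an act detail URL.

    Assumption: the two query parameters below together identify one act.
    """
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each parameter name to a list of values; take the first.
    return (
        query["atto.dataPubblicazioneGazzetta"][0],
        query["atto.codiceRedazionale"][0],
    )


url = (
    "https://www.gazzettaufficiale.it/atto/serie_generale/"
    "caricaDettaglioAtto/originario"
    "?atto.dataPubblicazioneGazzetta=1988-01-02"
    "&atto.codiceRedazionale=087G0741"
    "&elenco30giorni=false"
)
print(act_key(url))  # ('1988-01-02', '087G0741')
```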
This is the sitemap I have so far, using method (2).
Sitemap:
{"_id":"italyofficial","startUrl":["https://www.gazzettaufficiale.it/archivioCompleto"],"selectors":[{"id":"Generalserieslinks","type":"SelectorLink","parentSelectors":["_root"],"selector":"div#yearGazzettaSG1:nth-of-type(1) a","multiple":true,"delay":0},{"id":"Lawlinks","type":"SelectorLink","parentSelectors":["Generalserieslinks"],"selector":"a.elenco_gazzette","multiple":true,"delay":0},{"id":"eachlaw","type":"SelectorLink","parentSelectors":["Lawlinks"],"selector":"span:nth-of-type(n+2) a:nth-of-type(1)","multiple":true,"delay":0},{"id":"Completearticle","type":"SelectorPopupLink","parentSelectors":["eachlaw"],"selector":".stampabile span","multiple":false,"delay":0},{"id":"visualise","type":"SelectorPopupLink","parentSelectors":["Completearticle"],"selector":"input[type='submit']","multiple":false,"delay":0},{"id":"Text","type":"SelectorText","parentSelectors":["visualise"],"selector":".wrapper_pre pre","multiple":true,"regex":"","delay":0}]}
Thank you for your help!