Images from Carousel Loading on the Fly

seine · May 5, 2021, 5:05pm

Hi,

I'm struggling to use the Element click selector to retrieve the URLs of pictures in a carousel that loads the next images on the fly.

URL: Achat maison, appartement... ou location | Immobilier.notaires.fr (roughly "Buy a house, apartment... or rent")

Here is what I tried:

{
  "id":"notaires_pictures",
  "type":"SelectorElementClick",
  "parentSelectors":[
    "notaires_annonce_link"
  ],
  "selector":"div.g-slider",
  "multiple":true,
  "delay":"500",
  "clickElementSelector":"i[aria-label=Next]",
  "clickType":"clickMore",
  "discardInitialElements":"do-not-discard",
  "clickElementUniquenessType":"uniqueHTML"
}

The issue is that it keeps cycling through the pictures and never stops.

Is this achievable with Web Scraper?

Thanks a lot,
Seine

seine · May 7, 2021, 9:26am

Hello,

Does anyone know whether it's at least doable?

Thanks!
Seine

ViestursWS · May 7, 2021, 3:29pm

@seine Hi. I tried it with the Element click selector, but there is no indication of when the Next button reaches the last image, so it goes into a never-ending loop. The best I managed was 3 images, using the Element click selector with "click once".

My example:

{
  "_id":"immobilier-fr",
  "startUrl":[
    "https://www.immobilier.notaires.fr/fr/annonce-immo/immo-interactif/appartement/paris-05-75005/1376796"
  ],
  "selectors":[
    {
      "id":"images",
      "type":"SelectorElementAttribute",
      "parentSelectors":[
        "click"
      ],
      "selector":".g-slider gallery-item > gallery-image > div",
      "multiple":true,
      "extractAttribute":"style",
      "delay":0
    },
    {
      "id":"click",
      "type":"SelectorElementClick",
      "parentSelectors":[
        "_root"
      ],
      "selector":"html",
      "multiple":false,
      "delay":"600",
      "clickElementSelector":"i[aria-label=\"Next\"]",
      "clickType":"clickOnce",
      "discardInitialElements":"discard",
      "clickElementUniquenessType":"uniqueCSSSelector"
    }
  ]
}
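If you script the browser yourself instead of using a Web Scraper sitemap, one way to stop a carousel that wraps around with no end state is to remember every image URL already collected and stop as soon as one comes back. Below is a minimal Python sketch of just that stopping rule. It assumes each slide has a distinct URL; the name `collect_until_cycle` is hypothetical, and its input stands in for "read the current slide's URL, then click Next" (the browser automation itself is not shown).

```python
from typing import Iterable, List


def collect_until_cycle(image_urls: Iterable[str]) -> List[str]:
    """Collect image URLs from a carousel that wraps around (hypothetical helper).

    `image_urls` stands in for "read the current slide's URL, then click Next",
    repeated indefinitely. The carousel has no end state, so we stop as soon
    as a URL we have already collected comes around again.
    """
    seen = set()
    collected: List[str] = []
    for url in image_urls:
        if url in seen:
            break  # looped back to an earlier slide (assumes distinct URLs per slide)
        seen.add(url)
        collected.append(url)
    return collected


if __name__ == "__main__":
    # Simulated carousel output: three slides, then it wraps back to the first.
    print(collect_until_cycle(["a.jpg", "b.jpg", "c.jpg", "a.jpg", "b.jpg"]))
    # ['a.jpg', 'b.jpg', 'c.jpg']
```

If two different slides ever shared a URL, this rule would stop early, so it only fits carousels where each image URL is unique.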

leemeng · May 8, 2021, 12:32am

Interesting site. You can get the images without clicking if you scrape the English version of the site, which has a simpler structure. You can switch languages by clicking the EN button in the top right.

Sitemap:
{
  "_id":"forum-notaires_pictures",
  "startUrl":[
    "https://www.immobilier.notaires.fr/en/real-estate-advert/immo-interactif/apartment/paris-05-75005/1376796"
  ],
  "selectors":[
    {
      "id":"Title",
      "type":"SelectorText",
      "parentSelectors":[
        "_root"
      ],
      "selector":"h1.titre",
      "multiple":false,
      "regex":""
    },
    {
      "id":"Images wrappers",
      "type":"SelectorElement",
      "parentSelectors":[
        "_root"
      ],
      "selector":"div#slides div.slide",
      "multiple":true
    },
    {
      "id":"Image",
      "type":"SelectorImage",
      "parentSelectors":[
        "Images wrappers"
      ],
      "selector":"img",
      "multiple":false
    }
  ]
}

This yields a set of low-resolution image URLs such as:
https://media.immobilier.notaires.fr/inotr/media/29/06032/1376796/1f4f6816_QVGA.jpg
https://media.immobilier.notaires.fr/inotr/media/29/06032/1376796/6525efac_QVGA.jpg
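For reference, the selector chain in the English-site sitemap (`div#slides div.slide`, then the first `img` in each slide) can be approximated outside Web Scraper with Python's standard-library `html.parser`. This is a minimal sketch that assumes you already have the English page's HTML as a string (fetching it is not shown), that the slides are present in that HTML rather than injected later by JavaScript, that the `<div>` tags are properly nested, and that the image URL sits in the `src` attribute. `SlideImageParser` and `extract_slide_images` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional


class SlideImageParser(HTMLParser):
    """Collect the src of the first <img> (with a src) in each div.slide under div#slides."""

    def __init__(self) -> None:
        super().__init__()
        self.div_depth = 0                        # current nesting depth of <div>s
        self.slides_depth: Optional[int] = None   # depth of div#slides while inside it
        self.slide_depth: Optional[int] = None    # depth of the current div.slide
        self.slide_has_img = False                # mimics "multiple": false on the img
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div":
            self.div_depth += 1
            classes = (attributes.get("class") or "").split()
            if self.slides_depth is None and attributes.get("id") == "slides":
                self.slides_depth = self.div_depth
            elif self.slides_depth is not None and self.slide_depth is None and "slide" in classes:
                self.slide_depth = self.div_depth
                self.slide_has_img = False
        elif tag == "img" and self.slide_depth is not None and not self.slide_has_img:
            src = attributes.get("src")
            if src:
                self.srcs.append(src)
                self.slide_has_img = True

    def handle_endtag(self, tag):
        if tag != "div":
            return
        # Leaving the div that opened the current slide or the slides container.
        if self.slide_depth == self.div_depth:
            self.slide_depth = None
        if self.slides_depth == self.div_depth:
            self.slides_depth = None
        self.div_depth -= 1


def extract_slide_images(html: str) -> List[str]:
    """Return one image URL per slide, in document order."""
    parser = SlideImageParser()
    parser.feed(html)
    parser.close()
    return parser.srcs
```

A real scraper would likely use a CSS-selector library instead; the stdlib parser is used here only to keep the sketch dependency-free.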

You can get higher-resolution images simply by changing the resolution tag at the end of every URL from QVGA to SVGA (search and replace all). This can be done in Notepad or Excel. For example:

https://media.immobilier.notaires.fr/inotr/media/29/06032/1376796/1f4f6816_SVGA.jpg
https://media.immobilier.notaires.fr/inotr/media/29/06032/1376796/6525efac_SVGA.jpg
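If you would rather not do the search-and-replace by hand, the same rewrite is a one-line regular expression in Python. A minimal sketch: it only touches a `_QVGA.jpg` suffix at the very end of a URL, so "QVGA" appearing elsewhere in a path is left alone. `to_svga` is a hypothetical name.

```python
import re
from typing import List

# The resolution tag sits right before the extension, e.g. ".../1f4f6816_QVGA.jpg".
QVGA_SUFFIX = re.compile(r"_QVGA\.jpg$")


def to_svga(urls: List[str]) -> List[str]:
    """Swap a trailing _QVGA.jpg for _SVGA.jpg; other URLs pass through unchanged."""
    return [QVGA_SUFFIX.sub("_SVGA.jpg", url) for url in urls]


if __name__ == "__main__":
    low_res = [
        "https://media.immobilier.notaires.fr/inotr/media/29/06032/1376796/1f4f6816_QVGA.jpg",
        "https://media.immobilier.notaires.fr/inotr/media/29/06032/1376796/6525efac_QVGA.jpg",
    ]
    for url in to_svga(low_res):
        print(url)
```

This only rewrites the names; it does not check that an SVGA file actually exists on the server.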

UPDATE: It turns out some of the larger images are only VGA (_VGA.jpg), not SVGA, so this trick won't reliably get all the images. You can try a more standard click-and-scrape sitemap on the English site.
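Since a given large image may be SVGA or only VGA, one workaround outside Web Scraper is to try the tags in order and keep the first URL that actually responds. A minimal Python sketch, assuming (not verified here) that the media server answers HTTP HEAD requests with status 200 for existing files and an error for missing ones, and that SVGA, VGA and QVGA are the only tags worth trying. The names `candidate_urls`, `url_exists` and `best_available` are hypothetical.

```python
import urllib.request
from typing import Callable, List, Optional

# Preference order, largest first. SVGA and VGA are the tags mentioned in this
# thread; QVGA is the thumbnail the sitemap returns.
RESOLUTIONS = ("SVGA", "VGA", "QVGA")


def candidate_urls(qvga_url: str) -> List[str]:
    """Build one URL per resolution tag from a ".../xxxx_QVGA.jpg" URL."""
    suffix = "_QVGA.jpg"
    if not qvga_url.endswith(suffix):
        return [qvga_url]  # not the expected pattern: leave it alone
    base = qvga_url[: -len(suffix)]
    return [f"{base}_{res}.jpg" for res in RESOLUTIONS]


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Assumption: a HEAD request that returns 200 means the image exists."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers HTTPError/URLError (e.g. 404, DNS failure) and timeouts
        return False


def best_available(qvga_url: str, exists: Callable[[str], bool] = url_exists) -> Optional[str]:
    """Return the highest-resolution candidate for which exists(url) is True."""
    for url in candidate_urls(qvga_url):
        if exists(url):
            return url
    return None
```

With the default check, `best_available(url)` returns the SVGA URL when it exists, otherwise the VGA one, otherwise the original thumbnail, and `None` only if even that request fails. Passing a different `exists` function makes the selection logic easy to test without network access.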

seine · May 11, 2021, 9:52am

Thank you both very much, this was very useful.
@leemeng The language switch is quite smart. That must be a trick one can use on many websites whose layout differs between language versions.