Webpage Data Source

Jamez100 · November 8, 2023, 8:26pm

Hi,

I'm trying to pull holiday search results from the URL below and work out the best way to scrape the data. I've noticed that the HTML differs depending on how the code is viewed.

For example, if I inspect the 'resort place name/title' text using Chrome, the HTML shows as follows:

<h1 class="detailed-card-header__title">Caretta Paradise & Waterpark</h1>

But if I view the page source, the same element is shown as follows:

<h1 class="detailed-card-header__title">{{ name }}</h1>

I'm trying to work out the data source for {{ name }}. Is this served to the browser as a JSON file (or similar)? I couldn't find a data file in my cache.

Or is the {{ name }} tag filled in dynamically via a server request, for example through jQuery or JavaScript?

I'm new to scraping and have a script that can scrape the class="detailed-card-header__title" text using Selenium and BeautifulSoup, but I'm wondering if there is a shortcut to query the {{ name }} data directly.

Url: https://www.jet2holidays.com/search/results?airport=118&date=08-07-2024&duration=7&occupancy=r2c6_11_15&destination=16&sortorder=5&page=1&sr=true

Thanks for reading, any pointers would be greatly appreciated.

leemeng · November 10, 2023, 1:35am

This is quite common nowadays with modern websites and dynamic content, especially travel websites that load real-time data. The page source only contains the template, and JavaScript fills in placeholders like {{ name }} after the page loads, so the old method of using Python requests alone on the page URL might not work. OTOH you can sometimes access the site's API server directly and just get the JSON data, which seems to be the case with this site. You can find the proper API URL by looking through the Network tab of your browser's developer tools with the "Fetch/XHR" filter selected (see screenshot).

In this example, the API URL is
https://www.jet2holidays.com/api/jet2/smartsearch/search?departureAirportIds=118&destinationAreaIds=16&departureDate=2024-07-08&durations=7&occupancies=2_6-11-15&pageNumber=1&pageSize=12&sortOrder=5&filters=&holidayTypeId=0&flexibility=7&minPrice=&includePriceBreakDown=false&brandId=&inboundFlightId=0&outboundFlightId=0&gtmSearchType=Smart%20Search&searchId=&applyDiscount=true&occupancyOpen=false&useMultiSearch=false&defaultSearchParametersUsed=false&inboundFlightTimes=&outboundFlightTimes=

This will return JSON data, which is easily parsed with Python (import json). This is not a Python forum though, and you can find a lot of other resources and videos elsewhere about this.
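As a rough illustration of the approach above, here is a minimal Python sketch using only the standard library. Several things in it are assumptions rather than facts about the site: the shortened parameter list (the API may require the full set of query parameters from the URL above), the User-Agent header, and the guess that resort titles sit under a "name" key somewhere in the response, based on the {{ name }} placeholder. The helpers build_search_url, fetch_json and find_values are made up for this example. Because the layout of the JSON is unknown, find_values simply walks the whole structure instead of assuming a fixed path.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the API URL quoted above.
API_BASE = "https://www.jet2holidays.com/api/jet2/smartsearch/search"


def build_search_url(page_number=1, page_size=12):
    """Build the search URL from a subset of the parameters seen above.

    Assumption: the API may reject requests that omit the other
    parameters, in which case copy the full query string instead.
    """
    params = {
        "departureAirportIds": 118,
        "destinationAreaIds": 16,
        "departureDate": "2024-07-08",
        "durations": 7,
        "occupancies": "2_6-11-15",
        "pageNumber": page_number,
        "pageSize": page_size,
        "sortOrder": 5,
    }
    return f"{API_BASE}?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Download a URL and decode the body as JSON.

    Assumption: a browser-like User-Agent is enough; the site may
    require cookies or other headers, or may block scripted requests.
    """
    request = Request(
        url,
        headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def find_values(node, key):
    """Recursively yield every value stored under `key` in nested JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def main():
    # Performs a real network request, so it is not called automatically.
    data = fetch_json(build_search_url())
    for name in find_values(data, "name"):  # "name" is a guess at the key
        print(name)
```

Calling main() runs the request and prints whatever is found under "name". It is worth printing the raw JSON once first to see the real key names, and checking the site's terms of use before scraping.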

Jamez100 · November 10, 2023, 7:33pm

Hi leemeng,

Thank you for taking the time to reply.

This is really helpful. I suspected there might be another way to access the data on this site, but didn't know where to look.

Much appreciated.