Sitemap returns incomplete or missing data

Aastha · October 20, 2023, 2:18pm

Hi, I am trying to scrape data from the website below. Even after creating what I believe is a proper sitemap, I am getting incomplete data. It seems to be skipping some data or returning interleaved data.

I need suggestions on why this is happening.
Is there a way to resolve it?
Is there a list of websites or website structures where https://webscraper.io/ doesn't work properly?

Url: https://www.hioscar.com/faq#for-medicare

On this website, I need data only from the Medicare section.

Sitemap:
{"_id":"hioscar","startUrl":["https://www.hioscar.com/faq#for-medicare"],"selectors":[{"id":"link","parentSelectors":["_root"],"type":"SelectorLink","selector":"a.h-219IrFXyCMpob8eCiWjqAa","multiple":true,"linkType":"linkFromHref"},{"id":"data","parentSelectors":["link"],"type":"SelectorText","selector":"div.h-10C2yOzK-Pu-3erMTexkfi","multiple":true,"regex":""}]}

Aastha · October 20, 2023, 2:41pm

Hi, I need data only from the Medicare section; that's why I selected only the link selectors for it.
Also, the sitemap that you provided is not giving the required data.

Aastha · October 20, 2023, 2:56pm

Oh, I had written the URL before the sitemap.

Aastha · October 20, 2023, 3:00pm

So how do I create a sitemap for this use case?

Aastha · October 20, 2023, 3:09pm

I don't know whether they keep changing their content. For now I need the existing links in the Medicare section and their data. I think there are about 90 links.

Aastha · October 20, 2023, 3:14pm

One more thing: is it possible to get the embedded links present in the data? That is, when I export the data, will the text containing embedded links be replaced with the actual links, or will a separate column be provided for the links?
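
To illustrate what I mean by separating text from embedded links, here is a minimal Python sketch. It assumes the answer cell is available as raw HTML rather than plain text, which is an assumption about the export and is not confirmed in this thread. The sample HTML, the `LinkCollector` class, and `split_text_and_links` are all hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the visible text and every <a href="..."> value from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def split_text_and_links(html: str):
    """Return (plain_text, list_of_hrefs) for one HTML fragment."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.text_parts).strip(), parser.links


# Hypothetical sample cell, not taken from the real site.
sample = '<p>See the <a href="https://example.com/plans">plans page</a> for details.</p>'
text, links = split_text_and_links(sample)
print(text)
print(links)
```

With output like this, the links could go into their own column next to the plain text instead of replacing it.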

Aastha · October 20, 2023, 3:43pm

This is my observation: we still got only 65 entries, but there were about 90 links. Do you know why this is happening? Is it because of the website structure?
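
The cause is not established here, but one way to narrow it down is to compare the links that were discovered with the links that actually appear in the exported rows. The Python sketch below does that on made-up data; the `diagnose` helper and the sample paths are hypothetical, and in practice the two lists would come from the exported file.

```python
from collections import Counter


def diagnose(discovered_links, exported_links):
    """Report duplicate discovered links and links with no exported row."""
    counts = Counter(discovered_links)
    return {
        "discovered": len(discovered_links),
        "unique": len(counts),
        # Links that were found more than once on the listing page.
        "duplicates": sorted(u for u, n in counts.items() if n > 1),
        # Links that were found but produced no row in the export.
        "missing": sorted(set(discovered_links) - set(exported_links)),
    }


# Hypothetical data for illustration only.
discovered = ["/faq/a", "/faq/b", "/faq/b", "/faq/c"]
exported = ["/faq/a", "/faq/b"]
report = diagnose(discovered, exported)
print(report)
```

If `duplicates` is non-empty, the link count overstates the number of distinct pages; if `missing` is non-empty, those pages were visited (or skipped) without yielding data and are worth opening by hand.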

Aastha · October 22, 2023, 2:38pm

Oh yes, with the second URL sitemap I am getting 90 links. Thanks.

I have a question: how did you choose the click selector?

selector - body
Click selector - div#mainContent:not(:has[footer]) li div:contains("For Medicare"):first-of-type

Also, will the selector always be body?

I have always clicked on the element to get its class name automatically, but it seems you are not doing that.

Aastha · October 23, 2023, 7:02am

I see.
Still, can you explain this:
Click selector - div#mainContent:not(:has[footer]) li div:contains("For Medicare"):first-of-type