How to scrape multiple FAQs from one page when there is no wrapper element?

Hi all,

I've done many scrapes before, but I can't figure this one out. I would love to extract the FAQ data using this structure: category/faq_title/faq_answer. The problem is that all the required data is loaded on a single page, and I can't define any logical wrapper element, so of course the extracted data gets mixed up...
Therefore this sitemap is not going to work as is. Or maybe I'm just missing something, of course :wink:

Who knows what to do??


{"_id":"enexis","startUrl":[""],"selectors":[{"id":"categorie","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"a.wp-contact-channel","multiple":true,"delay":0,"clickElementSelector":".collapsed h3","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"title","type":"SelectorText","parentSelectors":["categorie"],"selector":"div.panel-heading","multiple":true,"regex":"","delay":0},{"id":"answer","type":"SelectorText","parentSelectors":["categorie"],"selector":"div.panel-body","multiple":true,"regex":"","delay":0}]}

My situation is the same. I want to extract elements positioned relative to a "sibling" element since there is no parent element present.

Were you able to figure this out? If so, what was the solution?

No, sorry, I never got a reply or a solution.

I haven't looked at your code specifically, but I believe I have found the solution to both our situations: CSS selectors.

This short video goes through the use of the ":contains" selector, for example (strictly speaking a jQuery extension rather than standard CSS).

Since a parent element is not present, you could use an element's sibling to pair up the corresponding items:
E + F selects an F element that immediately follows a sibling E element
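
To make the sibling idea concrete, here is a rough sketch in plain Python of what "div.panel-heading + div.panel-body" does on a page without wrappers. The sample markup is made up: only the class names panel-heading and panel-body come from the sitemap above, and the flat layout, the h3 category headings and the pair_faqs helper are assumptions for illustration, not the real page or anything built into Web Scraper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: headings and answers are flat siblings
# with no per-FAQ wrapper. Only the class names come from the sitemap.
SAMPLE = """
<div id="faqs">
  <h3>Billing</h3>
  <div class="panel-heading">How do I pay?</div>
  <div class="panel-body">By bank transfer.</div>
  <div class="panel-heading">When is it due?</div>
  <div class="panel-body">Within 14 days.</div>
  <h3>Moving</h3>
  <div class="panel-heading">How do I report a move?</div>
  <div class="panel-body">Use the online form.</div>
</div>
"""


def has_class(element, name):
    """Return True if the element's class attribute contains the given name."""
    return name in element.get("class", "").split()


def pair_faqs(markup):
    """Pair every heading with the element that immediately follows it."""
    children = list(ET.fromstring(markup.strip()))
    rows = []
    category = None
    # Walk the siblings two at a time: (current element, the one right after it).
    for current, following in zip(children, children[1:] + [None]):
        if current.tag == "h3":
            # Assumed: an h3 starts a new category for the FAQs after it.
            category = (current.text or "").strip()
        elif has_class(current, "panel-heading"):
            # Same condition as "div.panel-heading + div.panel-body".
            if following is not None and has_class(following, "panel-body"):
                rows.append({
                    "category": category,
                    "faq_title": (current.text or "").strip(),
                    "faq_answer": (following.text or "").strip(),
                })
    return rows


for row in pair_faqs(SAMPLE):
    print(row)
```

The point is that each answer is found relative to its own heading rather than relative to a shared parent, so titles and answers stay matched even though nothing wraps them together.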

This Wikipedia article explains all the available CSS selectors.

Hope this helps

Oh wow! That looks promising! Thanks very much!