How do I select only the text (no HTML) for the entire page?

I am trying to grab all visible text on a page with a single selector per page. I haven't found a way to get text without HTML in it. The only alternative I've found is to set up a selector for each text element, but then the formatting breaks and it becomes unwieldy. I really only want one field with all the visible text on the page...
Hoping someone knows how to do this 🙂

@rninja

Hi, can you provide your sitemap or the targeted website?

Hi @ViestursWS! Let's take cnn.com as an example. I want to pull the body copy without HTML and save multiple instances to a CSV. I was initially using the HTML type and then changed it to Text (with "multiple" selected and the selector set to a text tag, P for example).

The question is: now that I do get text without HTML, is there a way to collect the text from several different element types on the page as one cell's worth of data in the CSV or XLSX file? How would one collect those with a single selector?
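If you keep one selector per text element with "multiple" enabled, one workaround is to merge the rows after exporting. Below is a minimal Python sketch of that post-processing step, done outside Web Scraper (it is not a Web Scraper feature). It assumes the CSV has one row per scraped element, a column identifying the page (the name web-scraper-start-url is an assumption, so check your own export's header) and a text column named body_text (hypothetical; use whatever id you gave your selector).

```python
import csv
import io


def merge_rows_per_page(csv_text, page_col="web-scraper-start-url",
                        text_col="body_text", sep=" "):
    """Join every non-empty text value from the same page into one cell.

    page_col and text_col are assumed column names; pass your own.
    Pages whose rows are all empty are left out of the result.
    """
    merged = {}  # dicts keep insertion order, so pages stay in first-seen order
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(text_col) or "").strip()
        if value:
            merged.setdefault(row[page_col], []).append(value)

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([page_col, text_col])
    for page, parts in merged.items():
        # One output row per page, with all of its text in a single cell.
        writer.writerow([page, sep.join(parts)])
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "web-scraper-start-url,body_text\n"
        "https://example.com/a,First paragraph.\n"
        "https://example.com/a,Second paragraph.\n"
        "https://example.com/b,Only one.\n"
    )
    print(merge_rows_per_page(sample))
```

The resulting CSV opens in Excel like any other, so this also covers the XLSX case if you save it from there.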

You can do it with the selector html or body (check the site's source to see which is used), but this is a pretty messy way to get data.

Type: Text
Selector: html
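For comparison, here is a minimal sketch of the cleanup that the html selector approach leaves to you, done outside Web Scraper with Python's standard html.parser module. It works on HTML you have already downloaded, drops the contents of script, style, noscript, template and title, and collapses all whitespace into one string. The skip list is an assumption about what counts as "visible", and the parser does not evaluate CSS, so elements hidden with display:none are still included.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    # Assumed list of elements whose contents are not visible page text.
    SKIP = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; etc. are decoded
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the page's text as a single whitespace-normalized string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Chunks are joined with spaces so adjacent elements don't run together;
    # a word split by an inline tag (e.g. Hel<b>lo</b>) will gain a space.
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    page = ("<html><head><title>T</title><style>p{}</style></head><body>"
            "<h1>Hi</h1><p>First &amp; second</p><script>var x=1;</script>"
            "<li>Item</li></body></html>")
    print(visible_text(page))
```

To run it on a live page you would fetch the HTML first, for example with urllib.request.urlopen(url).read().decode("utf-8"), keeping in mind that content rendered by JavaScript won't be in that response.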