Hello. I'm a newbie here but learning fast. I am trying to scrape press releases like this one:
I need all the text from the article, but not the tables in it. When I scrape the article body with the selector "div.bw-release-story" or "[itemprop='articleBody']", the scraped text is incomplete. My guess is that it's too long for an Excel cell because of all the data in the tables. Is that possible? I thought of two ways to work around the issue:
- Exclude the tables from my selection. I tried using :not(:has(div[class="bw-release-table-js"])), but then the whole text is excluded whenever the article contains a table. Maybe I'm not using it correctly?
- Force Web Scraper to scrape all the text even if it's very long. Is there a way to do that, maybe by splitting the text across multiple rows?
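For reference, here is a rough sketch of both ideas done outside Web Scraper, in Python with only the standard library. It assumes the class names from the selectors above (bw-release-story for the article body, bw-release-table-js for each table wrapper) and well-formed HTML where every non-void tag is explicitly closed. The function names are hypothetical. The 32,767 figure is Excel's maximum number of characters per cell.

```python
from html.parser import HTMLParser

# Void tags never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

EXCEL_CELL_LIMIT = 32767  # max characters Excel allows in one cell


class ReleaseTextParser(HTMLParser):
    """Collect text inside div.bw-release-story, skipping div.bw-release-table-js.

    Assumes well-formed markup: depth counting breaks if optional end tags
    (e.g. a missing </p>) are omitted.
    """

    def __init__(self):
        super().__init__()
        self.story_depth = 0  # > 0 while inside the article body
        self.skip_depth = 0   # > 0 while inside a table wrapper
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.story_depth:
            self.story_depth += 1
            if self.skip_depth:
                self.skip_depth += 1
            elif tag == "div" and "bw-release-table-js" in classes:
                self.skip_depth = 1
        elif tag == "div" and "bw-release-story" in classes:
            self.story_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.story_depth:
            return
        self.story_depth -= 1
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside the body and outside any table wrapper.
        if self.story_depth and not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_release_text(html):
    """Return the article text with the table wrappers left out."""
    parser = ReleaseTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def split_for_excel(text, limit=EXCEL_CELL_LIMIT):
    """Split text into chunks of at most `limit` characters, one per cell/row."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


sample = (
    '<div class="bw-release-story">'
    '<p>First paragraph.</p>'
    '<div class="bw-release-table-js"><table><tr><td>42</td></tr></table></div>'
    '<p>Second paragraph.</p>'
    '</div>'
)
print(extract_release_text(sample))  # prints: First paragraph. Second paragraph.
```

Filtering the tables out of the body's text sidesteps the problem with :not(:has(...)): that selector tests the matched element itself, so a body that contains a table is dropped entirely instead of having only its tables removed. Splitting afterwards is only needed if the remaining text still exceeds one cell.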
Thanks for your help.