Scraping part of text

AvisMathias · February 27, 2024, 1:14pm

Hi guys

I want to scrape a part of the text on these sites. I face two problems:

I need the scraper to click on multiple links across multiple pages. How do I set that up?
I need to scrape only the part of the text under the subheading "Indbrud", but the scraper can't select it as an element of its own. How can I do that?

Url: Døgnrapporter | Politi

Sitemap: N/A

JanAp · April 9, 2024, 12:35pm

Hi,

You can try this kind of setup:

{"_id":"politi","startUrl":["https://politi.dk/doegnrapporter?fromDate=2023/1/1&toDate=2024/2/27&newsType=Alle&page=[1-42]&district=OEstjyllands-Politi"],"selectors":[{"id":"listing","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":"a.newsResultLink","type":"SelectorLink"},{"id":"Indbrud","multiple":false,"parentSelectors":["listing"],"regex":"(?<=Indbrud)[^]+","selector":".rich-text","type":"SelectorText"}]}
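To make it clearer what the two main pieces of this sitemap do, here is a rough sketch in TypeScript. It is not Web Scraper's actual code, only an illustration under a few assumptions: `expandPageRange` and `extractAfterIndbrud` are hypothetical helper names, the example.com URL is a placeholder, and the sample text is made up rather than taken from politi.dk. The first helper mimics how `page=[1-42]` in the start URL stands for one URL per page, and the second applies the same pattern as the selector's `regex` field.

```typescript
// Hypothetical helper: expand a "[start-end]" range in a start URL
// into one URL per page, mimicking the page=[1-42] part of startUrl.
function expandPageRange(url: string): string[] {
  const match = url.match(/\[(\d+)-(\d+)\]/);
  if (!match) return [url]; // no range in the URL, nothing to expand
  const start = Number(match[1]);
  const end = Number(match[2]);
  const urls: string[] = [];
  for (let page = start; page <= end; page++) {
    // Replace the literal "[start-end]" text with the page number.
    urls.push(url.replace(match[0], String(page)));
  }
  return urls;
}

// Same pattern as the "regex" field of the Indbrud selector:
// (?<=Indbrud) is a lookbehind, so the match starts right after the
// word "Indbrud"; [^]+ is JavaScript syntax for "one or more of any
// character, including newlines".
const afterIndbrud = new RegExp("(?<=Indbrud)[^]+");

// Hypothetical helper: return the text after the first "Indbrud",
// or null when the word does not occur.
function extractAfterIndbrud(text: string): string | null {
  const m = text.match(afterIndbrud);
  return m ? m[0] : null;
}

// Made-up sample text, not copied from the real site.
const sample = "Færdsel\nEt uheld.\nIndbrud\nEt indbrud i en villa.";
console.log(extractAfterIndbrud(sample));
console.log(expandPageRange("https://example.com/list?page=[1-3]"));
```

Note that the pattern keeps everything from the first "Indbrud" to the end of the `.rich-text` block, so if other sections follow that heading on a page, they would be included too and may need trimming afterwards.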