The goal is to retrieve news articles (title, link, body, image, date) that contain the keyword лгбт (LGBT) from a Russian news website.
However, my scraping job finishes after returning only 120 entries out of the 2554 that I can see on the site. How can I scrape all 2554?
Or, if the site is limiting my scraping, can I somehow restrict the extraction to articles from the period 2022.12.04-2023.10.01?
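In case it helps frame the date-range fallback: one option is to filter the exported records after scraping rather than inside the sitemap. The following is only a minimal Python sketch under assumptions. It assumes the export gives one record per article with a `Date` field (the selector id from my sitemap), and that the date text looks like `HH:MM DD.MM.YYYY`. That format is a guess and may need adjusting. The sample rows and the `hypothetical-N` links are placeholders, not real data.

```python
from datetime import date, datetime

# Period of interest from the question (inclusive on both ends).
START = date(2022, 12, 4)
END = date(2023, 10, 1)


def parse_article_date(raw, fmt="%H:%M %d.%m.%Y"):
    """Return the calendar date parsed from `raw`, or None if it does not match.

    The default format is an assumption about how the site prints dates.
    """
    try:
        return datetime.strptime(raw.strip(), fmt).date()
    except (ValueError, AttributeError):
        # ValueError: text does not match the format; AttributeError: raw is None.
        return None


def in_period(record, start=START, end=END):
    """True if the record's Date field parses and falls within [start, end]."""
    parsed = parse_article_date(record.get("Date"))
    return parsed is not None and start <= parsed <= end


# Placeholder rows standing in for the scraper's exported data.
rows = [
    {"link": "hypothetical-1", "Date": "08:15 04.12.2022"},
    {"link": "hypothetical-2", "Date": "23:59 03.12.2022"},
    {"link": "hypothetical-3", "Date": "12:00 01.10.2023"},
    {"link": "hypothetical-4", "Date": ""},
]

kept = [r["link"] for r in rows if in_period(r)]
print(kept)
```

Records whose date cannot be parsed are dropped here rather than kept, so it would be worth checking how many rows fall into that bucket before trusting the filtered count.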
Url: RIA Novosti - events in Moscow, Russia and the world: topics of the day, photos, videos, infographics, radio
Sitemap:
{"_id":"rialgbt","startUrl":["РИА Новости - события в Москве, России и мире: темы дня, фото, видео, инфографика, радио div.article__text","multiple":false,"regex":""},{"id":"Date","parentSelectors":["link"],"type":"SelectorText","selector":".m-active .article__info-date a","multiple":false,"regex":""},{"id":"Quotes","parentSelectors":["link"],"type":"SelectorText","selector":"div.article__quote-text","multiple":false,"regex":""},{"id":"paginator","parentSelectors":["_root","paginator"],"paginationType":"clickMore","type":"SelectorPagination","selector":"div.list-more"},{"id":"шьп","parentSelectors":["link"],"type":"SelectorImage","selector":".m-active .photoview__open img","multiple":false}]}
Error log:
yandex.ru/ads/system/header-bidding.js:1 Failed to load resource: net::ERR_BLOCKED_BY_CLIENT
yandex.ru/ads/system/context.js:1 Failed to load resource: net::ERR_BLOCKED_BY_CLIENT
fbcheck.min.js?9e1f41805:145 fb_cookie_get Array(2)
fbcheck.min.js?9e1f41805:138 fb_cookie_set ispwa-user-visits
fbcheck.min.js?9e1f41805:171 ### Service worker register
fbcheck.min.js?9e1f41805:83 fb_supported
fbcheck.min.js?9e1f41805:106 fb_permission_denied
fbcheck.min.js?9e1f41805:106 fb_permission_denied
script.js?9ede75a20:6 PAGE READY
ria_layout_manager.js?9d8b8b816:1 ria_layout_manager >>> inited
/services/banners/get/?type=search&adfox_value=ria_ru&adfox_pk=ria_ru:1 Failed to load resource: net::ERR_BLOCKED_BY_CLIENT
/services/banners/get/?type=search&adfox_value=ria_ru&adfox_pk=ria_ru:1 Failed to load resource: net::ERR_BLOCKED_BY_CLIENT
/services/banners/get/?type=search&adfox_value=ria_ru&adfox_pk=ria_ru:1 Failed to load resource: net::ERR_BLOCKED_BY_CLIENT
ria.get.banners.js?983a957e5:3 --- getBanners ERROR GetData --- /services/banners/get/?type=search&adfox_value=ria_ru&adfox_pk=ria_ru
/services/banners/get/?type=search&adfox_value=ria_ru&adfox_pk=ria_ru:1 Failed to load resource: net::ERR_BLOCKED_BY_CLIENT
fbcheck.min.js?9e1f41805:145 fb_cookie_get Array(2)
fbcheck.min.js?9e1f41805:95 fb_cookie_ispwa-app-install-banner_exist showed
ria.ru/:1 Uncaught (in promise) Error: A listener indicated an asynchronous response by returning true, but the message channel closed before a response was received