It's working - but not really

Till_Uberfarbe · July 30, 2021, 8:49am

Hey community,
I'm scraping a German newspaper archive. I already scraped 2020, but now that I've come back to scrape 2019, Web Scraper runs as usual and the resulting CSV contains no data. The same happens when I run the 2020 scraper, which used to work: no data, and it tells me that the parent element could not be found. Can you help me troubleshoot? I'm not sure how to locate the problem.

Thanks so much in advance

URL: Archiv – Politik Nachrichten – Januar 2019 – Sueddeutsche.de

Sitemap:
{"_id":"sz012019","startUrl":["https://www.sueddeutsche.de/archiv/politik/2019/01/page/[1-100]"],"selectors":[{"id":"article","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.entrylist__entry","multiple":true,"delay":0},{"id":"title","type":"SelectorText","parentSelectors":["article"],"selector":"em.entrylist__title","multiple":false,"regex":"","delay":0},{"id":"link","type":"SelectorLink","parentSelectors":["article"],"selector":"a","multiple":false,"delay":0},{"id":"content","type":"SelectorText","parentSelectors":["element-card"],"selector":"article.lp_is_start","multiple":false,"regex":"","delay":0},{"id":"element-card","type":"SelectorElement","parentSelectors":["link"],"selector":"body:has(article#readspeaker-content)","multiple":true,"delay":0},{"id":"foto","type":"SelectorImage","parentSelectors":["element-card"],"selector":"[data-hydration-component-name=\"ImageAsset\"] img","multiple":false,"delay":0},{"id":"date","type":"SelectorText","parentSelectors":["element-card"],"selector":"time","multiple":false,"regex":"","delay":0}]}
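
Since the error mentions a parent element that could not be found, one quick structural check is whether the sitemap parses as JSON at all and whether every "parentSelectors" entry names a selector that is actually defined. Below is a minimal sketch in Python, assuming the sitemap is available as a string. `check_sitemap` is a hypothetical helper written for this post, not part of Web Scraper.

```python
import json


def check_sitemap(raw):
    """Return a list of structural problems found in a sitemap string.

    Hypothetical helper: it only inspects the JSON structure and does not
    load any page or evaluate any CSS selector.
    """
    try:
        sitemap = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Typical cause: quotes inside a selector that lost their backslashes.
        return ["invalid JSON: {} at position {}".format(exc.msg, exc.pos)]

    selectors = sitemap.get("selectors", [])
    # "_root" is the implicit top-level parent used in the sitemaps above.
    known_ids = {"_root"} | {s["id"] for s in selectors}

    problems = []
    for s in selectors:
        for parent in s.get("parentSelectors", []):
            if parent not in known_ids:
                problems.append(
                    "selector {!r} has unknown parent {!r}".format(s["id"], parent)
                )
    return problems
```

An empty result only means the structure is consistent. It cannot tell whether selectors such as `div.entrylist__entry` still match the live page, so if the site's markup has changed since the 2020 scrape, the selectors still need to be checked by hand in the browser.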

Asad · July 30, 2021, 8:00pm

Hope this works. Please change the text selector to match whatever you want to scrape.
Sitemap:

{"_id":"sueddeutsche","startUrl":["https://www.sueddeutsche.de/archiv/politik/2019/01"],"selectors":[{"id":"pages","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.entrylist__entry","multiple":true,"delay":2000,"clickElementSelector":".arrow a","clickType":"clickMore","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueCSSSelector"},{"id":"click","type":"SelectorLink","parentSelectors":["pages"],"selector":"a.entrylist__link","multiple":false,"delay":0},{"id":"title i think","type":"SelectorText","parentSelectors":["click"],"selector":".css-1r9juou font font","multiple":false,"regex":"","delay":0}]}

Till_Uberfarbe · August 2, 2021, 7:30am

Hey Asad, thanks for the help! Unfortunately, this does not work: the articles do not open and no data is scraped. I will try to adjust it and report back if I get it working.