Obfuscated links or just my incompetence?

I'm still somewhat new to this, but I've managed to complete several successful jobs with Web Scraper. This one is stumping me, though: I can't select the links in these Internet Archive search results with any selector configuration, so I can't even get started on a sitemap.

The search-result URLs I want are readily available when I hover over or click through the results, but the selector only gives me "Parent does not contain the selected element". :woozy_face:

https://archive.org/details/texts?tab=collection&query=creator%3AChorev

Hi,

The issue occurs because the content is nested inside several shadow roots, and ordinary CSS selectors run against the page do not reach inside them.

So this is an edge case where point-and-click selection won't work, and the selectors have to be written manually by inspecting the HTML.
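To illustrate why point-and-click fails, here is a minimal TypeScript sketch of what a `:shadow-root` chain has to do. It assumes each `host:shadow-root` step means "match `host`, then continue searching inside its shadow root", which is how the sitemap in this thread appears to use it. `splitShadowSelector` and `queryThroughShadowRoots` are hypothetical helpers, not Web Scraper's implementation, and the small `QueryRoot`/`ShadowHost` interfaces stand in for the DOM so the code type-checks without browser typings.

```typescript
// Minimal structural types standing in for the DOM. Assumption: in a browser,
// `document`, elements and open shadow roots provide these members.
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<ShadowHost>;
}

interface ShadowHost extends QueryRoot {
  // null when the element has no shadow root or its root is "closed"
  shadowRoot: QueryRoot | null;
}

// Split "app-root:shadow-root collection-page:shadow-root a" into the chain
// of shadow hosts (["app-root", "collection-page"]) and the final selector ("a").
function splitShadowSelector(selector: string): { hosts: string[]; target: string } {
  const parts = selector.split(":shadow-root").map((p) => p.trim());
  const target = parts.pop() ?? "";
  return { hosts: parts, target };
}

// A plain querySelectorAll stops at every shadow boundary, so descend one
// shadow root per step, then run the final selector in the innermost roots.
function queryThroughShadowRoots(root: QueryRoot, selector: string): ShadowHost[] {
  const { hosts, target } = splitShadowSelector(selector);
  let scopes: QueryRoot[] = [root];
  for (const host of hosts) {
    const next: QueryRoot[] = [];
    for (const scope of scopes) {
      const matches = scope.querySelectorAll(host);
      for (let i = 0; i < matches.length; i++) {
        const sr = matches[i].shadowRoot;
        if (sr !== null) next.push(sr); // hosts without an open root are dropped
      }
    }
    scopes = next;
  }
  const results: ShadowHost[] = [];
  for (const scope of scopes) {
    const matches = scope.querySelectorAll(target);
    for (let i = 0; i < matches.length; i++) results.push(matches[i]);
  }
  return results;
}
```

With the type annotations removed, the same function could be pasted into the browser's DevTools console and called with `document` to check a hand-written chain before putting it into a sitemap. Closed shadow roots report `shadowRoot` as `null`, so this approach cannot reach into them.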

Here is an example sitemap showing how to access data inside a shadow root:

{
  "_id": "archive-org",
  "startUrl": ["https://archive.org/details/texts?tab=collection&query=creator%3AChorev"],
  "selectors": [
    {
      "id": "link",
      "linkType": "linkFromHref",
      "multiple": true,
      "parentSelectors": ["_root"],
      "selector": "app-root:shadow-root collection-page:shadow-root collection-browser:shadow-root infinite-scroller:shadow-root tile-dispatcher:shadow-root a",
      "type": "SelectorLink"
    },
    {
      "id": "title",
      "multiple": false,
      "parentSelectors": ["link"],
      "regex": "",
      "selector": "span[itemprop='name']",
      "type": "SelectorText"
    },
    {
      "id": "Identifier",
      "multiple": false,
      "parentSelectors": ["link"],
      "regex": "",
      "selector": "span[itemprop='identifier']",
      "type": "SelectorText"
    },
    {
      "id": "publisher",
      "multiple": false,
      "parentSelectors": ["link"],
      "regex": "",
      "selector": "span[itemprop='publisher']",
      "type": "SelectorText"
    }
  ]
}
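In this sitemap, `link` is a `SelectorLink` whose only parent is `_root`, while `title`, `Identifier` and `publisher` each list `link` in `parentSelectors`. Web Scraper therefore follows every extracted link and runs the three text selectors on each opened details page. The sketch below rebuilds that parent/child tree from an abridged copy of the sitemap; `Sitemap`, `SitemapSelector`, `indent` and `selectorTree` are illustrative names, not part of Web Scraper, and only the fields the sketch reads are typed.

```typescript
// Only the sitemap fields this sketch reads; real exports carry more keys.
interface SitemapSelector {
  id: string;
  type: string;
  parentSelectors: string[];
}

interface Sitemap {
  startUrl: string[];
  selectors: SitemapSelector[];
}

function indent(depth: number): string {
  let s = "";
  for (let i = 0; i < depth; i++) s += "  ";
  return s;
}

// Walk parentSelectors from "_root" downward, one indented line per selector.
function selectorTree(sitemap: Sitemap, parentId = "_root", depth = 0): string[] {
  const lines: string[] = [];
  for (const sel of sitemap.selectors) {
    if (sel.parentSelectors.indexOf(parentId) === -1) continue;
    lines.push(`${indent(depth)}${sel.id} (${sel.type})`);
    for (const child of selectorTree(sitemap, sel.id, depth + 1)) lines.push(child);
  }
  return lines;
}

// Abridged copy of the sitemap from this thread (CSS selectors omitted).
const sitemap: Sitemap = {
  startUrl: ["https://archive.org/details/texts?tab=collection&query=creator%3AChorev"],
  selectors: [
    { id: "link", type: "SelectorLink", parentSelectors: ["_root"] },
    { id: "title", type: "SelectorText", parentSelectors: ["link"] },
    { id: "Identifier", type: "SelectorText", parentSelectors: ["link"] },
    { id: "publisher", type: "SelectorText", parentSelectors: ["link"] },
  ],
};

console.log(selectorTree(sitemap).join("\n"));
// link (SelectorLink)
//   title (SelectorText)
//   Identifier (SelectorText)
//   publisher (SelectorText)
```

The walk is deliberately naive: a selector that lists itself among its own parents would make it recurse forever, so a real tool would need to track which ids it has already visited.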

Wow, thank you so much. It works well and I've learned something new. It's some consolation that the solution wasn't right in front of my eyes.
