Unable to get all pages

Hello to all,

I'm trying to scrape information from this site: https://getcomics.info/
But I'm not able to get all of the pages.

Can someone explain what I'm doing wrong? Thanks.

Sitemap:

{"_id":"getcomics","startUrl":["https://getcomics.info/"],"selectors":[{"id":"items","type":"SelectorElement","parentSelectors":["Content"],"selector":"article.post","multiple":true,"delay":0},{"id":"title","type":"SelectorText","parentSelectors":["items"],"selector":"h1.post-title","multiple":false,"regex":"","delay":0},{"id":"Content","type":"SelectorElement","parentSelectors":["_root","nextbutton"],"selector":"section.page-contents","multiple":false,"delay":0},{"id":"nextbutton","type":"SelectorElementClick","parentSelectors":["Content"],"selector":"nav.pagination.pagination-standard","multiple":false,"delay":0,"clickElementSelector":"a.pagination-button","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}

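Try this sitemap, which lists the paginated URLs directly as a start URL range instead of clicking the pagination button: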

{"_id":"getcomics","startUrl":["https://getcomics.info/","https://getcomics.info/page/[2-2078]/"],"selectors":[{"id":"title","type":"SelectorText","parentSelectors":["Content"],"selector":"h1.post-title","multiple":false,"regex":"","delay":0},{"id":"Content","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.post-info","multiple":true,"delay":0},{"id":"Year/size","type":"SelectorText","parentSelectors":["Content"],"selector":"p:nth-of-type(2)","multiple":false,"regex":"","delay":0}]}

Thanks bretfeig, it works for all pages except the first one.
Is there a way to follow the pagination links instead?

Thanks

You're saying it got you 2,078 pages and you're concerned about the first one? It actually scrapes the first page on my end.
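
On the pagination links question: the usual pattern in Web Scraper is a Link selector that lists itself as one of its own parents, so every page it opens is searched for more pagination links. Below is a rough, untested sketch of that idea. It assumes the `a.pagination-button` selector from your original sitemap still matches the page links, and it reuses `div.post-info` and `h1.post-title` from the sitemap above. The selector id `pagination` is just a name I picked.

```json
{
  "_id": "getcomics",
  "startUrl": ["https://getcomics.info/"],
  "selectors": [
    {
      "id": "pagination",
      "type": "SelectorLink",
      "parentSelectors": ["_root", "pagination"],
      "selector": "a.pagination-button",
      "multiple": true,
      "delay": 0
    },
    {
      "id": "Content",
      "type": "SelectorElement",
      "parentSelectors": ["_root", "pagination"],
      "selector": "div.post-info",
      "multiple": true,
      "delay": 0
    },
    {
      "id": "title",
      "type": "SelectorText",
      "parentSelectors": ["Content"],
      "selector": "h1.post-title",
      "multiple": false,
      "regex": "",
      "delay": 0
    }
  ]
}
```

Because `Content` has both `_root` and `pagination` as parents, it should run on the start page as well as on every page reached through a pagination link. With around 2,000 pages, the start URL range is probably still the simpler and faster option.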
