How do I use pagination, and how do I debug Web Scraper?

I want to scrape some images from a website. The scraper can walk through next_page, but it only gets the first image URL, not the images on the remaining pages. Please tell me how to fix this.

Sitemap:

{
    "_id":"new",
    "startUrl":[
        "https://manhua.fffdm.com/2/"
    ],
    "selectors":[
        {
            "id":"episode",
            "parentSelectors":[
                "wrapper"
            ],
            "type":"SelectorLink",
            "selector":"a",
            "multiple":false,
            "delay":0
        },
        {
            "id":"wrapper",
            "parentSelectors":[
                "_root"
            ],
            "type":"SelectorElement",
            "selector":"li.pure-u-1-2",
            "multiple":true,
            "delay":0
        },
        {
            "id":"next_page",
            "parentSelectors":[
                "episode",
                "next_page"
            ],
            "paginationType":"auto",
            "selector":"a.pure-button-primary:nth-last-of-type(1)",
            "type":"SelectorPagination"
        },
        {
            "id":"img",
            "parentSelectors":[
                "episode",
                "next_page"
            ],
            "type":"SelectorImage",
            "selector":"img#mhpic",
            "multiple":false,
            "delay":0
        }
    ]
}

@jiffies Hi, it appears that the target website is very slow to respond: the next page can take 5-10 seconds or more to load fully.

In tests on Web Scraper Cloud, I managed to scrape the desired data by setting the page load delay to at least 5000 ms and adding an 'Element scroll' selector.


Sitemap example:

{
    "_id":"manhua-fffdm-com",
    "startUrl":[
        "https://manhua.fffdm.com/2/978/index_2.html"
    ],
    "selectors":[
        {
            "id":"image-pagination",
            "paginationType":"linkFromHref",
            "parentSelectors":[
                "_root",
                "image-pagination"
            ],
            "selector":"a.button-success + a:not(.pure-button-primary)",
            "type":"SelectorPagination"
        },
        {
            "delay":0,
            "id":"image",
            "multiple":false,
            "parentSelectors":[
                "body"
            ],
            "selector":"div#mhimg0 img",
            "type":"SelectorImage"
        },
        {
            "delay":0,
            "id":"body",
            "multiple":true,
            "parentSelectors":[
                "image-pagination"
            ],
            "selector":"body",
            "type":"SelectorElement"
        }
    ]
}
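
For debugging, it can help to see how the selectors in a sitemap nest, since a selector only runs on the pages or elements produced by its parents. The following is a rough sketch in Python, not part of Web Scraper itself: the helper names `selector_tree` and `print_tree` are made up for illustration, and it only reads the `id` and `parentSelectors` fields of the sitemap above.

```python
import json
from collections import defaultdict

# The example sitemap from this post, pasted as exported JSON.
SITEMAP_JSON = '''
{"_id":"manhua-fffdm-com","startUrl":["https://manhua.fffdm.com/2/978/index_2.html"],"selectors":[{"id":"image-pagination","paginationType":"linkFromHref","parentSelectors":["_root","image-pagination"],"selector":"a.button-success + a:not(.pure-button-primary)","type":"SelectorPagination"},{"delay":0,"id":"image","multiple":false,"parentSelectors":["body"],"selector":"div#mhimg0 img","type":"SelectorImage"},{"delay":0,"id":"body","multiple":true,"parentSelectors":["image-pagination"],"selector":"body","type":"SelectorElement"}]}
'''


def selector_tree(sitemap):
    """Map each parent id to the ids of the selectors that list it as a parent."""
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel["id"])
    return dict(children)


def print_tree(children, node="_root", depth=0, seen=()):
    """Print the hierarchy, stopping when a selector repeats (e.g. pagination under itself)."""
    print("  " * depth + node)
    if node in seen:
        return
    for child in children.get(node, []):
        print_tree(children, child, depth + 1, seen + (node,))


tree = selector_tree(json.loads(SITEMAP_JSON))
print_tree(tree)
```

For this sitemap the printout shows `image-pagination` under `_root` and under itself, with `body` and then `image` nested below it, so the image selector is evaluated on every page the pagination selector reaches. Pasting your own sitemap into `SITEMAP_JSON` gives the same view for comparison.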
