Scraping a Site with a Different Pagination on the Second Page

I'm trying to scrape a site, but the pagination markup changes between the first and second pages. The first page has only a Next button, while the second page has both Previous and Next, and the two buttons share the same CSS classes. My sitemap's link selector always picks the first link it matches, so although I can reach the second page, the scraper then follows Previous and sends me back to the first page.

On the first page, the pagination HTML is:
<section class="section pt-0 pb-0">
  <div class="container has-text-right">
    <a class="button is-primary" href="/designs?page=2">Next</a>
  </div>
</section>

On the second page, the pagination HTML is:
<section class="section pt-0 pb-0">
  <div class="container has-text-right">
    <a class="button is-primary" href="/designs?page=1">Previous</a>
    <a class="button is-primary" href="/designs?page=3">Next</a>
  </div>
</section>

Is it possible to select a link based on the text between its <a> tags? That way I could make sure I always select the Next button.

Sitemap:
{"_id":"site","startUrl":["url"],"selectors":[{"id":"pagination","type":"SelectorLink","parentSelectors":["_root"],"selector":".container a.is-primary","multiple":false,"delay":0},{"id":"design","type":"SelectorElement","parentSelectors":["_root","pagination"],"selector":".column div.box","multiple":true,"delay":0},{"id":"title","type":"SelectorText","parentSelectors":["design"],"selector":".title a","multiple":false,"regex":"","delay":0},{"id":"code","type":"SelectorText","parentSelectors":["design"],"selector":"h3.is-5","multiple":false,"regex":"","delay":0},{"id":"uploader","type":"SelectorText","parentSelectors":["design"],"selector":"p a","multiple":false,"regex":"","delay":0},{"id":"date","type":"SelectorText","parentSelectors":["design"],"selector":".has-text-right p","multiple":false,"regex":"","delay":0},{"id":"imageurl","type":"SelectorImage","parentSelectors":["design"],"selector":"img","multiple":false,"delay":0},{"id":"tags","type":"SelectorGroup","parentSelectors":["design"],"selector":"a.has-text-white","delay":0,"extractAttribute":""}]}

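I figured it out. There's a pseudo-selector called :contains (a jQuery-style extension, not standard CSS) that matches elements by their text content. I changed my link selector from ".container a.is-primary" to ".container a.is-primary:contains('Next')" and it worked perfectly: the scraper now always follows the Next button.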
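For anyone who wants to see the change in context, here is a sketch of how the "pagination" entry from the sitemap above might look after the fix. Only the "selector" string differs from the original. Every other field is copied unchanged, so treat it as an illustration rather than a verified export. Single quotes are used inside :contains so the JSON string needs no escaping.

```json
{
  "id": "pagination",
  "type": "SelectorLink",
  "parentSelectors": ["_root"],
  "selector": ".container a.is-primary:contains('Next')",
  "multiple": false,
  "delay": 0
}
```

Note that :contains does a case-sensitive substring match, so 'Next' would also match a link labelled "Next page" but not one labelled "next".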
