Advanced pagination issue - Kumon website

Hello folks,
I have a cool pagination issue with the website of the brand Kumon.

I have already tried the trick of making the pagination selector a child of itself, and I have also tried putting the page number range in brackets in the start URL. Neither works...

The issue is that changing the page number in the URL doesn't actually change the page, unfortunately, and I can't figure out how to iterate through the pages.
For example, https://www.kumon.co.uk/find-a-tutor/?centre_search=london&page=59 won't land you on page 59, so this is all about finding out how to iterate through the pages of this website lol...

@iconoclast, @bretfeig and the others: does this appeal to you? lol

URL: https://www.kumon.co.uk/find-a-tutor

Sitemap:
{"selectors":[{"parentSelectors":["_root","pagination"],"type":"SelectorLink","multiple":true,"id":"link","selector":"div.col.col-md-4 a","delay":""},{"parentSelectors":["_root"],"type":"SelectorLink","multiple":true,"id":"pagination","selector":"ul.pagination li:nth-of-type(n+2) a","delay":""},{"parentSelectors":["link"],"type":"SelectorText","multiple":false,"id":"name","selector":"h1.text-center","regex":"","delay":""},{"parentSelectors":["link"],"type":"SelectorText","multiple":false,"id":"address","selector":"div.banner-text div.text-center > span:nth-of-type(1)","regex":"","delay":""},{"parentSelectors":["link"],"type":"SelectorText","multiple":false,"id":"zip_code","selector":"span span:nth-of-type(4)","regex":"(GIR|[A-Z]\d[A-Z\d]??|[A-Z]{2}\d[A-Z\d]??)[ ]??(\d[A-Z]{2})","delay":""},{"parentSelectors":["link"],"type":"SelectorText","multiple":false,"id":"city","selector":"span span:nth-of-type(3)","regex":"","delay":""}],"startUrl":"https://www.kumon.co.uk/find-a-tutor/?centre_search=london&page=[1-62]","_id":"kumon_test"}
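For reference, the `[1-62]` block in `startUrl` tells the scraper to generate one start URL per page number. A minimal Python sketch of that expansion (the helper `expand_range_url` is hypothetical and only handles a plain `[start-end]` block, with no zero-padding or step) shows that the sitemap really does request `page=1` through `page=62`. So the range syntax is not at fault; as described above, the site simply doesn't honour the `page` parameter in the URL.

```python
import re

# Matches a single "[start-end]" block such as "[1-62]".
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range_url(url):
    """Replace one [start-end] block with each number in the range (hypothetical helper)."""
    match = RANGE_PATTERN.search(url)
    if match is None:
        return [url]  # no range block: the URL is used as-is
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


urls = expand_range_url(
    "https://www.kumon.co.uk/find-a-tutor/?centre_search=london&page=[1-62]"
)
print(len(urls))  # 62
print(urls[58])   # https://www.kumon.co.uk/find-a-tutor/?centre_search=london&page=59
```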

Thanks in advance,
Nicolas.

Hi Nicolas!

Well, you haven't tried using Element Click to click through the pages.

Here's a working example I've made for you:
{"_id":"kumon","startUrl":["https://www.kumon.co.uk/find-a-tutor/?centre_search=london"],"selectors":[{"id":"Clicky_click","type":"SelectorElementClick","selector":"div.panel","parentSelectors":["_root"],"multiple":true,"delay":"2000","clickElementSelector":"ul.pagination li:last-child a","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Centre name","type":"SelectorText","selector":"div.centre-name","parentSelectors":["Clicky_click"],"multiple":false,"regex":"","delay":0},{"id":"Instructor","type":"SelectorText","selector":"div.instructor-name","parentSelectors":["Clicky_click"],"multiple":false,"regex":"","delay":0},{"id":"Possibilities","type":"SelectorText","selector":"div.centre-filters","parentSelectors":["Clicky_click"],"multiple":false,"regex":"","delay":0}]}

Since the Next button has nothing of its own that distinguishes it within the pagination, and can't be targeted directly (with something like 'button.next'), the CSS selector :last-child is used here to select the very last element of the pagination.

It seems I've scraped the whole of England haha

Please refer to the post by @KristapsWS when choosing the right selector for pagination:

Hi @iconoclast, so speedy :slight_smile:
I don't use the Element Click selector because I need to click through to every store to reach its details, like the address etc.
Are you sure I can do this with the Element Click selector?

Absolutely!

All that's left to do is add a Link selector inside Element Click (make it a child of Element Click). Then, inside the Link selector, you can add Text selectors to pick information out of each centre's page.

Please note that it will first go through all the pages and only then scrape each centre. I'd recommend narrowing the results to save time.

Another example:
{"_id":"kumon2","startUrl":["https://www.kumon.co.uk/find-a-tutor/?centre_search=london"],"selectors":[{"id":"Clicky_click","type":"SelectorElementClick","selector":"div.panel","parentSelectors":["_root"],"multiple":true,"delay":"2000","clickElementSelector":"ul.pagination li:last-child a","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Centre name","type":"SelectorText","selector":"div.centre-name","parentSelectors":["Clicky_click"],"multiple":false,"regex":"","delay":0},{"id":"Instructor","type":"SelectorText","selector":"div.instructor-name","parentSelectors":["Clicky_click"],"multiple":false,"regex":"","delay":0},{"id":"Possibilities","type":"SelectorText","selector":"div.centre-filters","parentSelectors":["Clicky_click"],"multiple":false,"regex":"","delay":0},{"id":"Link","type":"SelectorLink","selector":"div.col.col-md-4 a","parentSelectors":["Clicky_click"],"multiple":true,"delay":"1000"},{"id":"About","type":"SelectorText","selector":"div.col-sm-8 div.col-xs-12 p:nth-of-type(1)","parentSelectors":["Link"],"multiple":false,"regex":"","delay":0}]}

Wow, yeah, great work!

I don't know what happens on your side, but on mine the behaviour is really weird: a lot of data is missing, and many centres are not scraped at all, for instance https://www.kumon.co.uk/ancoats/
Definitely weird.

Have you noticed that if you open the website again after a while, it won't open the search for 'London' but shows an empty search field instead? It seems the search is stored locally in a cookie.

You have to run the search first, then narrow the results with the filters to save time, and then scrape.

I'd also increase the delay on the Link selector to 2000 ms.

Yep, sure, you need to have the cookie.
Btw, even with the increased delay it doesn't scrape everything; the behaviour is random... sometimes I only get 4 or 5 stores scraped lol

You also have to click on page 1 so that the scraper goes through all of them, because the cookie keeps the selected page saved as well.

Even starting from the first page, it doesn't make it to the last page.
The best run I've had so far gave me 242 stores; I expect around 600.

Hmm.

I've redone the sitemap: Element Click now has a stopping point, which is reached once you're on the last page (the ' > ' button becomes disabled there).
I did it with the ':last-child:not(.disabled) a' CSS selector, so the button is only clicked while it's in its active state.
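To see why `:not(.disabled)` stops the clicking, here is a minimal stdlib Python sketch that evaluates the same condition as `ul.pagination li:last-child:not(.disabled) a` on a pagination snippet. The markup is an assumption modelled on Bootstrap-style pagination (`li.active`, `li.disabled`), not copied from the Kumon site; it also assumes the disabled ' > ' item still contains an `<a>`, and it approximates `:last-child` as "the last `<li>` in the list". The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class PaginationParser(HTMLParser):
    """Records, for each <li> inside <ul class="pagination">, its classes and
    whether it contains an <a> link."""

    def __init__(self):
        super().__init__()
        self.in_pagination = False
        self.items = []  # [classes, has_link] entries in document order

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "pagination" in classes:
            self.in_pagination = True
        elif self.in_pagination and tag == "li":
            self.items.append([classes, False])
        elif self.in_pagination and tag == "a" and self.items:
            self.items[-1][1] = True  # the current <li> holds a link

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_pagination = False


def next_link_clickable(html):
    """Mimic 'ul.pagination li:last-child:not(.disabled) a': True only if the
    last <li> is not disabled and contains a link."""
    parser = PaginationParser()
    parser.feed(html)
    if not parser.items:
        return False
    classes, has_link = parser.items[-1]
    return "disabled" not in classes and has_link


not_last_page = ('<ul class="pagination"><li><a href="#">&lt;</a></li>'
                 '<li class="active"><a href="#">1</a></li><li><a href="#">2</a></li>'
                 '<li><a href="#">&gt;</a></li></ul>')
last_page = not_last_page.replace('<li><a href="#">&gt;</a></li>',
                                  '<li class="disabled"><a href="#">&gt;</a></li>')
print(next_link_clickable(not_last_page))  # True
print(next_link_clickable(last_page))      # False
```

With plain `li:last-child a`, the disabled item on the last page would still match under these assumptions, so nothing would tell Element Click to stop; adding `:not(.disabled)` makes the selector match nothing once the end is reached.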

I've run this sitemap (searched for London, then narrowed the results using the last 3 filters in the filter section, then scraped):
{"_id":"kumon2_links_only","startUrl":["https://www.kumon.co.uk/find-a-tutor/"],"selectors":[{"id":"Clicky_click","type":"SelectorElementClick","selector":"div.panel","parentSelectors":["_root"],"multiple":true,"delay":"2000","clickElementSelector":"ul.pagination li:last-child:not(.disabled) a","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Link","type":"SelectorLink","selector":"div.col.col-md-4 a","parentSelectors":["Clicky_click"],"multiple":true,"delay":"1000"}]}

It returns 75 results, exactly as on the website. You can add Text selectors inside the Link selector.
Don't forget to click page number 1 beforehand.

Still doesn't work unfortunately: we only get around 240 stores out of 600 :confused:

@iconoclast, if you don't apply any filters, does this sitemap work well on your side?

I'll look into it once I'm home today.

I've found a workaround. There's always one.

You can scrape all the store links using my last sitemap (with no selectors under the Link selector, so that it only picks up the URLs), then use those URLs to create a new sitemap that scrapes only the centre details, separately.

Well, since I have a macro to create multiple-URL sitemaps, I've saved you some time (614 centres in total): https://pastebin.com/WZeQS6kh

Just import it and set up the selectors.
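The workaround boils down to turning the list of scraped centre URLs into a sitemap whose `startUrl` is that list. The macro mentioned above isn't shown in the thread, so this is only a hypothetical Python equivalent: it drops duplicate URLs and reuses the `name` and `address` selectors from Nicolas's original sitemap with `_root` as parent, since each start URL is already a centre page. The helper names, the sitemap `_id` and the choice of selectors are assumptions; check them against the live pages before importing.

```python
import json


def text_selector(selector_id, css):
    """A Text selector attached to _root, in the same shape as the sitemaps in this thread."""
    return {
        "id": selector_id,
        "type": "SelectorText",
        "selector": css,
        "parentSelectors": ["_root"],
        "multiple": False,
        "regex": "",
        "delay": 0,
    }


def build_detail_sitemap(sitemap_id, centre_urls):
    """Build a multiple-URL sitemap that scrapes one centre page per start URL (hypothetical helper)."""
    unique_urls = list(dict.fromkeys(centre_urls))  # drop duplicates, keep first-seen order
    return {
        "_id": sitemap_id,
        "startUrl": unique_urls,
        "selectors": [
            # Reused from the original kumon_test sitemap; not re-checked against the live site.
            text_selector("name", "h1.text-center"),
            text_selector("address", "div.banner-text div.text-center > span:nth-of-type(1)"),
        ],
    }


# Links exported from the links-only sitemap (the same centre may be picked up twice).
scraped_links = [
    "https://www.kumon.co.uk/ancoats/",
    "https://www.kumon.co.uk/ancoats/",
]
sitemap = build_detail_sitemap("kumon_centre_details", scraped_links)
print(json.dumps(sitemap, indent=2))  # paste this JSON when importing the sitemap
```

Deduplicating before building the sitemap avoids scraping the same centre twice when it shows up on more than one results page.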

Works like a charm with this workaround :wink:
Unfortunately, it's hard to automate this process.