A lot harder to crack this

https://iato.in/members/lists I just want to collect emails and other info, but I could not scrape anything beyond the first 10 entries. Clicking those number buttons is the hard part.

{"_id":"mailinglist","startUrl":["https://iato.in/members/lists"],"selectors":[{"id":"Agency","type":"SelectorLink","parentSelectors":["_root","pagination"],"selector":"td a","multiple":true,"delay":0},{"id":"Email","type":"SelectorText","parentSelectors":["Agency"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(8)","multiple":false,"regex":"(?<=\s).","delay":0},{"id":"Contact Person","type":"SelectorText","parentSelectors":["Agency"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(2)","multiple":false,"regex":"(?<=:\s).","delay":0},{"id":"Designation","type":"SelectorText","parentSelectors":["Agency"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(3)","multiple":false,"regex":"(?<=\s).","delay":0},{"id":"Street Address","type":"SelectorText","parentSelectors":["Agency"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(4)","multiple":false,"regex":"(?<=:\s).","delay":0},{"id":"City","type":"SelectorText","parentSelectors":["Agency"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(5)","multiple":false,"regex":"(?<=\s).","delay":0},{"id":"Phone","type":"SelectorText","parentSelectors":["Agency"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(9)","multiple":false,"regex":"(?<=\s).","delay":0},{"id":"Mobile","type":"SelectorText","parentSelectors":["Agency"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(10)","multiple":false,"regex":"(?<=\s).","delay":0},{"id":"Website","type":"SelectorText","parentSelectors":["Agency"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(11)","multiple":false,"regex":"(?<=\s).","delay":0},{"id":"pagination","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"#menuTable_paginate","multiple":true,"delay":0,"clickElementSelector":"#menuTable_paginate > span > a","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}

Firstly, if the pagination has no links and you are using the Element Click selector, the sitemap is structured slightly differently. In the Element Click selector, use the regular Selector field to pick the items you want to scrape further (here, the table rows), and use the Click selector field to pick the button that should be clicked (here, the "next" button).

Secondly, when using the Element Click selector, always set a delay. When the page reloads, it takes a moment for all the elements to load. Without a delay, the selector won't find the next element to click on within the first milliseconds, will presume that there are no more items to click on, and will stop in its tracks.
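If you want to check a sitemap for this before importing it, here is a minimal Python sketch. It assumes the exported sitemap JSON looks like the ones in this thread (a `selectors` list where each entry has `id`, `type`, and `delay`); the function name and the small demo sitemap are made up for illustration and are not part of Web Scraper itself.

```python
import json


def click_selectors_without_delay(sitemap: dict) -> list[str]:
    """Return ids of Element Click selectors whose delay is missing or zero."""
    flagged = []
    for sel in sitemap.get("selectors", []):
        if sel.get("type") != "SelectorElementClick":
            continue
        # Exports in this thread store delay as a number (0) or a string ("1000").
        try:
            delay_ms = int(sel.get("delay") or 0)
        except (TypeError, ValueError):
            delay_ms = 0
        if delay_ms <= 0:
            flagged.append(sel["id"])
    return flagged


# Hypothetical, trimmed-down sitemap used only for this demo.
sitemap_json = (
    '{"_id":"demo","selectors":['
    '{"id":"pagination","type":"SelectorElementClick","delay":0},'
    '{"id":"agency","type":"SelectorElementClick","delay":"1000"},'
    '{"id":"Email","type":"SelectorText","delay":0}]}'
)
print(click_selectors_without_delay(json.loads(sitemap_json)))
```

Only the Element Click selectors are checked here, since plain text selectors with a delay of 0 are fine.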

This version of your sitemap should work:

{"_id":"mailinglist","startUrl":["https://iato.in/members/lists"],"selectors":[{"id":"agency-url","type":"SelectorLink","parentSelectors":["agency"],"selector":"a","multiple":false,"delay":0},{"id":"Email","type":"SelectorText","parentSelectors":["agency-url"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(8)","multiple":false,"regex":"","delay":0},{"id":"Contact Person","type":"SelectorText","parentSelectors":["agency-url"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"Designation","type":"SelectorText","parentSelectors":["agency-url"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(3)","multiple":false,"regex":"","delay":0},{"id":"Street Address","type":"SelectorText","parentSelectors":["agency-url"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(4)","multiple":false,"regex":"","delay":0},{"id":"City","type":"SelectorText","parentSelectors":["agency-url"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(5)","multiple":false,"regex":"","delay":0},{"id":"Phone","type":"SelectorText","parentSelectors":["agency-url"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(9)","multiple":false,"regex":"","delay":0},{"id":"Mobile","type":"SelectorText","parentSelectors":["agency-url"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(10)","multiple":false,"regex":"","delay":0},{"id":"Website","type":"SelectorText","parentSelectors":["agency-url"],"selector":"div.post-content:nth-of-type(4) p:nth-of-type(11)","multiple":false,"regex":"","delay":0},{"id":"agency","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"tbody tr[role='row']","multiple":true,"delay":"1000","clickElementSelector":"a.paginate_button.next","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"}]}

By the way, I removed your regex so that the sitemap would import, so you might want to put it back in.
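One way to put the regex back without breaking the import is to patch the sitemap in Python and let `json.dumps` handle the escaping (a bare `\s` inside a JSON string is not a valid escape, which may be why the import failed). This is only a sketch: `restore_regex` is a hypothetical helper, the two-selector sitemap is a trimmed stand-in, and the email pattern is a placeholder, not the regex from the original sitemap.

```python
import json


def restore_regex(sitemap: dict, regex_by_id: dict[str, str]) -> dict:
    """Return a copy of the sitemap with "regex" set on the named selectors."""
    patched = json.loads(json.dumps(sitemap))  # deep copy via a JSON round trip
    for sel in patched.get("selectors", []):
        if sel.get("id") in regex_by_id:
            sel["regex"] = regex_by_id[sel["id"]]
    return patched


# Trimmed stand-in for the sitemap above; only the fields used here are kept.
sitemap = {
    "_id": "mailinglist",
    "selectors": [
        {"id": "Email", "type": "SelectorText", "regex": ""},
        {"id": "City", "type": "SelectorText", "regex": ""},
    ],
}

# Placeholder pattern for illustration; substitute your own regex here.
patched = restore_regex(sitemap, {"Email": r"[^\s:]+@[^\s]+"})

# json.dumps writes backslashes as \\ so the result is valid JSON to paste in.
print(json.dumps(patched))
```

The printed string can be pasted into the import dialog; selectors not named in the mapping keep their empty regex.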
